Datasets on Crowdsourcing
	Dataset Name (with link)	Size: #questions, #answers, ratio (#answers/#questions)	#Questions with Ground Truth	Application	Question Content: Image or Text	Question Type: Single Choice (N choose 1), or Rating	Operator(s)	Description
	Part 1: Datasets with ground truth and workers' answers
	Fact Evaluation Judgment Dataset	42624, 216725, 5.08	576	Fact Evaluation	Text	3 choose 1	Selection, Join	The task is to identify whether a fact (e.g., "Stephen Hawking graduated from Oxford") is correct, wrong, or ambiguous.
	Fashion 10000: An Enriched Social Image Dataset for Fashion and Clothing	32398, 97194, 3	32398	Image Retrieval	Image with Metadata (e.g., Tags)	2 choose 1	Selection, Collection, Join	The task identifies whether or not an image is fashion-related (Note: the downloaded dataset size is 9.8G).
	Sentiment Popularity	500, 10000, 20	500	Sentiment Analysis	Text	2 choose 1	Selection, Join	The task aims at classifying movie reviews as either positive or negative.
	Weather Sentiment	300, 6000, 20	300	Sentiment Analysis	Text	5 choose 1	Selection, Join	The task is to judge the sentiment of a tweet discussing the weather ("negative", "neutral", "positive", "unrelated to weather" or "cannot tell").
	Face Sentiment	584, 5256, 9	584	Face Sentiment Identification	Image	4 choose 1	Selection, Categorize, Join	The task is to identify the sentiment (on whether it is neutral, happy, sad or angry) for a given face image.
	Relevance Finding	20232, 98453, 4.87	3276	Relevance Finding	Text	Rating (5 choices)	Selection, Categorize, Sort, Top-K, Join	The task is to rate the relevance of a given document to a given topic on a 5-level scale, i.e., 2: highly relevant, 1: relevant, 0: non-relevant, -1: unknown, -2: broken link.
	Duchenne Smile Identification	2162, 30319, 14.02	160	Duchenne Smile Identification	Image	2 choose 1	Selection, Categorize, Join	The task is to judge whether or not a smile (in an image) is a Duchenne smile.
	HITSpam-Crowdflower	5380, 42762, 7.95	101	Spam Detection	Text	2 choose 1	Selection, Join	The task is to judge whether or not a HIT is a spam task.
	HITSpam-Mturk	5840, 28354, 4.86	101	Spam Detection	Text	2 choose 1	Selection, Join	The task is to judge whether or not a HIT is a spam task.
	Query Document Relevance	2165, 17395, 8.03	2165	Relevance Finding	Text	2 choose 1	Selection, Join	The task is to identify whether or not a given document is relevant to a given query.
	AdultContent	11040, 92721, 8.40	1517	Classification	Text	Rating (5 choices)	Selection, Categorize, Sort, Top-K, Join	The task is to identify the adult level of a website (G, P, R, X, B).
	Emotion	700, 7000, 10	700	Emotion Rating	Text	Rating (choose a value from -100 to 100)	Selection, Sort, Top-K, Join	The task is to rate the emotion of a given text. There are 7 emotions (anger, disgust, fear, joy, sadness, surprise, valence), and a user gives a value from -100 to 100 for each emotion about the text.
	Word pair similarity	30, 300, 10	30	Word Similarity Finding	Text	Rating (choose a numerical score from 0 to 10)	Selection, Sort, Top-K, Join	The task is to assign a numerical similarity score between 0 and 10 to a given word pair.
	Recognizing Textual Entailment	800, 8000, 10	800	Textual Understanding	Text	2 choose 1	Selection, Join	The task is to identify whether a given Hypothesis sentence is implied by the information in the given text.
	Temporal Ordering	462, 4620, 10	462	Event Ordering	Text	2 choose 1	Selection, Sort, Top-K, Join	The task is to identify whether or not one event happens before another event in a given context.
	Word Sense Disambiguation	177, 1770, 10	177	Word Sense Disambiguation	Text	K choose 1	Selection, Join	The task is to choose the most appropriate sense of a word (out of several given senses) in the given context.
	Web search data	2665, 15567, 5.84	2652	Relevance Finding	Text	Rating (5 choices)	Selection, Sort, Top-K, Join	The task is to judge the relevance of query-URL pairs with a 5-level rating scale (from 1 to 5).
	Duck	108, 4212, 39	108	Duck Identification	Image	2 choose 1	Selection, Join	The task is to identify whether the image contains a duck or not.
	Dog	807, 8070, 10	807	Dog Breed Identification	Image	4 choose 1	Selection, Categorize, Join	The task is to recognize a breed (out of Norfolk Terrier, Norwich Terrier, Irish Wolfhound, and Scottish Deerhound) for a given dog.
	Airline Twitter sentiment	16000, 55000, 3.4375	16,000	Sentiment Analysis	Text	K choose 1	Selection, Join	A sentiment analysis job about the problems of each major U.S. airline. Twitter data was scraped from February of 2015 and contributors were asked to first classify positive, negative, and neutral tweets, followed by categorizing negative reasons (such as "late flight" or "rude service").
	Company categorizations (with URLs)	7152, 28675, 4	7,152	Categorization	Text	6 choose 1	Categorization	A data set where business names were matched with URLs/homepages for the named businesses.
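The Size column in Part 1 packs three values into one cell (#questions, #answers, and their ratio), so it can be useful to check that the listed ratio agrees with the two counts. The following is a minimal Python sketch of such a check. The function names `parse_size` and `ratio_is_consistent` and the tolerance of 0.01 are assumptions made for illustration and are not part of the datasets themselves; the sample cells are copied from rows of Part 1.

```python
def parse_size(field: str) -> tuple[int, int, float]:
    """Split a Part 1 Size cell into (#questions, #answers, ratio)."""
    questions, answers, ratio = (part.strip() for part in field.split(","))
    return int(questions), int(answers), float(ratio)


def ratio_is_consistent(field: str, tolerance: float = 0.01) -> bool:
    """Return True if the listed ratio matches #answers / #questions.

    The tolerance (hypothetical default) absorbs the rounding to two
    decimals used in the table.
    """
    questions, answers, listed = parse_size(field)
    return abs(answers / questions - listed) <= tolerance


if __name__ == "__main__":
    # Size cells copied from rows of Part 1.
    for cell in ["42624, 216725, 5.08", "500, 10000, 20", "300, 6000, 20"]:
        print(cell, "->", ratio_is_consistent(cell))
```

A cell that fails the check usually means the two counts were entered in the wrong order or the ratio was rounded differently, so it is worth inspecting by hand rather than correcting automatically.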

	Part 2: Datasets with only ground truth ( no workers' answers )
	Clothing pattern identification	14,750	14,750	Classification	Image	K choose 1	Selection, Categorize	A large dataset where contributors viewed pictures of dresses and classified their patterns. Sixteen popular pattern types were provided and each dress was judged by three contributors. Dataset is provided with their aggregated judgments and URLs for each dress.
	Sound detection and classification	8,000	8,000	Classification	Audio	K choose 1	Selection, Categorize	Contributors listened to short audio clips and identified white noise events like coughing, dropped keys, and barking dogs. They also tried to identify the scene, such as office, cafe, or supermarket and ranked the difficulty of each individual row. Audio clips range from about five to ten seconds.
	Relevancy of terms to a disaster relief topic	7,566	7,566	Relevance Finding	Text	5 choose 1	Selection	Contributors viewed a topic and a term and rated the relevancy of the latter to the former on a five point scale (1 being very irrelevant, 5 being very relevant). The topics all center around humanitarian aid or disaster relief and each topic was defined for contributors. They were also asked if the term was a specific person or place and whether it was misspelled.
	Hate speech identification	14,442	14,442	Hate Sentiment Identification	Text	3 choose 1	Selection, Categorize, Join	Contributors viewed short text and identified if it a) contained hate speech, b) was offensive but without hate speech, or c) was not offensive at all. Contains nearly 15K rows with three contributor judgments per text string.
	Economic News Article Tone and Relevance	8,000	8,000	Relevance Finding	Text	K choose 1	Selection, Categorize, Join	Contributors read snippets of news articles. They then noted if the article was relevant to the US economy and, if so, what the tone of the article was. Tone was judged on a 9 point scale (from 1 to 9, with 1 representing the most negativity). Dataset contains these judgments as well as the dates, source titles, and text. Dates range from 1951 to 2014.
	Football Strategy	3,731	3,731	Football Strategy	Text	6 choose 1	Selection	Contributors were presented a football scenario and asked to note what the best coaching decision would be. An example scenario: "It is third down and 3. The ball is on your opponent's 20 yard line. There are five seconds left. You are down by 4." The decisions presented were punt, pass, run, kick a field goal, kneel down, or don't know. There are thousands of such scenarios in this job.
	Numerical Transcription from Images	7,665	7,665	Numerical Transcription from Images	Image	Numerical Transcription	Selection	Contributors looked at a series of pictures from a footrace and transcribed bib numbers of the competitors. Some images contain multiple bib numbers or incomplete bib numbers.
	Identifying key phrases in text	8,262	8,262	Textual Understanding	Text	K choose 1	Selection, Join	Contributors looked at question/answer pairs (like "When did Bob Marley die? 1981") and a series of sentences surrounding that event (such as phrases from a Bob Marley obituary). They marked which of those sentences spoke directly to the question (such as "Robert Nesta "Bob" Marley, OM (6 February 1945 – 11 May 1981) was a Jamaican reggae singer, song writer, musician") for a wide variety of topics.
	Gender classifier data	20,000	20,000	Gender Identification	Text/Image	3 choose 1	Selection, Categorize, Join	This data set was used to train a CrowdFlower AI gender predictor. Contributors were asked to simply view a Twitter profile and judge whether the user was a male, a female, or a brand (non-individual). The dataset contains 20,000 rows, each with a user name, a random tweet, account profile and image, location, and even link and sidebar color.
	Sentiment analyses of single words or short phrases	3,523	3,523	Sentiment Analysis	Text	K choose 2	Selection, Join	Contributors looked at four words or bigrams (bigrams are just word pairs) and ranked the most positive and most negative ones in each set. For example, they saw quartets like "nasty, failure, honored, females" and chose which word was the most positive and most negative. Interestingly, each set was graded by eight contributors instead of the usual three. Dataset contains all 3,523 rows, but has 28K judgments.
	Disasters on social media	10,877	10,877	Fact Evaluation	Text	2 choose 1	Selection, Join	Contributors looked at over 10,000 tweets culled with a variety of searches like "ablaze", "quarantine", and "pandemonium", then noted whether the tweet referred to a disaster event (as opposed to a joke with the word or a movie review or something non-disastrous).
	Do these chemicals contribute to a disease?	5,160	5,160	Fact Evaluation	Text	2 choose 1	Selection, Join	Contributors read sentences in which both a chemical (like Aspirin) and a disease (or side-effect) were present. They then determined if the chemical directly contributed to the disease or caused it. Dataset includes chemical names, disease name, and aggregated judgments of five (as opposed to the usual three) contributors.
	First GOP debate sentiment analysis	14,000	14,000	Sentiment Analysis	Text	K choose 1	Selection, Join	We looked through tens of thousands of tweets about the early August GOP debate in Ohio and asked contributors to do both sentiment analysis and data categorization. Contributors were asked if the tweet was relevant, which candidate was mentioned, what subject was mentioned, and then what the sentiment was for a given tweet. We've removed the non-relevant messages from the uploaded dataset.
	URL categorization	31,085	31,085	URL categorization	Text	K choose 1	Selection, Categorize, Join	To create this large, enriched dataset of categorized websites, contributors clicked provided links and selected a main and sub-category for URLs.
	Classification of political social media	5,000	5,000	Classification	Text	K choose 1	Selection, Categorize, Join	Contributors looked at thousands of social media messages from US Senators and other American politicians to classify their content. Messages were broken down into audience (national or the tweeter's constituency), bias (neutral/bipartisan, or biased/partisan), and finally tagged as the actual substance of the message itself (options ranged from informational, announcement of a media appearance, an attack on another candidate, etc.)
	eCommerce search relevance	32,000	32,000	Relevance Finding	Text	Rating	Selection, Categorize	We used this dataset to launch our Kaggle competition, but the set posted here contains far more information than what served as the foundation for that contest. This set contains image URLs, rank on page, description for each product, search query that led to each result, and more, each from five major English-language ecommerce sites.
	Housing and wheelchair accessibility	10,000	10,000	Fact Evaluation	Image	Label	Selection, Join	Here, contributors viewed 10,000 Google Maps images and marked whether they were residential areas. If they were, they noted which homes were most prevalent in the area (apartments or houses) and whether the area had proper sidewalks that are wheelchair friendly.
	Primary emotions of statements	2,400	2,400	Sentiment Analysis	Text	18 choose 1	Selection, Join	Contributors looked at a single sentence and rated its emotional content based on Plutchik's wheel of emotions. 18 emotional choices were presented to contributors for grading.
	U.S. economic performance based on news articles	5,000	5,000	Relevance Finding	Text	Rating (scale of 1-9, with 1 being negative and 9 being positive)	Selection, Categorize	Contributors viewed a news article headline and a short, bolded excerpt of a sentence or two from the attendant article. Next, they decided if the sentence in question provided an indication of the U.S. economy's health, then rated the indication on a scale of 1-9, with 1 being negative and 9 being positive.
	Police-involved fatalities since May 2013	2,355	2,355	Classification	Text/Image	K choose 1	Selection, Categorize	A data categorization job where contributors compiled a database of police-involved shootings over a two-year span. Information contained includes: race, gender, city, state, whether the victim was armed, photos of the deceased, attending news stories, and more.
	Comparing pictures of people	59,476	59,476	Image Retrieval	Image	Rating (5 choices)	Selection, Collection, Join	In this job, contributors viewed two pictures of people walking through the same room and were then asked to compare the person on the left to the person on the right. Questions center on observable traits (like skin color, hair length, muscularity, etc.).
	Twitter sentiment analysis: Self-driving cars	7,015	7,015	Sentiment Analysis	Text	Rating (very positive, slightly positive, neutral, slightly negative, or very negative)	Selection, Join	A simple Twitter sentiment analysis job where contributors read tweets and classified them as very positive, slightly positive, neutral, slightly negative, or very negative. They were also asked to mark if the tweet was not relevant to self-driving cars.
	Blockbuster database	410	410	Categorization	Text	Survey	Selection, Categorize	A data categorization job where we asked the crowd to find out information about the ten most popular movies, each year, for the past 40 years (1975-2015).
	Government official database	5,000	5,000	Categorization	Text	K choose 1	Categorization	A simple data categorization job wherein contributors viewed a cabinet member, minister, ambassador, etc., and separated their names from their titles. Data set contains names, positions, and years served.
	Wikipedia image categorization	976	976	Image Categorization	Image	K choose 1	Selection, Collection, Join	This data set contains hundreds of Wikipedia images which contributors categorized in the following ways: No person present/One person present/Several people present, but one dominant/Several people present, but none are dominant/Unsure. If the images were of one or several people, contributors further classified images by gender.
	Image attribute tagging	3,235	3,235	Image Categorization	Image	K choose 1	Selection, Collection, Join	Contributors viewed thousands of images and categorized each based on a given list of attributes. These attributes ranged from objective and specific (like "child" or "motorbike") to more subjective ones (like "afraid" or "beautiful"). Data set includes URLs for all images, multiple tags for each, and contributor agreement scores.
	Mobile search relevance	647	647	Relevance Finding	Text	K choose 1	Selection, Categorize, Join	Contributors viewed a variety of searches for mobile apps and determined if the intent of those searches was matched. One was a short query like "music player"; the other, a much longer one like "I would like to download an app that plays the music on the phone from multiple sources like Spotify and Pandora and my library."
	Progressive issues sentiment analysis	1,159	1,159	Sentiment Analysis	Text	K choose 1	Selection, Join	Contributors viewed tweets regarding a variety of left-leaning issues like legalization of abortion, feminism, Hillary Clinton, etc. They then classified if the tweets in question were for, against, or neutral on the issue (with an option for none of the above). After this, they further classified each statement as to whether they expressed a subjective opinion or gave facts.
	Indian terrorism deaths database	27,233	27,233	Fact Evaluation	Text	Survey	Selection, Join	Contributors read sentences from the South Asia Terrorism Portal and quantified them. Contributors counted the deaths mentioned in a sentence and whether they were terrorists, civilians, or security forces. Database contains original sentences, state and district in which the deaths occurred, dates of the deaths, and more. (Test questions have been removed from the database for ease of visualization.)
	Drug relation database	2,020	2,020	Fact Evaluation	Text	K choose 1	Selection, Join	Contributors read color coded sentences and determined what the relationship of a drug was to certain symptoms or diseases.
	Blurry image comparison	511	511	Relevance Finding	Image	K choose 1	Selection, Categorize, Join	Contributors viewed a pair of purposely blurry or saturated images. They were then asked which image more closely matched a particular word. Data set contains URLs for all images and image pairs, aggregated agreement scores, and variance amounts. Notably, a high number of contributors were polled for each image pairing (20 in total for each, giving this data set upwards of 10,000 judgements).
	Objective truths of sentences/concept pairs	8,227	8,227	Fact Evaluation	Text	Rating (5 choices)	Selection, Join	Contributors read a sentence with two concepts. For example "a dog is a kind of animal" or "captain can have the same meaning as master." They were then asked if the sentence could be true and ranked it on a 1-5 scale. On the low end was "strongly disagree" and on the upper, "strongly agree."
	Image sentiment polarity classification	15,613	15,613	Image Sentiment Analysis	Image	5 choose 1	Selection, Join	This data set contains over fifteen thousand sentiment-scored images. Contributors were shown a variety of pictures (everything from portraits of celebrities to landscapes to stock photography) and asked to score the images on typical positive/negative sentiment. Data set contains URL of images, sentiment scores of highly positive, positive, neutral, negative, and highly negative, and contributor agreement.
	Smart phone & tablet names database	1,600	1,600	Fact Evaluation	Text	2 choose 1	Selection, Join	Contributors viewed a particular model code (like C6730 or LGMS323), then searched for the name of the device itself (Kyocera C6730 Hydro or LG Optimus L70), then noted whether the device was a phone or tablet.
	Free text object descriptions	1,225	1,225	Image Retrieval	Text	Description	Selection, Collection, Join	Contributors viewed a pair of items and were asked to write sentences that described and differentiated the two objects. In other words, if viewing an apple and an orange, they could not write "this is a piece of fruit" twice, but needed to note how they were different. Image pairings varied so that the same image would appear in different couples and the second image was always smaller. Data set contains URLs of images and three sentences written per item, per image.
	News article / Wikipedia page pairings	3,000	3,000	Relevance Finding	Text	2 choose 1	Selection, Categorize, Join	Contributors read a short article and were asked which of two Wikipedia articles it matched most closely. For example, a brief biography of Mel Gibson could be paired with Gibson's general Wikipedia page or Lethal Weapon; likewise, Iran election results could be paired with a Wikipedia page on Iran in general or the 2009 protests. Data set contains URLs for both Wiki pages, the full text contributors read, and their judgements on each row.
	Is-A linguistic relationships	3,297	3,297	Fact Evaluation	Text	2 choose 1	Selection, Join	Contributors were provided a pair of concepts in a constant sentence structure. Namely: [Noun 1] is a [noun 2]. They were then asked to simply note if this sentence was then true or false. Data set contains all nouns and aggregated T/F judgements.
	Weather sentiment	1000	1000	Sentiment Analysis	Text	K choose 1	Selection, Join	Here, contributors were asked to grade the sentiment of a particular tweet relating to the weather. The catch is that 20 contributors graded each tweet. We then ran an additional job (the one below) where we asked 10 contributors to grade the original sentiment evaluation.
Weather sentiment evaluated	1000	1000	Sentiment Analysis	Text	K choose 1	Selection, Join	Here, contributors were asked if the crowd graded the sentiment of a particular tweet relating to the weather correctly. The original job (above this one, called simply "Weather sentiment") involved 20 contributors noting the sentiment of weather-related tweets. In this job, we asked 10 contributors to check that original sentiment evaluation for accuracy.
Image classification: People and food	587	587	Image classification	Image	K choose 1	Selection, Categorize, Join	A collection of images of people eating fruits and cakes and other foodstuffs. Contributors classified the images by male/female, then by age (adult or child/teenager).
"All oranges are lemons," a.k.a. Semantic relationships between two concepts	3,536	3,536	Fact Evaluation	Text/Image	2 choose 1	Selection, Join	An interesting language data set about the relationship of broad concepts. All questions were phrased in the following way: "All [x] are [y]." For example, a contributor would see something like "All Toyotas are vehicles" and was then asked to say whether this claim was true or false. Contributors were also provided images, in case they were unclear as to what either concept is. This data set includes links to both images provided, the names given for [x] and [y], and whether the statement that "All [x] are [y]" was true or false.
The colors of #TheDress	1,000	1,000	Image Retrieval	Image	2 choose 1	Selection, Collection, Join	On February 27th, 2015, the internet was briefly obsessed with the color of a dress known simply as #TheDress. We ran a survey job to 1000 contributors and asked them what colors the dress was, as well as looked into a hypothesis that Night Owls and Morning People saw the dress differently.
McDonald's review sentiment	1,500	1,500	Sentiment Analysis	Text	8 choose 1	Selection, Join	A sentiment analysis of negative McDonald's reviews. Contributors were given reviews culled from low-rated McDonald's from random metro areas and asked to classify why the locations received low reviews.
Gender breakdown of Time Magazine covers	<100	<100	Gender Identification	Image	2 choose 1	Selection, Categorize, Join	Contributors were shown images of Time Magazine covers since the late 1920s and asked to classify if the person was male or female. Data is broken down overall and on an annual basis.
Agreement between long and short sentences	2,000	2,000	Relevance Finding	Text	3 choose 1	Selection, Categorize, Join	Contributors were asked to read two sentences (the first was an image caption and the second was a shorter version) and judge whether the short sentence adequately describes the event in the first sentence (image caption).
Biomedical image modality	10,652	10,652	Image classification	Image	K choose 1	Selection, Categorize, Join	A large data set of labeled biomedical images, ranging from x-ray and ultrasound to charts, graphs, and even hand-drawn sketches.
Academy Awards demographics	416	416	Fact Retrieval	Text/Image	K choose 1	Selection, Categorize, Sort, Top-K, Join	A data set concerning the race, religion, age, and other demographic details of all Oscars winners since 1928 in the following categories: Best Actor/Best Actress/Best Supporting Actor/Best Supporting Actress/Best Director.
Corporate messaging	3,118	3,118	Categorization	Text	3 choose 1	Categorization	A data categorization job concerning what corporations actually talk about on social media. Contributors were asked to classify statements as information (objective statements about the company or its activities), dialog (replies to users, etc.), or action (messages that ask for votes or ask users to click on links, etc.).
Body part relationships	1,892	1,892	Fact Evaluation	Text	2 choose 1	Selection, Join	A data set where contributors classified if certain body parts were part of other parts. Questions were phrased like so: "[Part 1] is a part of [part 2]," or, by way of example, "Nose is a part of spine" or "Ear is a part of head."
Wearable technology database	582	582	Fact Evaluation	Text	K choose 1	Selection, Categorize, Sort, Top-K, Join	A data set containing information on hundreds of wearables. Contains data on prices, company name and location, URLs for all wearables, as well as the location of the body on which the wearable is worn.
Image descriptions	225,000	225,000	Image description	Image	2 choose 1	Selection, Categorize, Join	Contributors were shown a large variety of images and asked whether a given word described the image shown. For example, they might see a picture of Mickey Mouse and the word Disneyland, where they'd mark "yes." Conversely, if Mickey Mouse's pair word was "oatmeal," they would mark no.
Sentence plausibility	400	400	Fact Evaluation	Text	Rating (5 choices)	Selection, Categorize, Join	Contributors read strange sentences and ranked them on a scale of "implausible" (1) to "plausible" (5). Sentences were phrased in the following manner: "This is not an [x], it is a [y]."
Coachella 2015 Twitter sentiment	3,847	3,847	Sentiment Analysis	Text	K choose 1	Selection, Join	A sentiment analysis job about the lineup of Coachella 2015.
Apple Computers Twitter sentiment	3,969	3,969	Sentiment Analysis	Text	K choose 1	Selection, Join	Contributors were given a tweet and asked whether the user was positive, negative, or neutral about Apple. (They were also allowed to mark "the tweet is not about the company Apple, Inc.")
How beautiful is this image? (Part 1: People)	3,500	3,500	Image classification	Image	Rating (five-point scale, from "unacceptable" (blurry, red-eyed images) to "exceptional" (hi-res, professional-quality portraiture))	Selection, Categorize, Join	Here, contributors were asked to rate image quality (as opposed to how pretty the people in the images actually are). They were given a five-point scale, from "unacceptable" (blurry, red-eyed images) to "exceptional" (hi-res, professional-quality portraiture) and ranked a series of images based on those criteria.
How beautiful is this image? (Part 2: Buildings and Architecture)	3,500	3,500	Image classification	Image	Rating (five-point scale, from "unacceptable" (out-of-focus cityscapes) to "exceptional" (hi-res photos that might appear in a city guide book))	Selection, Categorize, Join	Here, contributors were asked to rate image quality (as opposed to how gorgeous the buildings in the images actually are). They were given a five-point scale, from "unacceptable" (out-of-focus cityscapes) to "exceptional" (hi-res photos that might appear in a city guide book) and ranked a series of images based on those criteria.
How beautiful is this image? (Part 3: Animals)	3,500	3,500	Image classification	Image	Rating (five-point scale, from "unacceptable" (blurry photos of pets) to "exceptional" (hi-res photos that might appear in text books or magazines))	Selection, Categorize, Join	Here, contributors were asked to rate image quality (as opposed to how adorable the animals in the images actually are). They were given a five-point scale, from "unacceptable" (blurry photos of pets) to "exceptional" (hi-res photos that might appear in text books or magazines) and ranked a series of images based on those criteria.
Language: Certainty of Events	13,386	13,386	Sentiment Analysis	Text	K choose 1	Selection, Join	A linguistic data set concerning the certainty an author has about a certain word. For example, in the following sentence: "The dog ran out the door," if the word "ran" was asked about, the certainty that the event did or will happen would be high.
New England Patriots Deflategate sentiment	11,814	11,814	Sentiment Analysis	Text	K choose 1	Selection, Join	Before the 2015 Super Bowl, there was a great deal of chatter around deflated footballs and whether the Patriots cheated. This data set looks at Twitter sentiment on important days during the scandal to gauge public sentiment about the whole ordeal.
Sports Illustrated covers	32,000	32,000	Image Retrieval	Image	K choose 1	Selection, Collection, Join	A data set listing the sports that have been on the cover of Sports Illustrated since 1955.
The data behind data scientists	974	974	Fact Retrieval	Text	Survey	Collection	A look into what skills data scientists need and what programs they use. A part of our 2015 data scientist report which you can download.
Judge the relatedness of familiar words and made-up ones	300	300	Relevance Finding	Text	Rating (1-5, from completely unrelated to very strongly related)	Selection, Categorize, Join	Contributors were given a nonce word and a real word, for example, "leebaf" and "iguana." They were given a sentence with the nonce word in it and asked to note how related the nonce word and real word were. Here's a sample question: "Large numbers of leebaf skins are exported to Latin America to be made into handbags, shoes and watch straps." Contributors then ranked the relation of "leebaf" to "iguana" on a scale of 1-5, from completely unrelated to very strongly related, respectively.
2015 New Year's resolutions	5,011	5,011	Sentiment Analysis	Text	K choose 1	Selection, Join	A Twitter sentiment analysis of users' 2015 New Year's resolutions. Contains demographic and geographical data of users and resolution categorizations.
Smart phone app functionality	1,898	1,898	Fact Retrieval	Text	K choose 1	Collection	Contributors read an app description, then selected the app's functionality from a pre-chosen list. Functionalities ranged from SMS to flashlight to weather to whether or not they used a phone's contacts. Contributors were allowed to select as many functionalities as applied for each app.
Naturalness of computer generated images	600	600	Image classification	Image	K choose 1	Selection, Categorize, Join	Contributors viewed two rather bizarre looking images and were asked which was more "natural." Images were all computer generated faces of people in various states of oddness.
National Park locations	323	323	Fact Retrieval	Text	Label	Collection	A large data set containing the official URLs of United States national and state parks.
Colors in 9 Languages	4,000	4,000	Fact Retrieval	Text	K choose 1	Collection	Dataset of 4000 crowd-named colors in 9 languages. Includes the RGB color, the native language color, and the translated color.
Judge emotions about nuclear energy from Twitter	190	190	Sentiment Analysis	Text	5 choose 1	Selection, Join	This dataset is a collection of tweets related to nuclear energy along with the crowd's evaluation of the tweet's sentiment. The possible sentiment categories are: "Positive", "Negative", "Neutral / author is just sharing information", "Tweet NOT related to nuclear energy", and "I can't tell". We also provide an estimation of the crowds' confidence that each category is correct which can be used to identify tweets whose sentiment may be unclear.
Decide whether two English sentences are related	555	555	Relevance Finding	Text	Rating (5 choices)	Selection, Categorize, Join	This dataset is a collection of English sentence pairs. The crowd was asked about the truth value of the second sentence if the first sentence were true and to what extent the sentences are related on a scale of 1 to 5. The variance of this score over the crowd's judgments is included as well.
Similarity judgement of word combinations	6,274	6,274	Relevance Finding	Text	7 choose 1	Selection, Categorize, Join	Contributors were asked to evaluate how similar two sets of words are on a seven point scale with 1 being "completely different" and 7 being "exactly the same".
Sentiment Analysis - Global Warming/Climate Change	6,090	6,090	Sentiment Analysis	Text	3 choose 1	Selection, Join	Contributors evaluated tweets for belief in the existence of global warming or climate change. The possible answers were "Yes" if the tweet suggests global warming is occurring, "No" if the tweet suggests global warming is not occurring, and "I can't tell" if the tweet is ambiguous or unrelated to global warming. We also provide a confidence score for the classification of each tweet.
Judge Emotion About Brands & Products	9,093	9,093	Sentiment Analysis	Text	3 choose 1	Selection, Join	Contributors evaluated tweets about multiple brands and products. The crowd was asked if the tweet expressed positive, negative or no emotion towards a brand and/or product. If some emotion was expressed they were also asked to say which brand or product was the target of that emotion.
Claritin Twitter	4,900	4,900	Relevance Finding	Text	K choose 1	Selection, Categorize	This dataset has all tweets that mention Claritin for October, 2012. The tweets are tagged with sentiment, the author's gender, and whether or not they mention any of the top 10 adverse events reported to the FDA.