A crowdsourcing system, such as Amazon Mechanical Turk (AMT), provides a platform where a large number of questions can be answered by Internet workers. Such systems have been shown to be useful for solving problems that are difficult for computers, including entity resolution, sentiment analysis, and image recognition. In this project, we investigate the online task assignment problem: given a pool of n questions, which k of them should be assigned to a worker? A poor assignment may not only waste time and money, but may also hurt the quality of a crowdsourcing application that depends on the workers' inputs.
We propose to consider quality measures (also known as evaluation metrics) that are relevant to an application during the task assignment process. In particular, we explore how Accuracy and F-score, two widely used evaluation metrics for crowdsourcing applications, can facilitate task assignment. Since these two metrics assume that the ground truth of a question is known, we study their variants that make use of the probability distributions of workers' answers. We further investigate online assignment strategies that enable optimal task assignments. Since these algorithms are expensive, we propose solutions that attain high quality in linear time. We develop a system called the Quality-Aware Task Assignment System for Crowdsourcing Applications (QASCA) on top of AMT. We evaluate our approaches on five real crowdsourcing applications, and find that QASCA is efficient and attains better result quality (an improvement of more than 8%) than existing methods.
Here is the project directory tree of QASCA.
QASCA is deployed on an Ubuntu 10.04.4 system. To run the project, the following software and programming tools need to be installed (recommended versions in parentheses):
Python (2.7.3), Django (1.5), Apache (2.2.24), mod_wsgi (3.4), MySQL (14.14), MySQL-python (1.2.3), and the boto library.
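Before deploying, it can help to verify that the Python-side dependencies are importable. The snippet below is a minimal sanity check (not part of QASCA itself); it only assumes that Django, MySQL-python, and boto were installed as listed above.

    # Sanity check: confirm the Python packages required by QASCA are importable.
    import django
    import MySQLdb  # provided by the MySQL-python package
    import boto

    print("Django: " + django.get_version())
    print("MySQL-python: " + MySQLdb.__version__)
    print("boto: " + boto.__version__)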
After installing the required software listed above, deploying a real application requires two steps:
(1) configure the "config.ini" file in the publish folder, which contains the database, log, and MTurk information (a sketch of this file appears after this list);
(2) create a new folder in the apps folder; this folder contains three main components to configure: the Questions file ("question.json"), the HTML template files ("view.html" and "accept.html"), and the Configuration file ("config.ini").
(a) "question.json" contains the questions to be published, organized in JSON format;
(b) "view.html" is a static HTML file that workers see in view (preview) mode on AMT;
(c) "accept.html" is a Django template file that workers see when they accept a HIT on AMT;
(d) "config.ini" contains parameters related to your deployed app.
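Below is a minimal sketch of how the global "config.ini" in the publish folder might be generated with Python's ConfigParser; the section and key names here are illustrative assumptions, not necessarily QASCA's actual keys.

    # Hypothetical publish/config.ini holding the database, log, and MTurk settings
    # that publish.py and the Web Server read (key names are assumptions).
    import ConfigParser  # Python 2.7, as recommended above

    config = ConfigParser.RawConfigParser()

    config.add_section('database')  # MySQL connection for the question/worker models
    config.set('database', 'host', 'localhost')
    config.set('database', 'user', 'qasca')
    config.set('database', 'password', 'your_password')
    config.set('database', 'name', 'qasca_db')

    config.add_section('log')       # where the system writes its logs
    config.set('log', 'path', '/var/log/qasca/qasca.log')

    config.add_section('mturk')     # AMT credentials used via the boto library
    config.set('mturk', 'aws_access_key_id', 'YOUR_ACCESS_KEY')
    config.set('mturk', 'aws_secret_access_key', 'YOUR_SECRET_KEY')
    config.set('mturk', 'sandbox', 'true')  # publish to the AMT sandbox first

    with open('publish/config.ini', 'wb') as f:
        config.write(f)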
We will walk through an example application of QASCA, which shows how to deploy an implementation of the entity resolution problem ([1] J. Wang, G. Li, T. Kraska, M. J. Franklin, and J. Feng. Leveraging transitive relations for crowdsourced joins. In SIGMOD Conference, pages 229-240, 2013) on our system.
The algorithm in [1] leverages transitive relations to address the entity resolution problem. It adopts an iterative approach: in each iteration, it publishes a set of pairs (questions), derives their results from the crowd, and then applies the transitive rule to deduce the labels of other candidate pairs. In each iteration, [1] can deploy the generated questions on our system and use it to obtain their results.
Suppose that in the first iteration the algorithm generates n = 1000 questions, where each question has the labels "equal" and "non-equal". The requester first creates an application folder in the APP Manager component. In the created folder, the requester needs to deploy three files: the Questions file, the HTML template files, and the Configuration file.
(1) The Questions file ("question.json") contains the questions in JSON format; an example file containing two questions is sketched below.
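The field names (id, text, labels) and the sample product pairs here are illustrative assumptions rather than QASCA's exact schema; the snippet simply writes two entity resolution questions to "question.json".

    # Write a hypothetical question.json with two entity resolution questions.
    import json

    questions = [
        {"id": 1,
         "text": "Do 'Apple iPad 2 16GB' and 'iPad Two (16 GB)' refer to the same product?",
         "labels": ["equal", "non-equal"]},
        {"id": 2,
         "text": "Do 'Sony Bravia 40-inch LCD TV' and 'Samsung UN40 40-inch LED TV' refer to the same product?",
         "labels": ["equal", "non-equal"]},
    ]

    with open("question.json", "w") as f:
        json.dump(questions, f, indent=2)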
(2) The HTML templates consist of two files: "view.html" and "accept.html". "view.html" is a static HTML page that workers see in view (preview) mode on AMT; it gives some static examples of the questions that workers will answer. "accept.html" is a dynamic Django template that is rendered when a worker accepts a HIT (you can refer to the Django template documentation). A sketch of how such a template can render the assigned questions is shown below.
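This is a hypothetical illustration, not QASCA's actual template: the markup, context field names, and sample question are assumptions, and the stand-alone settings setup follows the style of recent Django releases rather than Django 1.5.

    # Render a minimal Django template with a list of assigned questions.
    import django
    from django.conf import settings
    from django.template import Context, Template

    # Stand-alone template rendering, outside a full Django project.
    settings.configure(TEMPLATES=[{"BACKEND": "django.template.backends.django.DjangoTemplates"}])
    django.setup()

    ACCEPT_HTML = """
    <form method="post" action="https://www.mturk.com/mturk/externalSubmit">
      <input type="hidden" name="assignmentId" value="{{ assignment_id }}">
      {% for q in questions %}
        <p>{{ q.text }}</p>
        {% for label in q.labels %}
          <label><input type="radio" name="q{{ q.id }}" value="{{ label }}"> {{ label }}</label>
        {% endfor %}
      {% endfor %}
      <input type="submit" value="Submit HIT">
    </form>
    """

    html = Template(ACCEPT_HTML).render(Context({
        "assignment_id": "ASSIGNMENT_ID_PLACEHOLDER",
        "questions": [
            {"id": 1,
             "text": "Do 'Apple iPad 2 16GB' and 'iPad Two (16 GB)' refer to the same product?",
             "labels": ["equal", "non-equal"]},
        ],
    }))
    print(html)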
(3) In the Configuration file ("config.ini"), the requester can specify that each HIT contains k = 10 questions and pays b = $0.02, that the total number of HIT assignments is m = 400 (so each question is answered m/(n/k) = 400/(1000/10) = 4 times on average), and that the evaluation metric is F-score for the label "equal" (alpha = 0.5). Thus, in the "config.ini" of the newly created application folder, the parameters should be set accordingly, as sketched below.
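This is a hypothetical version of the application-level "config.ini"; the key names are assumptions chosen for readability, while the values correspond to the setting above (k = 10, b = $0.02, m = 400, F-score on the label "equal" with alpha = 0.5).

    # Write a hypothetical config.ini for the entity resolution application.
    import ConfigParser  # Python 2.7

    config = ConfigParser.RawConfigParser()
    config.add_section('app')
    config.set('app', 'questions_per_hit', '10')       # k: questions per HIT
    config.set('app', 'reward_per_hit', '0.02')        # b: payment per HIT in USD
    config.set('app', 'total_assignments', '400')      # m: total number of HIT assignments
    config.set('app', 'evaluation_metric', 'F-score')  # or 'Accuracy'
    config.set('app', 'target_label', 'equal')         # label the F-score is computed for
    config.set('app', 'alpha', '0.5')                  # alpha = 0.5 gives the standard F1

    with open('apps/entity_resolution/config.ini', 'wb') as f:
        config.write(f)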
After publishing all HITs by calling "publish.py" in the publish folder, two processes occur, depending on the type of request (a sketch of the HIT request handler is given after the two steps below):
HIT request process: when a worker requests a HIT, the Web Server acquires the worker id from AMT and passes it to the Task Assignment component, which identifies k = 10 questions based on the evaluation metric specified in the APP Manager (F-score for the label "equal" with alpha = 0.5), and returns a HIT consisting of the identified 10 questions to the worker.
HIT completion process: when a worker completes a HIT, the Web Server updates the answer set and the question and worker models. After obtaining the answers of all m = 400 HITs, QASCA terminates and returns the result for each question based on the question model (in the Database) and the specified evaluation metric (in the APP Manager).
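The HIT request process can be pictured as a Django view along the following lines; this is a hypothetical sketch, and the module and function names (task_assignment, assign_questions, accept_hit) are assumptions rather than QASCA's actual code.

    # Hypothetical Django view for the HIT request process.
    from django.shortcuts import render

    from task_assignment import assign_questions  # assumed task assignment component

    K = 10  # questions per HIT, as set in config.ini

    def accept_hit(request):
        # AMT appends workerId and assignmentId to the external question URL.
        worker_id = request.GET.get('workerId')
        assignment_id = request.GET.get('assignmentId')

        # Pick the k questions that most improve the expected value of the
        # specified evaluation metric (here, F-score for "equal" with alpha = 0.5).
        questions = assign_questions(worker_id, k=K, metric='F-score',
                                     target_label='equal', alpha=0.5)

        # Render accept.html with the identified questions and return the HIT page.
        return render(request, 'accept.html', {
            'assignment_id': assignment_id,
            'questions': questions,
        })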
After collecting the results from QASCA, the algorithm in [1] applies the transitive rule to the answered questions. It then proceeds to the second iteration, where it can again use our system for the newly generated questions.
We perform end-to-end system experiments using five real-world datasets, comparing with two existing systems: CDAS ([2] X. Liu, M. Lu, B. C. Ooi, Y. Shen, S. Wu, and M. Zhang. CDAS: A crowdsourcing data analytics system. PVLDB, 5(10):1040-1051, 2012) and AskIt! ([3] R. Boim, O. Greenshpan, T. Milo, S. Novgorodov, N. Polyzotis, and W. C. Tan. Asking the right questions in crowd data sourcing. In ICDE, pages 1261-1264, 2012). We also include a reasonable Baseline method and two other methods (MaxMargin and ExpLoss) described in the paper. Please refer to our paper below for more details.
The result quality of the first two applications is evaluated using Accuracy; the datasets are FS (Film Poster, extracted from IMDb: [4] http://www.imdb.com/) and SA (Sentiment Analysis, extracted from a public dataset: [5] http://www.sananalytics.com/lab/twitter-sentiment/):
The result quality of the other three applications is evaluated using F-score (with different values of alpha); the datasets are ER (Entity Resolution, extracted from Abt-Buy: [6] http://dbs.uni-leipzig.de/file/Abt-Buy.zip), PSA (Positive Sentiment Analysis, extracted from [5]), and NSA (Negative Sentiment Analysis, extracted from [5]):
Here are the final results (when all HITs are finished) for all five applications, which show that QASCA improves result quality by more than 8% compared with existing approaches (i.e., Baseline, CDAS, AskIt!, MaxMargin, and ExpLoss).
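Both metrics can be computed from the returned results when ground truth is available; the sketch below assumes the standard definitions, with F-score taken as the alpha-weighted harmonic mean of precision and recall on the target label (alpha = 0.5 gives the usual F1).

    # Accuracy and F-score (with parameter alpha) over returned results.
    def accuracy(returned, truth):
        correct = sum(1 for q in truth if returned.get(q) == truth[q])
        return float(correct) / len(truth)

    def f_score(returned, truth, target_label, alpha=0.5):
        returned_pos = [q for q in returned if returned[q] == target_label]
        true_pos = [q for q in returned_pos if truth.get(q) == target_label]
        precision = float(len(true_pos)) / len(returned_pos) if returned_pos else 0.0
        pos_in_truth = sum(1 for q in truth if truth[q] == target_label)
        recall = float(len(true_pos)) / pos_in_truth if pos_in_truth else 0.0
        if precision == 0.0 or recall == 0.0:
            return 0.0
        return 1.0 / (alpha / precision + (1.0 - alpha) / recall)

    # Example: three entity resolution questions with known ground truth.
    truth = {'q1': 'equal', 'q2': 'non-equal', 'q3': 'equal'}
    returned = {'q1': 'equal', 'q2': 'equal', 'q3': 'equal'}
    print(accuracy(returned, truth))                     # 2/3, about 0.67
    print(f_score(returned, truth, 'equal', alpha=0.5))  # precision 2/3, recall 1 -> 0.8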
Yudian Zheng, Jiannan Wang, Guoliang Li, Reynold Cheng, Jianhua Feng.
QASCA: A Quality-Aware Task Assignment System for Crowdsourcing Applications.
[ bib ]
In SIGMOD 2015, Full Paper, Pages 1031-1046, May 31-June 4, Melbourne, Australia.
[Project Website]
[Slides]
[Poster]
Yudian Zheng, Guoliang Li, Reynold Cheng.
DOCS: A Domain-Aware Crowdsourcing System Using Knowledge Bases.
[ bib ]
In PVLDB 2016, Vol. 10, Issue 4, Pages 361-372, Full Paper. Presented at VLDB 2017, Aug 28 - Sep 1, Munich, Germany.
We have open-sourced the implementation of the online task assignment system on top of Amazon Mechanical Turk (GitHub code).
If you have any comments or questions, please feel free to email us at: ydzheng2 [AT] cs.hku.hk.