Recent advances in crowdsourcing technologies enable computationally challenging tasks (e.g., sentiment analysis and entity resolution) to be performed by Internet workers. Upon finishing a task, the worker is given a monetary reward. A fundamental question is: how should workers be selected, so that the tasks in hand can be accomplished successfully and economically? In this paper, we study the Jury Selection Problem, or JSP: Given a monetary budget, and a set of decision-making tasks (e.g., “Is Bill Gates the CEO of Microsoft now?”), return the set of workers (called jury), such that their answers yield the highest “Jury Quality” (or JQ). Existing JSP solutions make use of the Majority Voting (MV) strategy, which decides the answer chosen by the largest number of jurors. We show that MV does not yield the best solution for JSP. We further show that among various voting strategies, Bayesian Voting (BV) facilitates JSP to be answered optimally. We then examine how to solve JSP based on BV. This is technically challenging, since computing the JQ with BV is NP-hard. We solve this problem by proposing an approximate algorithm that is computationally efficient. We extend our solution by considering the task owner’s “belief” (or prior) on the answers of the tasks. Experiments on synthetic and real datasets show that our new approach is consistently better than the best JSP solution known. Our approximate JQ computation algorithm is also highly accurate, yielding an error of 1% or less.
The above figure illustrates our crowdsourcing system, which we called the “Optimal Jury Selection System”. In this system, the task provider published a decision-making tasks. Then, based on the the jurors’ information (i.e., their quality and cost requirements), a “budget-quality table” is generated. In this table, each row contains the value of a budget, the optimal jury set, and its quality and the required budget for the optimal jury set. Based on this table, the task provider can conveniently decide the best budget-quality combination.
For example, she may deem that increasing the budget from 15 units to 20 units is not worthwhile, since the quality increases by around 2.5% only. In this example, the task provider selects the jury set {B,F,G} that is the best under a budget of 15 units. The chosen jury set costs her 14 units. Another interesting observation is that a task provider can now allocate the budget more precisely. In this example, it is not necessary for a provider to use 15 units of budget; she only needs 14 units to achieve the same quality.
For the Jury Quality (JQ), as it is closely related to voting strategies and given a jury set, different strategies may have different JQ. We first systematically classify all strategies into two categories (Page 1), and then prove the existence of the optimal strategy, which is the Bayesian Voting (BV) Strategy (Page 2). Even though computing JQ for BV is NP-hard (Page 3), we give a bucket-based approximation algorithm (Page 4), and the approximation error bound of the polynomial algorithm is bounded within 1% (Page 5).
Before formally addressing the JSP for BV, we first prove two monotonicity properties which help us to deal with the conditions that jurors have specific cost requirements.
A direct consequence of monotonicity on jury size is that “the more jurors, the better JQ for BV”. So for the case that each juror will contribute voluntarily or the budget constraint satisfies on all candidate jurors in W, we can select all jurors in W.
Intuitively, monotonicity on juror quality tells us that a juror with higher quality contributes not less in JQ compared with a lower quality juror. Thus for the case that each juror has the same cost requirement c, then we can select the top-k jurors by their quality, where k is constrained by B, c and N.
For a more general case, that is, there is no constraint on juror quality, we can prove that solving it is NP-hard. To formally address it, we apply the simulated annealing heuristic, a well-known technique which is successful in many discrete combinatorial optimization problems and develop local search technique to support it.
Previously we have talked about how to solve JSP under two assumptions:
1) it is decision-making task with binary answer, and
2) each juror’s quality is modelled as a constant.
However, in reality, it is common for task provider to ask multiple label tasks.
For example, sentiment analysis tasks require jurors to label the sentiment (positive, neutral, or negative) of each task. An example is as below:
In addition, the juror’s quality can be modelled by measuring the sensitivity and specificity of each possible answer, or a more general confusion matrix (CM). Specifically, a confusion matrix is a matrix C of size "L*L" where L is the number of labels and each element Cjk encodes the probability that the juror gives label k as an answer when the true label is j. An example CM with L=2 is as follows:
We formally extend our solution for more general task and juror model by addressing the following three problems:
(a) whether the optimal strategy (or BV) is still optimal or not;
(b) how to compute JQ for BV by extending the bucket-based approximation algorithm;
(c) how to solve the JSP, and whether the two monotonicity properties still hold or not.
*Interested Reader can refer to the full version (in "Works" section below) for more details.
We perform End-to-End system experiment with the existing work. Since [1] "C. C. Cao, J. She, Y. Tong, and L. Chen. Whom to ask? jury selection for decision making tasks on micro-blog services. PVLDB, 5(11):1495–1506, 2012" is the only existing work we are aware of, who formally addresses JSP for Majority Voting strategy, we call it Majority Voting Jury Selection (MVJS). We similarly denote our system as "BVJS" and we perform system comparison on both synthetic data and real dataset collected from AMT (which provides more statistics on exact juror quality, so we do not have to vary the juror quality as what the synthetic data does).
Experiments on Synthetic data:
Experiments on Real data (note that you can contact us to get the real dataset, see the "About" section below):
Experiments show that BVJS are consistently better than MVJS on both synthetic and real datasets by varying parameters.
Yudian Zheng, Reynold Cheng, Silviu Maniu, Luyi Mo. On Optimality of Jury Selection in Crowdsourcing.
In International Conference on Extending Database Technology (EDBT), 2015, March 23-27, Brussels, Belgium.
[Full Version] [Slides]
Optimal Jury Selection System is being developed by Yudian Zheng.
If you have any comments or questions,
please feel free to email us at: ydzheng2 [AT] cs.hku.hk.