crowd-kit
Control the quality of your labeled data with the Python tools you already know.
Crowd-Kit: Computational Quality Control for Crowdsourcing
Crowd-Kit is a Python library that implements commonly used aggregation methods for crowdsourced annotation and provides the relevant metrics and datasets. We strive to implement functionality that simplifies working with crowdsourced data.
Currently, Crowd-Kit contains:
- implementations of commonly used aggregation methods for categorical, pairwise, textual, and segmentation responses
- metrics of uncertainty, consistency, and agreement with the aggregate
- loaders for popular crowdsourced datasets
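To make the idea of an uncertainty metric concrete, here is a minimal, library-free sketch that scores each task by the Shannon entropy of the labels it received. The function names and the toy data are hypothetical, and this is not Crowd-Kit's own implementation; it only illustrates the kind of quantity such a metric measures.

```python
import math
from collections import Counter, defaultdict

# Toy annotations as (task, worker, label) triples; the data are made up.
annotations = [
    ("t1", "w1", "cat"), ("t1", "w2", "cat"), ("t1", "w3", "cat"),
    ("t2", "w1", "cat"), ("t2", "w2", "dog"),
]


def label_entropy(labels):
    """Shannon entropy (in nats) of the empirical label distribution."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def task_uncertainty(rows):
    """Map each task to the entropy of the labels it received."""
    by_task = defaultdict(list)
    for task, _worker, label in rows:
        by_task[task].append(label)
    return {task: label_entropy(labels) for task, labels in by_task.items()}


uncertainty = task_uncertainty(annotations)
```

A task on which all workers agree gets an entropy of zero, while an even split between two labels gets the maximum for two classes, `log(2)`.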
Installing
Installing Crowd-Kit is as easy as `pip install crowd-kit`.
Those who are interested in contributing to Crowd-Kit can use Pipenv to install the library with its dependencies: `pipenv install --dev`. We use pytest for testing.
Getting Started
This example shows how to use Crowd-Kit for categorical aggregation using the classical Dawid-Skene algorithm.
First, let us do all the necessary imports.
from crowdkit.aggregation import DawidSkene
from crowdkit.datasets import load_dataset
import pandas as pd
Then, read your annotations into a pandas DataFrame with the columns task, worker, and label. Alternatively, you can download an example dataset.
df = pd.read_csv('results.csv') # should contain columns: task, worker, label
# df, ground_truth = load_dataset('relevance-2') # or download an example dataset
Then you can aggregate the worker responses as easily as in scikit-learn:
aggregated_labels = DawidSkene(n_iter=100).fit_predict(df)
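For intuition about what aggregation does with task, worker, and label records, the following library-free sketch applies a plain majority vote to a few made-up rows. It is a simplified stand-in and not the Dawid-Skene algorithm used above; the `majority_vote` helper and the data are hypothetical.

```python
from collections import Counter, defaultdict

# Each row mirrors the task, worker, label columns; the values are made up.
rows = [
    ("t1", "w1", "relevant"), ("t1", "w2", "relevant"), ("t1", "w3", "irrelevant"),
    ("t2", "w1", "irrelevant"), ("t2", "w2", "irrelevant"), ("t2", "w3", "relevant"),
]


def majority_vote(rows):
    """Return the most frequent label for every task."""
    votes = defaultdict(Counter)
    for task, _worker, label in rows:
        votes[task][label] += 1
    # most_common(1) yields the top label; ties keep the first-seen label.
    return {task: counter.most_common(1)[0][0] for task, counter in votes.items()}


aggregated = majority_vote(rows)
```

The result is one label per task, which is the same shape of output that an aggregator produces from the annotations.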
Implemented Aggregation Methods
Below is the list of methods, each marked as available (✅) or in progress (🟡).
Categorical Responses
| Method | Status |
|---|---|
| Majority Vote | ✅ |
| One-coin Dawid-Skene | ✅ |
| Dawid-Skene | ✅ |
| Gold Majority Vote | ✅ |
| M-MSR | ✅ |
| Wawa | ✅ |
| Zero-Based Skill | ✅ |
| GLAD | ✅ |
| KOS | ✅ |
| MACE | ✅ |
| BCC | 🟡 |
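As an illustration of how a skill-weighted method differs from a plain majority vote, here is a rough sketch of the idea usually described as Wawa (worker agreement with aggregate): take a majority vote, score each worker by how often they agree with it, then vote again with those scores as weights. All names and data below are hypothetical, and the sketch is written without Crowd-Kit, so it may differ from the library's implementation in details such as tie handling.

```python
from collections import defaultdict

# Toy (task, worker, label) triples; w3 disagrees with the others throughout.
rows = [
    ("t1", "w1", "A"), ("t1", "w2", "A"), ("t1", "w3", "B"),
    ("t2", "w1", "B"), ("t2", "w2", "B"), ("t2", "w3", "A"),
    ("t3", "w1", "A"), ("t3", "w3", "B"),
]


def weighted_vote(rows, weights=None):
    """Pick, per task, the label with the largest total worker weight."""
    scores = defaultdict(lambda: defaultdict(float))
    for task, worker, label in rows:
        scores[task][label] += 1.0 if weights is None else weights[worker]
    # max() keeps the first-seen label when totals are tied.
    return {task: max(s, key=s.get) for task, s in scores.items()}


def worker_skills(rows, aggregate):
    """Fraction of each worker's answers that match the aggregate labels."""
    agree = defaultdict(int)
    total = defaultdict(int)
    for task, worker, label in rows:
        total[worker] += 1
        agree[worker] += int(label == aggregate[task])
    return {worker: agree[worker] / total[worker] for worker in total}


def wawa_sketch(rows):
    """Unweighted vote, then skills, then a skill-weighted vote."""
    baseline = weighted_vote(rows)
    skills = worker_skills(rows, baseline)
    return weighted_vote(rows, skills), skills


labels, skills = wawa_sketch(rows)
```

On this toy data, task `t3` has one vote per label, and the weighted pass resolves it in favor of the worker who agreed with the aggregate elsewhere.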
Textual Responses
| Method | Status |
|---|---|
| RASA | ✅ |
| HRRASA | ✅ |
| ROVER | ✅ |
| Language Model-Based | ✅ |
Image Segmentation
| Method | Status |
|---|---|
| Segmentation MV | ✅ |
| Segmentation RASA | ✅ |
| Segmentation EM | ✅ |
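For segmentation responses, the simplest aggregation idea is a pixel-wise majority vote over the workers' binary masks. The sketch below uses nested lists and a strict-majority threshold; both are assumptions made for illustration, and the function name is hypothetical rather than part of Crowd-Kit.

```python
def segmentation_majority_vote(masks):
    """Keep a pixel when strictly more than half of the masks mark it."""
    n_masks = len(masks)
    height, width = len(masks[0]), len(masks[0][0])
    return [
        [sum(mask[r][c] for mask in masks) * 2 > n_masks for c in range(width)]
        for r in range(height)
    ]


# Three made-up 2x3 binary masks for the same image, one per worker.
masks = [
    [[1, 1, 0], [0, 1, 0]],
    [[1, 0, 0], [0, 1, 1]],
    [[1, 1, 0], [0, 0, 1]],
]

consensus = segmentation_majority_vote(masks)
```

Each output pixel is `True` only where at least two of the three workers marked it.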
Pairwise Comparisons
| Method | Status |
|---|---|
| Bradley-Terry | ✅ |
| Noisy Bradley-Terry | ✅ |
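To show what aggregating pairwise comparisons involves, here is a small sketch of the Bradley-Terry model fitted with the classical minorization-maximization update. The input format of (winner, loser) pairs, the function name, and the data are assumptions for illustration only; Crowd-Kit's own interface may differ. The sketch also assumes that every item wins at least once, so no score collapses to zero.

```python
from collections import defaultdict

# Made-up comparison outcomes as (winner, loser) pairs.
comparisons = [
    ("a", "b"), ("a", "b"), ("b", "a"),
    ("a", "c"), ("b", "c"), ("c", "b"),
]


def bradley_terry(comparisons, n_iter=100):
    """Estimate item scores that sum to one using the MM update."""
    items = sorted({item for pair in comparisons for item in pair})
    wins = defaultdict(int)
    pair_counts = defaultdict(int)  # number of comparisons per unordered pair
    for winner, loser in comparisons:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1

    scores = {item: 1.0 / len(items) for item in items}
    for _ in range(n_iter):
        updated = {}
        for i in items:
            denom = sum(
                pair_counts[frozenset((i, j))] / (scores[i] + scores[j])
                for j in items
                if j != i
            )
            # Items that were never compared keep their previous score.
            updated[i] = wins[i] / denom if denom > 0 else scores[i]
        total = sum(updated.values())
        scores = {item: value / total for item, value in updated.items()}
    return scores


scores = bradley_terry(comparisons)
ranking = sorted(scores, key=scores.get, reverse=True)
```

Sorting items by their estimated scores gives the aggregated ranking implied by the individual comparisons.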
Citation
- Ustalov D., Pavlichenko N., Losev V., Giliazev I., and Tulin E. A General-Purpose Crowdsourcing Computational Quality Control Toolkit for Python. The Ninth AAAI Conference on Human Computation and Crowdsourcing: Works-in-Progress and Demonstration Track. HCOMP 2021. 2021. arXiv: 2109.08584 [cs.HC].
@inproceedings{HCOMP2021/CrowdKit,
author = {Ustalov, Dmitry and Pavlichenko, Nikita and Losev, Vladimir and Giliazev, Iulian and Tulin, Evgeny},
title = {{A General-Purpose Crowdsourcing Computational Quality Control Toolkit for Python}},
year = {2021},
booktitle = {The Ninth AAAI Conference on Human Computation and Crowdsourcing: Works-in-Progress and Demonstration Track},
series = {HCOMP~2021},
eprint = {2109.08584},
eprinttype = {arxiv},
eprintclass = {cs.HC},
url = {https://www.humancomputation.com/assets/wips_demos/HCOMP_2021_paper_85.pdf},
language = {english},
}
Questions and Bug Reports
- To report bugs, please use the Toloka/bugreport page.
- Join our English-speaking Slack community for both technical and abstract questions.
License
© YANDEX LLC, 2020-2022. Licensed under the Apache License, Version 2.0. See LICENSE file for more details.
