
Participatory Approaches to Building Datasets on Abuse

Open tarunima opened this issue 7 months ago • 2 comments

Description:

Automated approaches to abuse detection rely on annotated datasets. At least at present, unsupervised machine learning alone cannot detect abuse across languages. To fill the gap in abuse detection datasets for Indian languages, Tattle started the Uli project specifically to create datasets on gendered abuse in Indian languages. The focus is also on taking a survivor-centered perspective on abuse: the dataset was created with people of marginalized genders who are at the receiving end of abuse. The first dataset, on abusive tweets, helped us develop a methodology for participatory datasets that we would now like to extend to more languages and modalities.

The Scope of This Task:

  1. Review the literature on datasets for abuse detection in images, videos, and audio.
  2. Create a dataset of images from social media that could be annotated by the existing community of researchers, survivors, and activists.
  3. Expand the community of annotators.
  4. Conduct qualitative research to define abuse in multimodal datasets.
  5. Organize annotations.
  6. Release the dataset.

This ticket should be treated as a statement of intent for a multi-year project. If you're interested in collaborating on this project, please leave a comment.

tarunima, Jul 01 '24 09:07