Create dataset UIT-ViHSD

Open albertvillanova opened this issue 2 years ago • 0 comments

  • uid: UIT-ViHSD
  • type: processed
  • description:
    • name: Vietnamese Hate Speech Detection Dataset
    • description: In recent years, Vietnam has seen rapid growth in the number of users on social platforms such as Facebook, YouTube, Instagram, and TikTok. Hate speech has become a critical problem for these users. To address it, we introduce ViHSD, a human-annotated dataset for automatically detecting hate speech on social networks. The dataset contains over 30,000 comments, each with one of three labels: CLEAN, OFFENSIVE, or HATE. We also describe the data creation process used to annotate the dataset and evaluate its quality. Finally, we evaluate the dataset with deep learning and transformer models.
    • homepage: https://sites.google.com/uit.edu.vn/uit-nlp/datasets-projects#h.fs21gpd5w6p1
    • validated: True
  • languages:
    • language_names:
      • Vietnamese
    • language_comments:
    • language_locations:
      • South-eastern Asia
      • Vietnam
    • validated: False
  • custodian:
    • name: Mr. Son Luu
    • in_catalogue:
    • type: A university or research institution
    • location: Vietnam
    • contact_name: Mr. Son Luu
    • contact_email: [email protected]
    • contact_submitter: False
    • additional:
    • validated: False
  • availability:
    • procurement:
      • for_download: No - but the current owners/custodians have contact information for data queries
      • download_url:
      • download_email: [email protected]
    • licensing:
      • has_licenses: Unclear
      • license_text:
      • license_properties:
      • license_list:
    • pii:
      • has_pii: Unclear
      • generic_pii_likely:
      • generic_pii_list:
      • numeric_pii_likely:
      • numeric_pii_list:
      • sensitive_pii_likely:
      • sensitive_pii_list:
      • no_pii_justification_class: general knowledge not written by or referring to private persons
      • no_pii_justification_text:
    • validated: False
  • processed_from_primary:
    • from_primary: Taken from primary source
    • primary_availability: No - the dataset curators kept the source data secret
    • primary_license:
    • primary_types:
    • validated: False
  • media:
    • category:
      • text
    • text_format:
    • audiovisual_format:
    • image_format:
    • database_format:
    • text_is_transcribed: No
    • instance_type:
    • instance_count:
    • instance_size:
    • validated: False
  • fname: UIT-ViHSD.json
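
Most sections of this record are still marked `validated: False`. As a rough sketch of how the pending sections could be listed, the snippet below parses a hand-written excerpt of the record. It assumes that `UIT-ViHSD.json` mirrors the bullet structure above with the same key names, which is not confirmed in this issue; the helper `unvalidated_sections` is hypothetical and not part of any existing tooling.

```python
import json

# Hand-written excerpt of the record above. ASSUMPTION: the real
# UIT-ViHSD.json uses the same nesting and key names as the bullets.
RECORD_JSON = """
{
  "uid": "UIT-ViHSD",
  "type": "processed",
  "description": {"name": "Vietnamese Hate Speech Detection Dataset", "validated": true},
  "languages": {"language_names": ["Vietnamese"], "validated": false},
  "custodian": {"name": "Mr. Son Luu", "validated": false},
  "availability": {
    "licensing": {"has_licenses": "Unclear"},
    "pii": {"has_pii": "Unclear"},
    "validated": false
  },
  "processed_from_primary": {"from_primary": "Taken from primary source", "validated": false},
  "media": {"category": ["text"], "validated": false}
}
"""


def unvalidated_sections(record):
    """Return the names of top-level sections whose 'validated' flag is not True.

    Hypothetical helper: scalar fields such as 'uid' and 'type' are skipped
    because they carry no 'validated' flag of their own.
    """
    return [
        name
        for name, section in record.items()
        if isinstance(section, dict) and section.get("validated") is not True
    ]


record = json.loads(RECORD_JSON)
pending = unvalidated_sections(record)
print(pending)
```

With the excerpt above, only `description` is already validated; every other section would still need review before the entry is complete.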

albertvillanova, Nov 23 '21 10:11