data_tooling
Create dataset UIT-ViHSD
- uid: UIT-ViHSD
- type: processed
- description:
- name: Vietnamese Hate Speech Detection Dataset
- description: In recent years, Vietnam has seen rapid growth in the number of social network users on platforms such as Facebook, YouTube, Instagram, and TikTok. On social media, hate speech has become a critical problem for these users. To address it, we introduce ViHSD, a human-annotated dataset for automatically detecting hate speech on social networks. The dataset contains over 30,000 comments, and each comment has exactly one of three labels: CLEAN, OFFENSIVE, or HATE. We also describe the data creation process used to annotate the dataset and to evaluate its quality. Finally, we evaluate the dataset with deep learning models and transformer models.
- homepage: https://sites.google.com/uit.edu.vn/uit-nlp/datasets-projects#h.fs21gpd5w6p1
- validated: True
- languages:
- language_names:
- Vietnamese
- language_comments:
- language_locations:
- South-eastern Asia
- Vietnam
- validated: False
- language_names:
- custodian:
- name: Mr. Son Luu
- in_catalogue:
- type: A university or research institution
- location: Vietnam
- contact_name: Mr. Son Luu
- contact_email: [email protected]
- contact_submitter: False
- additional:
- validated: False
- availability:
- procurement:
- for_download: No - but the current owners/custodians have contact information for data queries
- download_url:
- download_email: [email protected]
- licensing:
- has_licenses: Unclear
- license_text:
- license_properties:
- license_list:
- pii:
- has_pii: Unclear
- generic_pii_likely:
- generic_pii_list:
- numeric_pii_likely:
- numeric_pii_list:
- sensitive_pii_likely:
- sensitive_pii_list:
- no_pii_justification_class: general knowledge not written by or referring to private persons
- no_pii_justification_text:
- validated: False
- procurement:
- processed_from_primary:
- from_primary: Taken from primary source
- primary_availability: No - the dataset curators kept the source data secret
- primary_license:
- primary_types:
- validated: False
- media:
- category:
- text
- text_format:
- audiovisual_format:
- image_format:
- database_format:
- text_is_transcribed: No
- instance_type:
- instance_count:
- instance_size:
- validated: False
- category:
- fname: UIT-ViHSD.json