data_tooling
Create dataset titml_idn_speech_corpus
- uid: titml_idn_speech_corpus
- type: processed
- description:
- name: TITML-IDN speech corpus
- description: TITML-IDN contains Bahasa Indonesia speech data from 20 Indonesian speakers (11 male and 9 female). Each speaker was asked to read 343 phonetically balanced sentences, most of which were selected from a text corpus.
- homepage: http://research.nii.ac.jp/src/en/TITML-IDN.html
- validated: True
- languages:
- language_names:
- Indonesian
- language_comments: formal
- language_locations:
- South-eastern Asia
- Indonesia
- validated: False
- language_names:
- custodian:
- name: Speech Resources Consortium, National Institute of Informatics
- in_catalogue:
- type: A university or research institution
- location: Japan
- contact_name: Speech Resources Consortium, National Institute of Informatics
- contact_email: [email protected]
- contact_submitter: False
- additional: http://research.nii.ac.jp/src/en/
- validated: False
- availability:
- procurement:
- for_download: No - but the current owners/custodians have contact information for data queries
- download_url:
- download_email: [email protected]
- licensing:
- has_licenses: Yes
- license_text: - “User” shall not execute any reproduction or modification of this corpus for sale or distribution to the third party.
- This corpus shall be used only for research purpose.
- Reports or publications referring to the results of studies conducted on this corpus shall cite the “Speech Corpus Name” shown above as the source of the speech material. Copies of unclassified reports or publications referring to these studies shall be made available to “NII”, when requested.
- license_properties:
- research use
- license_list:
- pii:
- has_pii: No
- generic_pii_likely:
- generic_pii_list:
- numeric_pii_likely:
- numeric_pii_list:
- sensitive_pii_likely:
- sensitive_pii_list:
- no_pii_justification_class: general knowledge not written by or referring to private persons
- no_pii_justification_text:
- validated: False
- procurement:
- processed_from_primary:
- from_primary: Original data
- primary_availability:
- primary_license:
- primary_types:
- validated: False
- media:
- category:
- audiovisual
- text_format:
- audiovisual_format:
- .WAV
- image_format:
- database_format:
- text_is_transcribed:
- instance_type: audio file, 1 sentence per audio file
- instance_count: 1K<n<10K
- instance_size: 10<n<100
- validated: False
- category:
- fname: titml_idn_speech_corpus.json
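
For readers who want to work with this entry programmatically, here is a minimal sketch of how a few of the fields above might look once stored in `titml_idn_speech_corpus.json`. The nesting of keys is an assumption inferred from the flat listing (the real file layout is not shown here), and `summarize` is a hypothetical helper, not part of data_tooling.

```python
import json

# Assumed nesting: the flat listing above does not show the real JSON layout,
# so this structure is illustrative only. Values are copied from the entry.
entry = {
    "uid": "titml_idn_speech_corpus",
    "type": "processed",
    "description": {
        "name": "TITML-IDN speech corpus",
        "homepage": "http://research.nii.ac.jp/src/en/TITML-IDN.html",
    },
    "languages": {"language_names": ["Indonesian"]},
    "availability": {
        "licensing": {"has_licenses": "Yes", "license_properties": ["research use"]},
        "pii": {"has_pii": "No"},
    },
    "media": {"category": ["audiovisual"], "audiovisual_format": [".WAV"]},
}


def summarize(record):
    """Hypothetical helper: build a one-line summary of a catalogue entry."""
    name = record["description"]["name"]
    langs = ", ".join(record["languages"]["language_names"])
    formats = ", ".join(record["media"]["audiovisual_format"])
    return f"{record['uid']}: {name} [{langs}] ({formats})"


# Round-trip through JSON text to mimic writing and re-reading the entry file.
loaded = json.loads(json.dumps(entry))
print(summarize(loaded))
```

Swapping the in-memory round trip for `json.load` on the actual file would only work unchanged if the file really uses this nesting.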