data_tooling Create dataset titml_idn_speech

Create dataset titml_idn_speech_corpus

Open albertvillanova opened this issue 2 years ago • 0 comments

uid: titml_idn_speech_corpus
type: processed
description:
- name: TITML-IDN speech corpus
- description: TITML-IDN contains Bahasa Indonesia speech data from 20 Indonesian speakers (11 male and 9 female). Each speaker was asked to read 343 phonetically balanced sentences, most of which were selected from a text corpus.
- homepage: http://research.nii.ac.jp/src/en/TITML-IDN.html
- validated: True
languages:
- language_names:
  - Indonesian
- language_comments: formal
- language_locations:
  - South-eastern Asia
  - Indonesia
- validated: False
custodian:
- name: Speech Resources Consortium, National Institute of Informatics
- in_catalogue:
- type: A university or research institution
- location: Japan
- contact_name: Speech Resources Consortium, National Institute of Informatics
- contact_email: [email protected]
- contact_submitter: False
- additional: http://research.nii.ac.jp/src/en/
- validated: False
availability:
- procurement:
  - for_download: No - but the current owners/custodians have contact information for data queries
  - download_url:
  - download_email: [email protected]
- licensing:
  - has_licenses: Yes
  - license_text:
    - “User” shall not execute any reproduction or modification of this corpus for sale or distribution to the third party.
    - This corpus shall be used only for research purpose.
    - Reports or publications referring to the results of studies conducted on this corpus shall cite the “Speech Corpus Name” shown above as the source of the speech material. Copies of unclassified reports or publications referring to these studies shall be made available to “NII”, when requested.
  - license_properties:
    - research use
  - license_list:
- pii:
  - has_pii: No
  - generic_pii_likely:
  - generic_pii_list:
  - numeric_pii_likely:
  - numeric_pii_list:
  - sensitive_pii_likely:
  - sensitive_pii_list:
  - no_pii_justification_class: general knowledge not written by or referring to private persons
  - no_pii_justification_text:
- validated: False
processed_from_primary:
- from_primary: Original data
- primary_availability:
- primary_license:
- primary_types:
- validated: False
media:
- category:
  - audiovisual
- text_format:
- audiovisual_format:
  - .WAV
- image_format:
- database_format:
- text_is_transcribed:
- instance_type: audio file, 1 sentence per audio file
- instance_count: 1K<n<10K
- instance_size: 10<n<100
- validated: False
fname: titml_idn_speech_corpus.json

Nov 23 '21 11:11 albertvillanova
