data_tooling

Create dataset titml_idn_speech_corpus

Open albertvillanova opened this issue 2 years ago • 0 comments

  • uid: titml_idn_speech_corpus
  • type: processed
  • description:
    • name: TITML-IDN speech corpus
    • description: TITML-IDN contains Bahasa Indonesia speech data from 20 Indonesian speakers (11 male and 9 female). Each speaker was asked to read 343 phonetically balanced sentences, most of which were selected from a text corpus.
    • homepage: http://research.nii.ac.jp/src/en/TITML-IDN.html
    • validated: True
  • languages:
    • language_names:
      • Indonesian
    • language_comments: formal
    • language_locations:
      • South-eastern Asia
      • Indonesia
    • validated: False
  • custodian:
    • name: Speech Resources Consortium, National Institute of Informatics
    • in_catalogue:
    • type: A university or research institution
    • location: Japan
    • contact_name: Speech Resources Consortium, National Institute of Informatics
    • contact_email: [email protected]
    • contact_submitter: False
    • additional: http://research.nii.ac.jp/src/en/
    • validated: False
  • availability:
    • procurement:
      • for_download: No - but the current owners/custodians have contact information for data queries
      • download_url:
      • download_email: [email protected]
    • licensing:
      • has_licenses: Yes
      • license_text:
        • “User” shall not execute any reproduction or modification of this corpus for sale or distribution to the third party.
        • This corpus shall be used only for research purpose.
        • Reports or publications referring to the results of studies conducted on this corpus shall cite the “Speech Corpus Name” shown above as the source of the speech material. Copies of unclassified reports or publications referring to these studies shall be made available to “NII”, when requested.
      • license_properties:
        • research use
      • license_list:
    • pii:
      • has_pii: No
      • generic_pii_likely:
      • generic_pii_list:
      • numeric_pii_likely:
      • numeric_pii_list:
      • sensitive_pii_likely:
      • sensitive_pii_list:
      • no_pii_justification_class: general knowledge not written by or referring to private persons
      • no_pii_justification_text:
    • validated: False
  • processed_from_primary:
    • from_primary: Original data
    • primary_availability:
    • primary_license:
    • primary_types:
    • validated: False
  • media:
    • category:
      • audiovisual
    • text_format:
    • audiovisual_format:
      • .WAV
    • image_format:
    • database_format:
    • text_is_transcribed:
    • instance_type: audio file, 1 sentence per audio file
    • instance_count: 1K<n<10K
    • instance_size: 10<n<100
    • validated: False
  • fname: titml_idn_speech_corpus.json
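
Since the entry is stored as `titml_idn_speech_corpus.json`, a reviewer might want to list which sections still carry `validated: False`. The following is only a minimal sketch: the dictionary is a hand-copied excerpt of the fields above, its nesting is assumed from the bullet indentation rather than taken from the actual file, and `unvalidated_sections` is a hypothetical helper, not part of any data_tooling API.

```python
import json

# Hand-copied excerpt of the entry above (assumed nesting; most fields omitted).
record = {
    "uid": "titml_idn_speech_corpus",
    "type": "processed",
    "description": {"name": "TITML-IDN speech corpus", "validated": True},
    "languages": {"language_names": ["Indonesian"], "validated": False},
    "custodian": {"location": "Japan", "validated": False},
    "availability": {"licensing": {"has_licenses": "Yes"}, "validated": False},
    "processed_from_primary": {"from_primary": "Original data", "validated": False},
    "media": {"audiovisual_format": [".WAV"], "validated": False},
    "fname": "titml_idn_speech_corpus.json",
}


def unvalidated_sections(entry):
    """Return top-level section names whose 'validated' flag is False."""
    return [
        name
        for name, section in entry.items()
        if isinstance(section, dict) and section.get("validated") is False
    ]


# Round-trip through JSON to mimic reading the record from disk.
loaded = json.loads(json.dumps(record))
print(unvalidated_sections(loaded))
```

For this excerpt, only the `description` section is marked as validated, so every other section is reported.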

albertvillanova • Nov 23 '21 11:11