ULCA-asr-dataset-corpus

Need information on provenance of this data.

Open · jeb-orcl opened this issue 2 years ago • 1 comment

Can you add information at the top of the README to explain how this repo has the rights to distribute this data under the CC Attribution license? Much of the material appears to be broadcast journalism, which is usually copyrighted.

I am from Oracle, and we would very much like to use this data in building speech recognition for Indian languages, but we must first verify that the data is appropriately sourced and licensed. If you could add a statement to the top of the README that explains how the data was gathered and how permissions were obtained to distribute it, that would be extremely helpful.

Basically, our team will not be able to do anything with this data unless I can convince our legal reviewers that Open-Speech-EkStep has the right to distribute this data under that license. Thank you.

jeb-orcl · May 06 '22 17:05

Try https://gitter.im/Vakyansh/community?utm_source=share-link&utm_medium=link&utm_campaign=share-link# if you haven't received a response yet.

Anirudh257 · May 23 '22 09:05