speech-datasets

Various speech datasets made available to the public

Results 8 speech-datasets issues

Hello, when I run the [following](https://github.com/revdotcom/speech-datasets#steps-to-download-from-lfs):

```
cd earnings22
git lfs pull
```

I get errors like these:

```
batch response: This repository is over its data quota. Account responsible for LFS...
```

Starting at line `10876` in `4341191.nlp` the labels for every field except `token` seem to be shifted down by one. For example, the token `uh-` here is tagged as `1649`...
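The kind of repair this report calls for can be sketched as below. This is only an illustration under stated assumptions: rows are taken to be already parsed into dictionaries, the field names `token` and `ts` are hypothetical, and "shifted down by one" is read as "the correct labels for row i sit on row i + 1". The helper `shift_labels_up` is not part of the repository.

```python
from typing import Dict, List


def shift_labels_up(rows: List[Dict[str, str]], start: int,
                    token_key: str = "token") -> List[Dict[str, str]]:
    """Return a copy of rows where, from index `start` on, every field
    except the token is taken from the following row.

    Assumption: labels were shifted down by exactly one row, so the
    correct labels for row i are currently stored on row i + 1.
    The last row has no successor and keeps its original labels.
    """
    fixed = [dict(row) for row in rows]  # copy so the input is untouched
    for i in range(start, len(rows) - 1):
        for key in rows[i]:
            if key != token_key:
                fixed[i][key] = rows[i + 1][key]
    return fixed


# Hypothetical example data; the field names are illustrative only.
example = [
    {"token": "a", "ts": "1"},
    {"token": "b", "ts": "X"},  # shift starts here
    {"token": "c", "ts": "2"},
    {"token": "d", "ts": "3"},
]
repaired = shift_labels_up(example, start=1)
```

Note that the final row cannot be repaired this way, so its labels would still need to be checked by hand.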

`earnings21/earnings21-file-metadata.csv` seems to disagree with the output of `lhotse prepare earnings21` from https://github.com/lhotse-speech/lhotse/blob/master/lhotse/recipes/earnings21.py, namely the duration and sample count
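A disagreement like this one can be checked with a small script. The sketch below assumes a CSV with hypothetical columns `file_id` and `duration_sec` (the real column names in `earnings21-file-metadata.csv` may differ) and a dictionary of durations measured separately, for instance as sample count divided by sampling rate. The function `find_duration_mismatches` is illustrative, not an existing tool.

```python
import csv
import io
from typing import Dict, List, Tuple


def find_duration_mismatches(csv_text: str, measured: Dict[str, float],
                             tolerance: float = 0.01
                             ) -> List[Tuple[str, float, float]]:
    """List (file_id, listed, measured) for files whose listed duration
    differs from the measured one by more than `tolerance` seconds.

    Assumption: the CSV has `file_id` and `duration_sec` columns.
    """
    mismatches = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        file_id = row["file_id"]
        listed = float(row["duration_sec"])
        actual = measured.get(file_id)
        if actual is None:
            continue  # nothing measured for this file, so skip it
        if abs(listed - actual) > tolerance:
            mismatches.append((file_id, listed, actual))
    return mismatches


# Hypothetical values, not taken from the dataset.
metadata = "file_id,duration_sec\na,10.0\nb,20.5\n"
measured_durations = {"a": 160000 / 16000, "b": 320000 / 16000}
report = find_duration_mismatches(metadata, measured_durations)
```

The tolerance is there because durations derived from sample counts rarely match rounded metadata values exactly.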

Hi there, are there any normalization files for the earnings-22 dataset? If so, could you please share them with me? Thanks in advance.

https://github.com/revdotcom/speech-datasets/blob/1852d8e8f79745415e17ed294f1de0f884513465/earnings21/transcripts/nlp_references/4363614.nlp#L2-L44 It seems the transcript there has some issue, as quoted. E.g. `` for company's name, `` for person's name. This can be checked against [here](https://seekingalpha.com/article/4363614-banco-santander-mexico-s-bsmx-ceo-hector-grisi-on-q2-2020-results-earnings-call-transcript)

Hi! Would you consider making the audio and transcriptions for the podcast dataset mentioned [in your blogpost](https://www.rev.com/blog/the-podcast-challenge-testing-rev-ais-speech-recognition-accuracy) available in this repository? Thanks!

Earnings21:
- Fix file 4341191 labels that are shifted off by one
- Resolves #35

Earnings22:
- Fixed casing label of numerics from `UC`/`CA`/`LC` to `N/A`
- Fixed preparation error...

License in Earnings21 says:

> The transcripts and associated text files that are used for alignment in this directory are licensed under a [Creative Commons Attribution-ShareAlike 4.0 International](https://creativecommons.org/licenses/by-sa/4.0/) license.

What...