speech-datasets issues

Cannot download earnings22

Hello, when I do the [following](https://github.com/revdotcom/speech-datasets#steps-to-download-from-lfs):

```
cd earnings22
git lfs pull
```

I get the following errors:

```
batch response: This repository is over its data quota. Account responsible for LFS...
```

huangruizhe

Off-by-one labeling in 4341191.nlp in Earnings21

Starting at line `10876` in `4341191.nlp` the labels for every field except `token` seem to be shifted down by one. For example, the token `uh-` here is tagged as `1649`...

ryanwesterman-zoom
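The shift described in this report can be illustrated with a minimal sketch. It assumes the `.nlp` files are pipe-delimited with the token in the first column, and that "shifted down by one" means the labels belonging to row `i` are stored on row `i + 1`. The function names, the column layout, and the start index are hypothetical and would need to be checked against the actual file before use.

```python
from typing import List


def parse_nlp_lines(lines: List[str]) -> List[List[str]]:
    """Split pipe-delimited lines into fields (assumed .nlp layout)."""
    return [line.rstrip("\n").split("|") for line in lines]


def shift_labels_up(rows: List[List[str]], start: int,
                    token_col: int = 0) -> List[List[str]]:
    """Return a copy of `rows` where, from index `start` on, every
    non-token field is taken from the following row.

    This undoes a shift down by one. The last row has no following
    row, so its labels are left unchanged and need manual review.
    """
    fixed = [list(row) for row in rows[:start]]
    for i in range(start, len(rows)):
        source = rows[i + 1] if i + 1 < len(rows) else None
        new_row = []
        for col, value in enumerate(rows[i]):
            if col == token_col or source is None:
                new_row.append(value)        # keep the token itself
            else:
                new_row.append(source[col])  # pull the label up by one row
        fixed.append(new_row)
    return fixed


# Tiny synthetic example (not real data from 4341191.nlp).
example = parse_nlp_lines([
    "hello|1|label_hello",
    "uh-|1|stale",
    "world|1|label_uh",
    "again|1|label_world",
])
repaired = shift_labels_up(example, start=1)
```

Because the final row cannot be repaired automatically, a real fix would still require inspecting the end of the affected region by hand.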

Metadata CSV contains potentially wrong data

`earnings21/earnings21-file-metadata.csv` seems to disagree with the output of `lhotse prepare earnings21` from https://github.com/lhotse-speech/lhotse/blob/master/lhotse/recipes/earnings21.py, namely in the duration and sample count.

qmac
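A disagreement like the one in this report could be checked with a short sketch such as the following. It compares the duration and sample count in a metadata CSV against independently measured values, for example values read from a prepared recording manifest. The column names (`file_id`, `duration`, `num_samples`), the tolerance, and the function name are assumptions for illustration only. They are not taken from the actual CSV or from lhotse.

```python
import csv
import io
from typing import Dict, List, Tuple


def find_disagreements(csv_text: str,
                       measured: Dict[str, Tuple[float, int]],
                       id_col: str = "file_id",
                       duration_col: str = "duration",
                       samples_col: str = "num_samples",
                       tolerance_s: float = 0.01) -> List[str]:
    """Return the ids whose CSV duration or sample count differs
    from the measured (duration_seconds, num_samples) pair."""
    bad = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        file_id = row[id_col]
        if file_id not in measured:
            continue  # nothing to compare against
        duration, num_samples = measured[file_id]
        duration_differs = abs(float(row[duration_col]) - duration) > tolerance_s
        samples_differ = int(row[samples_col]) != num_samples
        if duration_differs or samples_differ:
            bad.append(file_id)
    return bad


# Synthetic values, not taken from the real metadata file.
csv_text = "file_id,duration,num_samples\nA,10.0,441000\nB,5.0,220500\n"
measured = {"A": (10.0, 441000), "B": (5.5, 242550)}
mismatched = find_disagreements(csv_text, measured)
```

Listing only the mismatched ids keeps the report short. Printing both values per file would be a natural next step when filing the details.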

Normalization files for earnings22 dataset


Hi there, Are there any normalization files for the earnings-22 dataset? If yes, could you please share it with me? Thanks in advance.

elchilinga

Transcript issues for 4363614 in earnings-21

https://github.com/revdotcom/speech-datasets/blob/1852d8e8f79745415e17ed294f1de0f884513465/earnings21/transcripts/nlp_references/4363614.nlp#L2-L44 It seems the transcript there has some issues, as quoted. E.g. `` for the company's name, `` for a person's name. This can be checked against [the Seeking Alpha transcript](https://seekingalpha.com/article/4363614-banco-santander-mexico-s-bsmx-ceo-hector-grisi-on-q2-2020-results-earnings-call-transcript).

huangruizhe

Podcast challenge dataset


Hi! Would you consider making the audio and transcriptions for the podcast dataset mentioned [in your blogpost](https://www.rev.com/blog/the-podcast-challenge-testing-rev-ais-speech-recognition-accuracy) available in this repository? Thanks!

tdeboissiere

Miscellaneous patches to Earnings

Earnings21:
- Fix file 4341191 labels that are shifted off by one
- Resolves #35

Earnings22:
- Fixed casing label of numerics from `UC`/`CA`/`LC` to `N/A`
- Fixed preparation error...

qmac

License for Earnings21 audio

License in Earnings21 says:

> The transcripts and associated text files that are used for alignment in this directory are licensed under a [Creative Commons Attribution-ShareAlike 4.0 International](https://creativecommons.org/licenses/by-sa/4.0/) license.

What...

hbredin
