speech-datasets
Miscellaneous patches to Earnings
Earnings21:
- Fix file 4341191 labels that are shifted off by one
- Resolves #35
Earnings22:
- Fixed casing label of numerics from `UC/CA/LC` to `N/A`
- Fixed preparation error that led to some files (~80) having atmospherics such as `<crosstalk>` or `<inaudible>` appear without `<>`. They should now be easier to filter out or account for in evaluation. As a side effect, some words that were marked as guesses by the transcriber now appear as `<unk>`. For example, line 1275 in file 4329526: `Romero --> <unk>`. We will try to patch this in the future.
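
Since atmospherics are now consistently wrapped in angle brackets, they can be dropped or counted before scoring. The following is a minimal sketch, not part of the dataset tooling: it assumes a transcript has already been loaded as a whitespace-tokenized list, and that every tag (`<crosstalk>`, `<inaudible>`, `<unk>`, and so on) is a single token. The names `TAG_PATTERN`, `strip_tags`, and `count_tags` are hypothetical.

```python
import re

# Assumption: each atmospheric or unknown-word marker is one
# whitespace-delimited token fully wrapped in angle brackets.
TAG_PATTERN = re.compile(r"^<[^<>\s]+>$")


def strip_tags(tokens):
    """Return the tokens with angle-bracket tags removed."""
    return [tok for tok in tokens if not TAG_PATTERN.match(tok)]


def count_tags(tokens):
    """Count how often each angle-bracket tag occurs."""
    counts = {}
    for tok in tokens:
        if TAG_PATTERN.match(tok):
            counts[tok] = counts.get(tok, 0) + 1
    return counts


if __name__ == "__main__":
    # Made-up example sentence, not taken from the dataset.
    reference = "thank you <crosstalk> for joining <inaudible> <unk> today".split()
    print(strip_tags(reference))
    print(count_tags(reference))
```

Whether `<unk>` should be removed along with the atmospherics or kept as a scorable token depends on the evaluation setup; the sketch treats all tags the same way for simplicity.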