speech-datasets

Miscellaneous patches to Earnings

Open · qmac opened this issue · 0 comments

Earnings21:

  • Fixed labels in file 4341191 that were shifted off by one
  • Resolves #35

Earnings22:

  • Changed the casing label of numeric tokens from UC/CA/LC to N/A
  • Fixed a preparation error that caused some files (~80) to contain atmospherics such as <crosstalk> or <inaudible> without the surrounding <>. These tags should now be easier to filter out or account for during evaluation. As a side effect, some words that the transcriber marked as guesses now appear as <unk>; for example, line 1275 in file 4329526 changed from Romero to <unk>. We will try to patch this in the future.
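Now that atmospherics always carry their angle brackets, one simple way to exclude them before scoring is to drop any token that is entirely a `<...>` tag. Below is a minimal sketch. It assumes each transcript has already been split into a list of string tokens, and `strip_atmospherics` is a hypothetical helper, not part of this repository. The `keep` parameter lets tags such as `<unk>` be retained so they can be handled separately, since they may now stand in for guessed words.

```python
import re

# Hypothetical pattern: a token that is entirely an angle-bracketed tag,
# e.g. "<crosstalk>", "<inaudible>", "<unk>".
TAG_PATTERN = re.compile(r"^<[^<>]+>$")


def strip_atmospherics(tokens, keep=frozenset()):
    """Drop bracketed non-lexical tags from a token list before scoring.

    Tags listed in `keep` (e.g. {"<unk>"}) are retained so they can be
    handled separately, for example counted rather than discarded.
    """
    return [t for t in tokens if t in keep or not TAG_PATTERN.match(t)]


tokens = ["so", "<crosstalk>", "revenue", "grew", "<inaudible>", "<unk>"]
print(strip_atmospherics(tokens))
# ['so', 'revenue', 'grew']
print(strip_atmospherics(tokens, keep={"<unk>"}))
# ['so', 'revenue', 'grew', '<unk>']
```

Note that a bracket-based filter like this could not have caught the unbracketed "crosstalk" or "inaudible" tokens present before this fix, which is why the missing <> mattered for evaluation.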

qmac · Aug 02 '24 21:08