speech-datasets

Off-by-one labeling in 4341191.nlp in Earnings21

Open ryanwesterman-zoom opened this issue 2 years ago • 0 comments

Starting at line 10876 in 4341191.nlp, the labels for every field except token appear to be shifted down by one row.

For example, the token uh- is tagged as 1649, which corresponds to PERSON, and is also punctuated with an ellipsis (...). Both of these make more sense on the token above it, Dean. The shift continues from line 10876 to the end of the file.
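A minimal Python sketch of a local workaround follows. It assumes the file is pipe-delimited with a header on line 1 and the token in the first column. It also assumes, as the Dean / uh- example suggests, that the token just above line 10876 should receive the labels currently on line 10876, and that every later row should take the labels of the row below it. The function name shift_labels_up is hypothetical and is not part of the speech-datasets repository.

```python
from typing import List


def shift_labels_up(lines: List[str], start_line: int, sep: str = "|") -> List[str]:
    """Undo a one-row downward shift of every column except the token column.

    lines      -- the file's lines, header included, without trailing newlines
    start_line -- 1-based file line of the first token carrying the labels of
                  the token above it (10876 in 4341191.nlp, per this issue)

    Assumption (hypothetical fix, not an official one): the token column is
    first, and the line directly above start_line should receive the labels
    currently found on start_line.
    """
    rows = [line.split(sep) for line in lines]
    first = start_line - 2  # 0-based index of the line above start_line
    if first < 1:
        raise ValueError("start_line must leave the header (line 1) untouched")

    fixed = [row[:] for row in rows]
    for i in range(first, len(rows) - 1):
        # Keep this row's token; take every other field from the next row down.
        fixed[i] = [rows[i][0]] + rows[i + 1][1:]

    # The last token's true labels were pushed off the end of the file and
    # cannot be recovered here, so the final row is left unchanged and should
    # be reviewed by hand.
    return [sep.join(row) for row in fixed]
```

To apply it to the real file, one could read it with open("4341191.nlp", encoding="utf-8").read().splitlines(), call shift_labels_up(lines, 10876), and write the result back joined with newlines. Since the last row cannot be repaired automatically, it is worth diffing the output against the original before relying on it.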


ryanwesterman-zoom, Aug 22 '23 16:08