multilingual_kws
Filter out NaNs from Common Voice TSVs while preserving words such as "nan" or "null" that are an intentional part of a language's vocabulary
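In German, 'null' (zero) is being converted to NaN by pandas when it is the only word present in the transcript, which happens with the single-word target-segment data. The cause is that "null" is one of the strings pandas treats as a missing-value marker by default.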
One option is to pass na_filter=False to pandas.read_csv when reading Common Voice TSVs:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
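However, we should also first check for truly missing values in the sentence transcription column. With NA detection turned off, those are read as empty strings rather than NaN, so they need an explicit check.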
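A minimal sketch of how this could look, not the repository's actual loading code. The in-memory sample TSV, the helper name `read_transcripts`, and the reduced column set (`path`, `sentence`) are assumptions made for illustration; real Common Voice TSVs have more columns.

```python
import io

import pandas as pd

# Hypothetical stand-in for a Common Voice TSV (real files have more columns).
SAMPLE_TSV = (
    "path\tsentence\n"
    "a.mp3\tnull\n"       # German for "zero": a real word, not a missing value
    "b.mp3\tnan\n"        # a literal word that pandas would also treat as NaN
    "c.mp3\t\n"           # truly missing transcript
    "d.mp3\teins zwei\n"
)


def read_transcripts(source):
    """Read a TSV without NA detection, then drop rows with no transcript."""
    # na_filter=False keeps every field as the literal string found in the file,
    # so "null" and "nan" survive and missing fields become empty strings.
    df = pd.read_csv(source, sep="\t", na_filter=False, dtype=str)
    # Treat empty or whitespace-only transcripts as missing.
    has_text = df["sentence"].str.strip() != ""
    return df[has_text].reset_index(drop=True)


# Default parsing: "null", "nan", and the empty field all become NaN.
default_df = pd.read_csv(io.StringIO(SAMPLE_TSV), sep="\t")

# NA detection disabled: only the truly empty transcript is dropped.
clean_df = read_transcripts(io.StringIO(SAMPLE_TSV))
```

If some NA markers should still be recognized, `keep_default_na=False` combined with an explicit `na_values` list is a possible alternative to turning detection off entirely.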