NeMo-text-processing
es TN bug regarding the word 'o'
Describe the bug Hi, I recently started using the Spanish text normalizer, and I think I found a bug. I don't expect the normalizer to convert the conjunction 'o' ("or") into a different word, 'oeste' ("west"), but this happens more often than not. I'm not proficient in Spanish, but I still don't think this is a correct way to normalize a Spanish sentence. Is this expected behavior, or is it a known bug? I'd appreciate it if you could take a look. The three sentences below are just random examples I took from a Spanish dictionary. The version of nemo_text_processing I am using is 1.0.2.
Steps/Code to reproduce bug Python code:
from nemo_text_processing.text_normalization import Normalizer
text_normalizer = Normalizer(input_case="lower_cased", lang="es", post_process=True)
text = ["Norte o Sur?", "O te callas, o me marcho.", "Date prisa, o perderás el tren."]
for t in text:
    print(t, "->", text_normalizer.normalize(t, punct_post_process=True, punct_pre_process=True))
Output:
Norte o Sur? -> Norte oeste Sur?
O te callas, o me marcho. -> O te callas, oeste me marcho.
Date prisa, o perderás el tren. -> Date prisa, oeste perderás el tren.
Expected behavior The conjunction 'o' should be left unchanged in all of the above sentences.
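In case it helps with a regression test, here is a minimal sketch of a check for the expected behavior. It does not call the normalizer at all; it only compares an input sentence with an already-normalized string. The helper names `count_word` and `conjunction_preserved` are made up for this illustration and are not part of nemo_text_processing.

```python
import re


def count_word(text: str, word: str) -> int:
    """Count standalone, case-insensitive occurrences of `word` in `text`."""
    pattern = rf"\b{re.escape(word)}\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def conjunction_preserved(original: str, normalized: str) -> bool:
    """Return True if no standalone 'o' was lost and no new 'oeste' appeared.

    Hypothetical helper for illustration only.
    """
    same_o = count_word(normalized, "o") == count_word(original, "o")
    same_oeste = count_word(normalized, "oeste") == count_word(original, "oeste")
    return same_o and same_oeste


# Input/output pairs copied from the output reported above.
reported = [
    ("Norte o Sur?", "Norte oeste Sur?"),
    ("O te callas, o me marcho.", "O te callas, oeste me marcho."),
    ("Date prisa, o perderás el tren.", "Date prisa, oeste perderás el tren."),
]

for original, normalized in reported:
    status = "ok" if conjunction_preserved(original, normalized) else "conjunction changed"
    print(original, "->", status)
```

Once the bug is fixed, the same check could be run on the actual return value of `text_normalizer.normalize(...)` instead of the hard-coded strings.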
Thanks!