NeMo-text-processing

es TN bug regarding the word 'o'

Open • seunghunJi opened this issue • 0 comments

Describe the bug

Hi, I recently started using the Spanish text normalizer and found a bug. I don't expect the normalizer to convert the conjunction 'o' ("or") into a different word, 'oeste' ("west"), but it seems to happen more often than not: in the three sentences below, every lowercase 'o' is converted, and only the sentence-initial 'O' is left unchanged. I'm not proficient in Spanish, but I still don't think this is a correct way to normalize a Spanish sentence. Is this expected behavior, or is it a known bug? I'd appreciate it if you could take a look.

The three sentences are just random examples I took from a Spanish dictionary. The version of nemo_text_processing I am using is 1.0.2.

Steps/Code to reproduce bug

Python code:

from nemo_text_processing.text_normalization import Normalizer

# Spanish text normalizer with post-processing enabled
text_normalizer = Normalizer(input_case="lower_cased", lang="es", post_process=True)

# Each sentence contains the conjunction "o" ("or")
text = ["Norte o Sur?", "O te callas, o me marcho.", "Date prisa, o perderás el tren."]

for t in text:
    # Print each sentence next to its normalized form
    print(t, "->", text_normalizer.normalize(t, punct_post_process=True, punct_pre_process=True))

Output:

Norte o Sur? -> Norte oeste Sur?
O te callas, o me marcho. -> O te callas, oeste me marcho.
Date prisa, o perderás el tren. -> Date prisa, oeste perderás el tren.
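The three outputs above can be checked mechanically. The sketch below compares each input with its reported output token by token and reports where a standalone 'o' was rewritten. The helper names (`word_tokens`, `rewritten_o_positions`) are hypothetical and not part of nemo_text_processing, and the comparison assumes normalization leaves the number of word tokens unchanged, which holds for these three sentences but not for normalization in general.

```python
import re

# Input/output pairs copied from the output reported in this issue.
REPORTED = [
    ("Norte o Sur?", "Norte oeste Sur?"),
    ("O te callas, o me marcho.", "O te callas, oeste me marcho."),
    ("Date prisa, o perderás el tren.", "Date prisa, oeste perderás el tren."),
]


def word_tokens(text: str) -> list[str]:
    """Split text into word tokens, dropping punctuation and whitespace."""
    return re.findall(r"\w+", text)


def rewritten_o_positions(source: str, normalized: str) -> list[int]:
    """Return token indices where a standalone 'o'/'O' in the source was changed.

    Assumes normalization kept the token count unchanged (true for the
    three reported sentences, not guaranteed in general).
    """
    src, out = word_tokens(source), word_tokens(normalized)
    if len(src) != len(out):
        raise ValueError("token counts differ; cannot compare token by token")
    return [i for i, (a, b) in enumerate(zip(src, out)) if a.lower() == "o" and a != b]


for source, normalized in REPORTED:
    print(f"{source!r}: 'o' rewritten at token(s) {rewritten_o_positions(source, normalized)}")
```

When every 'o' is preserved, for example `rewritten_o_positions("Norte o Sur?", "Norte o Sur?")`, the function returns an empty list, so the same check could serve as a regression test for this bug.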

Expected behavior

No normalization of any of the 'o's in the sentences above: each one is the conjunction and should be left as is.
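Until this is fixed, one possible stopgap is to shield the conjunction before normalizing and restore it afterwards. This is a sketch, not an official workaround: `normalize_keep_conj_o`, the placeholder word `zzconjz`, and the stand-in `buggy_normalize` (which only mimics the behavior reported above) are all hypothetical, and it assumes the real normalizer passes the placeholder through unchanged, which would need to be verified.

```python
import re
from typing import Callable

# HYPOTHETICAL placeholder word. Assumption: the normalizer passes unknown
# lowercase words through unchanged; verify this for the real Normalizer.
PLACEHOLDER = "zzconjz"

# A lowercase 'o' standing alone as a word, i.e. the conjunction.
# Uppercase 'O' is not touched, since it can legitimately abbreviate 'oeste'.
CONJ_O = re.compile(r"(?<!\w)o(?!\w)")


def normalize_keep_conj_o(text: str, normalize: Callable[[str], str]) -> str:
    """Shield the conjunction 'o', run `normalize`, then restore it."""
    shielded = CONJ_O.sub(PLACEHOLDER, text)
    return normalize(shielded).replace(PLACEHOLDER, "o")


def buggy_normalize(text: str) -> str:
    """Stand-in that mimics the reported bug: standalone 'o' -> 'oeste'."""
    return CONJ_O.sub("oeste", text)


print(buggy_normalize("Norte o Sur?"))
print(normalize_keep_conj_o("Norte o Sur?", buggy_normalize))
```

With the reproduction script above, `normalize` could be `lambda s: text_normalizer.normalize(s, punct_pre_process=True, punct_post_process=True)`.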

Thanks!

seunghunJi, Aug 19 '24 09:08