NeMo-text-processing issues

Some bugs in English, German, Spanish, Italian normalizers

1

Hi! I found a bug in English normalization. The following code is applied:
```python
from nemo_text_processing.text_normalization.normalize import Normalizer

normalizer = Normalizer(
    input_case="cased",
    lang="en",
    deterministic=True,
)
norm_text = normalizer.normalize(text, punct_post_process=True)
```
text=`Here is mail.nasa.gov.` norm_text=`Here...

Oktai15

bug

remove substitution of a state code to state name

2

…t an optional space and then the acronym of a US state: in the replacement, the state acronym is replaced with the state's full name, e.g. d, NC -> , North...

pirchi1

Stale

Jp itn

10

# What does this PR do ? Merge to main for Japanese support on cardinal, ordinal, time, date, fraction, decimal support Add a one line overview of what this PR...

BuyuanCui

zh TN is very slow and bad accuracy

10

One simple zh-CN sentence costs `1.32 sec` and the result is not right.
```
>python normalize.py --text="123" --language=en
INFO:NeMo-text-processing:one hundred and twenty three
WARNING:NeMo-text-processing:Execution time: 0.02 sec
>python normalize.py --text="我出生于1998年7月22日"...
```

lifeiteng

bug

es and es_en changes for unified models

6

# What does this PR do ? Add a one line overview of what this PR aims to accomplish. # Before your PR is "Ready for review" **Pre checks**: -...

mgrafu

Stale

Jp itn 20240221

2

# What does this PR do ? Add a one line overview of what this PR aims to accomplish. PR for Japanese itn instead of #101 # Before your PR...

BuyuanCui

bug in graph_utils.py of zh ITN and decimal tagger of ar TN

3

In `./nemo_text_processing/inverse_text_normalization/zh/graph_utils.py`, line 79, the `load_labels()` method is called but never imported, so it raises an error. This could be resolved simply by adding the following method to `./nemo_text_processing/inverse_text_normalization/zh/utils.py`:...
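The fix proposed in the issue is cut off above and is not reproduced here. As a rough sketch only, a helper with this name would typically read a tab-separated label file into a list of rows. The signature, the file format, and the body below are assumptions for illustration, not the code from the issue or from the library.

```python
import csv
from typing import List


def load_labels(abs_path: str) -> List[List[str]]:
    """Hypothetical sketch: read a tab-separated label file into a list of rows.

    Assumes one mapping per line with columns separated by tabs and
    UTF-8 encoding (needed for zh or ar label files).
    """
    with open(abs_path, encoding="utf-8", newline="") as label_tsv:
        # Each row becomes a list of column strings, e.g. ["一", "1"].
        return list(csv.reader(label_tsv, delimiter="\t"))
```

With a helper like this defined in `utils.py`, `graph_utils.py` would still need to import it explicitly before calling it.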

hannan72

bug

malloc error on initialization of inverse normalizer

1

When initializing the inverse normalizer for English, the code sometimes crashes with the following error: ```malloc(): unaligned tcache chunk detected``` I traced the code and...

hannan72

bug

[zh] WARNING:NeMo-text-processing:Failed text: 免除GOOGLE在一桩诽谤官司中的法律责任。Key: integer_part Value: None

8

I received a warning message when normalizing text. Could you please explain what the message indicates? **Reproducible code**:
```python
from nemo_text_processing.text_normalization.normalize import Normalizer

text_normalizer = Normalizer(lang="zh", input_case="cased", overwrite_cache=True, cache_dir=str("cache_dir"))
text_normalizer_call_kwargs = {"punct_pre_process":...
```

XuesongYang

bug

Stale

German TN fixes

# What does this PR do ? This PR implements DE TN fixes for the following issues: - Adds support for normalizing social media tags (e.g. `@zoobereq` and `@zoobereq.net`) -...

zoobereq
