NeMo-text-processing
zh TN is very slow and has poor accuracy
One simple zh-CN sentence takes 1.32 sec to normalize, and the result is wrong: the date is returned unchanged.
>python normalize.py --text="123" --language=en
INFO:NeMo-text-processing:one hundred and twenty three
WARNING:NeMo-text-processing:Execution time: 0.02 sec
>python normalize.py --text="我出生于1998年7月22日" --language=zh
INFO:NeMo-text-processing:我出生于1998年7月22日
WARNING:NeMo-text-processing:Execution time: 1.32 sec
>python normalize.py --text="I'm born in 22/3/1990" --language=en
INFO:NeMo-text-processing:I'm born in the twenty second of march nineteen ninety
WARNING:NeMo-text-processing:Execution time: 0.02 sec
@BuyuanCui could you please take a look?
This seems to be related to the existing TN bug: it is not able to process a whole sentence. It will be fixed by the PR that I'm working on.
@lifeiteng a few options to speed up:
- use --cache_dir
- use normalize_list() https://github.com/NVIDIA/NeMo-text-processing/blob/main/nemo_text_processing/text_normalization/normalize.py#L75
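A rough sketch of how those two options might be combined from Python, for anyone who wants to avoid paying the startup cost on every call. It assumes the `Normalizer` class in `nemo_text_processing.text_normalization.normalize` accepts `input_case`, `lang` and `cache_dir`, and that `normalize_list()` takes a list of strings; the helper names (`chunked`, `normalize_corpus`, `build_nemo_batch_fn`), the batch size and the `./tn_cache` path are made up for illustration and are not part of the library.

```python
from typing import Callable, List, Sequence


def chunked(items: Sequence[str], size: int) -> List[List[str]]:
    """Split items into consecutive lists of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be >= 1")
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


def normalize_corpus(
    texts: Sequence[str],
    normalize_batch: Callable[[List[str]], List[str]],
    batch_size: int = 100,
) -> List[str]:
    """Feed texts to `normalize_batch` in batches, preserving input order."""
    results: List[str] = []
    for batch in chunked(texts, batch_size):
        results.extend(normalize_batch(batch))
    return results


def build_nemo_batch_fn(lang: str = "en", cache_dir: str = "./tn_cache"):
    """Hypothetical helper: build the normalizer once and reuse it.

    The import is done lazily so the helpers above work without NeMo installed.
    Assumption: Normalizer(input_case=..., lang=..., cache_dir=...) and
    Normalizer.normalize_list(list_of_str) behave as in the linked normalize.py.
    """
    from nemo_text_processing.text_normalization.normalize import Normalizer

    normalizer = Normalizer(input_case="cased", lang=lang, cache_dir=cache_dir)
    return lambda batch: normalizer.normalize_list(batch)


if __name__ == "__main__":
    # Requires nemo_text_processing; the grammars are only built/loaded once here.
    batch_fn = build_nemo_batch_fn(lang="en")
    for line in normalize_corpus(["123", "I was born on 22/3/1990"], batch_fn):
        print(line)
```

The point of the design is that the expensive step (compiling or loading the grammars) happens once in `build_nemo_batch_fn`, while the per-sentence work goes through the already-built object; as far as I understand, with a cache directory set, later runs can load the saved grammar files instead of rebuilding them.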
This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.
This seems to be related to the existing TN bug. It was not able to process a whole sentence. It will be fixed with the PR that I'm working on.
Has this problem been solved? There are still problems in version 0.3.0.
@BuyuanCui https://github.com/NVIDIA/NeMo-text-processing/pull/112
I've found that the TN FST is slow regardless of language (English too). It is not very practical with large data even using multiprocessing (normalize_list()). Any other ways to speed it up?
@riqiang-dp we recommend Sparrowhawk for deployment https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/nlp/text_normalization/wfst/wfst_text_processing_deployment.html