NeMo-text-processing

zh TN is very slow and has bad accuracy

Open lifeiteng opened this issue 2 years ago • 10 comments

One simple zh-CN sentence takes 1.32 sec to normalize, and the result is wrong: the date is returned unchanged.

>python normalize.py --text="123" --language=en
INFO:NeMo-text-processing:one hundred and twenty three
WARNING:NeMo-text-processing:Execution time: 0.02 sec

>python normalize.py --text="我出生于1998年7月22日" --language=zh
INFO:NeMo-text-processing:我出生于1998年7月22日
WARNING:NeMo-text-processing:Execution time: 1.32 sec

>python normalize.py --text="I'm born in 22/3/1990" --language=en
INFO:NeMo-text-processing:I'm born in the twenty second of march nineteen ninety
WARNING:NeMo-text-processing:Execution time: 0.02 sec

Oct 20 '23 06:10 lifeiteng

@BuyuanCui could you please take a look?

Oct 20 '23 17:10 ekmb

This seems to be related to the existing TN bug: it was not able to process a whole sentence. It will be fixed by the PR that I'm working on.

Oct 20 '23 17:10 BuyuanCui

@lifeiteng a few options to speed up:

use --cache_dir
use normalize_list() https://github.com/NVIDIA/NeMo-text-processing/blob/main/nemo_text_processing/text_normalization/normalize.py#L75
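
The two options above could be combined roughly as follows. This is a minimal sketch, not the project's own script: `build_normalizer` and `normalize_many` are hypothetical helper names, `./tn_cache` is a placeholder path, and the keyword arguments passed to `Normalizer` and `normalize_list()` (`input_case`, `lang`, `cache_dir`, `batch_size`, `n_jobs`) are assumed to match the installed version, so check them against the linked source.

```python
from time import perf_counter


def build_normalizer(lang="zh", cache_dir="./tn_cache"):
    """Construct the normalizer once so that later calls can reuse it."""
    # Imported lazily so normalize_many() can be tried without the package installed.
    from nemo_text_processing.text_normalization.normalize import Normalizer

    # Assumption: these keyword arguments exist in the installed version.
    # cache_dir is a placeholder path for the compiled grammar files.
    return Normalizer(input_case="cased", lang=lang, cache_dir=cache_dir)


def normalize_many(normalizer, texts, batch_size=32, n_jobs=1):
    """Normalize a list of sentences in one call and report the elapsed time."""
    start = perf_counter()
    # Assumption: normalize_list() accepts batch_size and n_jobs keywords.
    outputs = normalizer.normalize_list(texts, batch_size=batch_size, n_jobs=n_jobs)
    elapsed = perf_counter() - start
    return outputs, elapsed
```

Calling `build_normalizer()` once and passing the result to `normalize_many()` keeps grammar loading out of the per-sentence cost, and timing the two steps separately shows which one dominates.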

Oct 20 '23 17:10 ekmb

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

Nov 20 '23 01:11 github-actions[bot]

This seems to be related to the existing TN bug. It was not able to process a whole sentence. It will be fixed with the PR that I'm working on.

Has this problem been solved? There are still problems in version 0.3.0.

Apr 28 '24 16:04 lsrami

@BuyuanCui https://github.com/NVIDIA/NeMo-text-processing/pull/112

Apr 30 '24 18:04 ekmb

I've found that the TN FST is slow regardless of language (English too). It is not very practical with large data, even when using multiprocessing (normalize_list()). Are there any other ways to speed it up?

May 06 '24 22:05 riqiang-dp

@riqiang-dp we recommend Sparrowhawk for deployment: https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/nlp/text_normalization/wfst/wfst_text_processing_deployment.html

May 07 '24 01:05 ekmb