NeMo
NeMo copied to clipboard
[Chinese text normalization]PR for chinese TN part in text_normalization
What does this PR do ?
new PR for previous chinese TN
Collection: [NeMo/norm_text_processing/text_normalization] I think have covered the problems mentioned previously.And I make a new pr to make it clear.
This pull request introduces 30 alerts when merging 8d122572a14c982bc24c1ff00db7287c04c715fe into 543118c3a0eb1f0a824f77659f92e36d3318fc75 - view on LGTM.com
new alerts:
- 30 for Unused import
@BuyuanCui @ekmb This PR continues https://github.com/NVIDIA/NeMo/issues/4543 & https://github.com/NVIDIA/NeMo/pull/4638 , ready for another round of reviews.
This pull request introduces 30 alerts when merging 93e4ecbb8b87525d75d7de7e2380759b05cd18db into 987674e29ea90f9a2f663bf95d74bd947d76bbc0 - view on LGTM.com
new alerts:
- 30 for Unused import
This pull request introduces 30 alerts when merging ce49b3bd20ccd4be6f30d6dcedc62140705a757e into 987674e29ea90f9a2f663bf95d74bd947d76bbc0 - view on LGTM.com
new alerts:
- 30 for Unused import
This pull request introduces 30 alerts when merging 8666d50733a990cccf1461272dd7865171667d9b into f921ebe0436e55f7547b183ca83a623f6678422d - view on LGTM.com
new alerts:
- 30 for Unused import
@mzxcpp please give clear signals here once you have addressed all issues from last review, so that we know when to move forward, and everyone won't get distracted by intermediate development activities.
Thank you for your detailed review again!I have fixed some naming problems and now make sure the pytest will work.However,with only TN part I have designed,it may be difficult for me to deal with work with sparrowhawk related.
This pull request introduces 29 alerts when merging ffbd160b7f14108818b0741ea2cf5f7389fbc88a into f8ca550967a83473aa2c20267690ac59c4fb640f - view on LGTM.com
new alerts:
- 29 for Unused import
Thank you for your detailed review again!I have fixed some naming problems and now make sure the pytest will work.However,with only TN part I have designed,it may be difficult for me to deal with work with sparrowhawk related.
Is your TN output non-deterministic? if not, sparrowhawk should run Out of the box
also @mzxcpp could you re-request review you are done addressing the feedback? thanks!
Thank you for your detailed review again!I have fixed some naming problems and now make sure the pytest will work.However,with only TN part I have designed,it may be difficult for me to deal with work with sparrowhawk related.
Is your TN output non-deterministic? if not, sparrowhawk should run Out of the box
Yes,the TN output is deterministic and I think it will work sucessfully in this case.However,when i test this,there is an error as "ModuleNotFoundError: No module named 'nemo_text_processing'".Should I install an another version of nemo in my env instead of running the sparrowhawk test in local nemo files?
Thank you for your detailed review again!I have fixed some naming problems and now make sure the pytest will work.However,with only TN part I have designed,it may be difficult for me to deal with work with sparrowhawk related.
Is your TN output non-deterministic? if not, sparrowhawk should run Out of the box
Yes,the TN output is deterministic and I think it will work sucessfully in this case.However,when i test this,there is an error as "ModuleNotFoundError: No module named 'nemo_text_processing'".Should I install an another version of nemo in my env instead of running the sparrowhawk test in local nemo files?
you need to install nemo and nemo_text_processing is installed using https://github.com/NVIDIA/NeMo/blob/main/nemo_text_processing/install_pynini.sh
Thank you for your detailed review again!I have fixed some naming problems and now make sure the pytest will work.However,with only TN part I have designed,it may be difficult for me to deal with work with sparrowhawk related.
Is your TN output non-deterministic? if not, sparrowhawk should run Out of the box
Yes,the TN output is deterministic and I think it will work sucessfully in this case.However,when i test this,there is an error as "ModuleNotFoundError: No module named 'nemo_text_processing'".Should I install an another version of nemo in my env instead of running the sparrowhawk test in local nemo files?
you need to install nemo and nemo_text_processing is installed using https://github.com/NVIDIA/NeMo/blob/main/nemo_text_processing/install_pynini.sh
Thank you!But what can I do if a tagger is not included in sparrowawk?Should I remove them all or keep them without test_sparrowhawk? I find that if a tagger's form is not included in the semiotic_classes.proto,it will not work successfully. The other part just work successfully like time or currency in this TN.
Thank you for your detailed review again!I have fixed some naming problems and now make sure the pytest will work.However,with only TN part I have designed,it may be difficult for me to deal with work with sparrowhawk related.
Is your TN output non-deterministic? if not, sparrowhawk should run Out of the box
Yes,the TN output is deterministic and I think it will work sucessfully in this case.However,when i test this,there is an error as "ModuleNotFoundError: No module named 'nemo_text_processing'".Should I install an another version of nemo in my env instead of running the sparrowhawk test in local nemo files?
you need to install nemo and nemo_text_processing is installed using https://github.com/NVIDIA/NeMo/blob/main/nemo_text_processing/install_pynini.sh
Thank you!But what can I do if a tagger is not included in sparrowawk?Should I remove them all or keep them without test_sparrowhawk? I find that if a tagger's form is not included in the semiotic_classes.proto,it will not work successfully. The other part just work successfully like time or currency in this TN.
which classes' tagger is not included? optimally we want to have a semiotic_classes.proto or expand it so it supports all classes. you can add the pytest still, just don't include them in test_sparrowhawk until we figured out how to expand the .proto file
This pull request introduces 31 alerts when merging 01795ec4f5067ee6038b1033473f55d7255adee8 into 8845addc6562f2df3740c95ee496a26849b526d0 - view on LGTM.com
new alerts:
- 31 for Unused import
Thank you for your detailed review again!I have fixed some naming problems and now make sure the pytest will work.However,with only TN part I have designed,it may be difficult for me to deal with work with sparrowhawk related.
Is your TN output non-deterministic? if not, sparrowhawk should run Out of the box
Yes,the TN output is deterministic and I think it will work sucessfully in this case.However,when i test this,there is an error as "ModuleNotFoundError: No module named 'nemo_text_processing'".Should I install an another version of nemo in my env instead of running the sparrowhawk test in local nemo files?
you need to install nemo and nemo_text_processing is installed using https://github.com/NVIDIA/NeMo/blob/main/nemo_text_processing/install_pynini.sh
Thank you!But what can I do if a tagger is not included in sparrowawk?Should I remove them all or keep them without test_sparrowhawk? I find that if a tagger's form is not included in the semiotic_classes.proto,it will not work successfully. The other part just work successfully like time or currency in this TN.
which classes' tagger is not included? optimally we want to have a semiotic_classes.proto or expand it so it supports all classes. you can add the pytest still, just don't include them in test_sparrowhawk until we figured out how to expand the .proto file
Well,I checked my code again and reorganize it. I think math in tagger in which I designed detailed graph for cases like "25:26" into"二十五比二十六",may be we call this "score" All in all,I think these are just small cases and I have work succeessfully with "pytest" and "sparrowhawk" test.I upload the latest version. Although there may more detailed cases I don't cover,I think it now can handle with most common cases in Chinese and I hope this can be a base of Chinese TN so that we can add more details to get a perfect TN tool for Chinese in further works.
This pull request introduces 31 alerts when merging d9d6281ab696a097c3c425e20e9aa6ff65279527 into 8845addc6562f2df3740c95ee496a26849b526d0 - view on LGTM.com
new alerts:
- 31 for Unused import
This pull request introduces 31 alerts when merging 9ec40908a3be46ca077b88eb6b5182a23f9757c2 into 8845addc6562f2df3740c95ee496a26849b526d0 - view on LGTM.com
new alerts:
- 31 for Unused import
The local setup fix nothing for current code but I can't pass CI test this time. Is it because I changed something in tools/text_processing_deployment?
他说:“我们已经吃过了!”。
=> 他说比“我们已经吃过了!”。
这儿有只鸟儿
=> normalize.py", line 363, in _permute raise ValueError()
他说:“我们已经吃过了!”。
=>他说比“我们已经吃过了!”。
这儿有只鸟儿
=>normalize.py", line 363, in _permute raise ValueError()
Thank you!I have fixed it.
Could you please try to fix the alerts as described here: https://lgtm.com/projects/g/NVIDIA/NeMo/rev/pr-f0d82e6d82662b60dc1c597bc403b4ac6dc1615a? Thanks.
Could you please try to fix the alerts as described here: https://lgtm.com/projects/g/NVIDIA/NeMo/rev/pr-f0d82e6d82662b60dc1c597bc403b4ac6dc1615a? Thanks.
Ok,I have deleted the useless lines in the lastest commit
@mzxcpp
- could you please remove the unrelated files from your PR:
- Could you fix the LGTM errors? like removing unused imports etc?
After that, i think we can merge it for now. Verified that pytests and the subset of SH tests that are added are passing.