NeMo-text-processing
Fixes issue 228
What does this PR do?
The fix addresses one of the issues reported in #228, in particular:
text: Hier zoome ich auf die Läsion. Wir befinden uns also auf der 2D-Mammographie.
norm_text: Hier zoome ich auf die Läsion. Wir befinden uns also auf der 2D-Mammographie.
expected output: Hier zoome ich auf die Läsion. Wir befinden uns also auf der Zwei-D-Mammographie. (not sure)
The updated system correctly transduces these common hyphenated nominal compounds and can be easily expanded to include others.
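The behaviour described above can be sketched in plain Python as a lookup-table rewrite. This is only an illustration, not the grammar added in this PR: the actual fix lives in the German normalization grammars, and the names `COMPOUND_PREFIXES` and `expand_compound_prefixes`, as well as the `3D` entry, are hypothetical.

```python
import re

# Hypothetical lookup table of compound prefixes and their spoken forms.
# Only the "2D" case comes from issue #228; "3D" is an assumed extra entry
# showing how the table could be extended.
COMPOUND_PREFIXES = {
    "2D": "Zwei-D",
    "3D": "Drei-D",
}

# Match a listed prefix only when it begins a hyphenated compound,
# e.g. "2D-Mammographie" but not a standalone "2D" or "12D-...".
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(key) for key in COMPOUND_PREFIXES) + r")(?=-\w)"
)


def expand_compound_prefixes(text: str) -> str:
    """Replace known prefixes of hyphenated compounds with their spoken form."""
    return _PATTERN.sub(lambda match: COMPOUND_PREFIXES[match.group(1)], text)


if __name__ == "__main__":
    sentence = "Wir befinden uns also auf der 2D-Mammographie."
    print(expand_compound_prefixes(sentence))
```

Adding another compound in this sketch means adding one entry to the table, which mirrors the claim that the fix is easy to extend.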
Before your PR is "Ready for review"
Pre checks:
- [x] Have you signed your commits? Use `git commit -s` to sign.
- [x] Do all unittests finish successfully before sending PR?
  - `pytest` or (if your machine does not have GPU) `pytest --cpu` from the root folder (given you marked your test cases accordingly `@pytest.mark.run_only_on('CPU')`).
  - Sparrowhawk tests `bash tools/text_processing_deployment/export_grammars.sh --MODE=test ...`
- [x] If you are adding a new feature: Have you added test cases for both `pytest` and Sparrowhawk here.
- [x] Have you added `__init__.py` for every folder and subfolder, including `data` folder which has .TSV files?
- [x] Have you followed codeQL results and removed unused variables and imports (report is at the bottom of the PR in github review box)?
- [x] Have you added the correct license header `Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.` to all newly added Python files?
- [x] If you copied nemo_text_processing/text_normalization/en/graph_utils.py your header's second line should be `Copyright 2015 and onwards Google, Inc.`. See an example here.
- [x] Remove import guards (`try import: ... except: ...`) if not already done.
- [x] If you added a new language or a new feature please update the NeMo documentation (lives in different repo).
- [x] Have you added your language support to tools/text_processing_deployment/pynini_export.py.
PR Type:
- [ ] New Feature
- [x] Bugfix
- [ ] Documentation
- [ ] Test
If you haven't finished some of the above items, you can still open a "Draft" PR.