NeMo icon indicating copy to clipboard operation
NeMo copied to clipboard

Multi token bleu

Open tbartley94 opened this issue 8 months ago • 7 comments

[!IMPORTANT]
The Update branch button must only be pressed in very rare occassions. An outdated branch is never blocking the merge of a PR. Please reach out to the automation team before pressing that button.

What does this PR do ?

  • Adds functionality to asr metrics so multiple bleu tokenizers can be used on evaluation. This allows for multilingual cases where a single SacreBLEU tokenizer isn't able to properly tokenize. (e.g. Roman scripts + CJK languages).
  • Fixed bug in BLEU metric that improperly nested references, leading to only partial bleu calculations..
  • Changed references in aed_models that were not properly referencing test_dataloader attributes.
  • Added training_batch logging for bleu and wer metrics gathering during training so behavior is consistent with other asr models.

Collection: [Note which collection this PR will affect]

  • asr

Changelog

  • Changed bleu.py to now allow passing dicts of multiple bleu tokenizers
  • changed bleu.update to switch tokenizers 'on the fly' for target evaluation
  • Changed aed_model logging to pass strings of target_lang to bleu.update to allow per sample evaluation
  • Added util function to lhotse.cutset to flatten mixed set to allow easier reference to custom attributes
  • Replaced references to validation attributes in test dataloder to appropriate keys.
  • changed prefix of wer on validation pass to now allow referencing test step
  • added intratraining bleu and wer update for training batches to catch training level bleu updates.
  • moved metric logging in validation to standalone function to reduce boiletplate between validation and training steps.

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR. To re-run CI remove and add the label again. To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

  • [y ] Make sure you read and followed Contributor guidelines
  • [ y] Did you write any new necessary tests?
  • [ y] Did you add or update any necessary documentation?
  • [ n] Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • [ ] Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • [ y] New Feature
  • [ y] Bugfix
  • [ ] Documentation

If you haven't finished some of the above items you can still open "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed. Contributor guidelines contains specific people who can review PRs to various areas.

Additional Information

  • Related to # (issue)

tbartley94 avatar May 10 '25 05:05 tbartley94

@pzelasko only adding you as a reviewer in case you have a better way of quickly getting the custom keyword out. (can't use supervision.lang since that's for the tokenizer)

tbartley94 avatar May 10 '25 05:05 tbartley94

tests are missing.

yeah, once the main code is deemed acceptable I can throw in unit tests. (saves rewriting them in case y'all have a fundamental issue with structure)

tbartley94 avatar Jun 03 '25 21:06 tbartley94

Please add this info to NeMo docs

nithinraok avatar Jun 11 '25 23:06 nithinraok

Please add this info to NeMo docs

done

tbartley94 avatar Jun 12 '25 05:06 tbartley94

@nithinraok Looks like upstream people didn't manage dependencies well and it's causing unrelated failures in CICD tests. Can we just skip it if I screenshot pytests running on my end?

tbartley94 avatar Jun 12 '25 06:06 tbartley94

@nithinraok Looks like upstream people didn't manage dependencies well and it's causing unrelated failures in CICD tests. Can we just skip it if I screenshot pytests running on my end?

@chtruong814 can you please have a look

nithinraok avatar Jun 12 '25 13:06 nithinraok

@nithinraok Looks like upstream people didn't manage dependencies well and it's causing unrelated failures in CICD tests. Can we just skip it if I screenshot pytests running on my end?

@chtruong814 can you please have a look

@nithinraok @tbartley94 Is there a certain CI pipeline you're referring to? The last failure was due to the pipeline getting canceled from a more recent push.

https://github.com/NVIDIA/NeMo/actions/runs/15602808643/job/43990771775

chtruong814 avatar Jun 12 '25 18:06 chtruong814

CI passed here. Will go ahead and merge https://github.com/NVIDIA/NeMo/actions/runs/15718713187

chtruong814 avatar Jun 18 '25 20:06 chtruong814