sherpa-onnx
Add LODR support to online and offline recognizers
This PR adds LODR (low-order density ratio) support from Icefall to the offline and online recognizers, for both LM shallow fusion and LM rescoring (see https://k2-fsa.github.io/icefall/decoding-with-langugage-models/LODR.html).
Usage example:
# offline LM rescore
sherpa-onnx-offline --tokens=tokens.txt \
--encoder=encoder.onnx \
--decoder=decoder.onnx \
--joiner=joiner.onnx \
--decoding-method=modified_beam_search \
--lm=lm.onnx \
--lodr-fst=2gram.fst \
--lodr-scale=-0.5 \
test.wav
# online LM rescore
sherpa-onnx --tokens=tokens.txt \
--encoder=encoder.onnx \
--decoder=decoder.onnx \
--joiner=joiner.onnx \
--decoding-method=modified_beam_search \
--lm=lm.onnx \
--lodr-fst=2gram.fst \
--lodr-scale=-0.5 \
--lm-shallow-fusion=false \
test.wav
# online LM shallow fusion
sherpa-onnx --tokens=tokens.txt \
--encoder=encoder.onnx \
--decoder=decoder.onnx \
--joiner=joiner.onnx \
--decoding-method=modified_beam_search \
--lm=lm.onnx \
--lodr-fst=2gram.fst \
--lodr-scale=-0.5 \
--lodr-backoff-id=500 \
--lm-shallow-fusion=true \
test.wav
Where:
- 2gram.fst is the LODR n-gram model in binary FST format, e.g. created by Icefall using arpa2fst and then compiled to binary (fstcompile); see the sketch below
- "lodr-backoff-id" is the ID of the backoff symbol in the LODR FST (typically 0 or len(vocabulary))
Can you show how it improves the decoding result and also how it affects the RTF?
For the decoding results, you can check the LODR paper; in our experiments with private data we saw relative improvements of 3-7%.
Some performance numbers as reported by sherpa-onnx (non-optimized debug build on CPU):
LM rescore, no LODR:
Number of threads: 2, Elapsed seconds: 2.6e+03, Audio duration (s): 5e+03, Real time factor (RTF) = 2.6e+03/5e+03 = 0.52

LM rescore, with LODR:
Number of threads: 2, Elapsed seconds: 2.8e+03, Audio duration (s): 5e+03, Real time factor (RTF) = 2.8e+03/5e+03 = 0.56

LM shallow fusion, no LODR:
Number of threads: 2, Elapsed seconds: 6.8e+03, Audio duration (s): 5e+03, Real time factor (RTF) = 6.8e+03/5e+03 = 1.4

LM shallow fusion, with LODR:
Number of threads: 2, Elapsed seconds: 6.8e+03, Audio duration (s): 5e+03, Real time factor (RTF) = 6.8e+03/5e+03 = 1.4
> Some performance numbers as reported by sherpa-onnx (non-optimized debug build on CPU)

Can you test with a release build?
> Can you test with a release build?

On the same ~1.5h audio (release build):

rescore: Number of threads: 2, Elapsed seconds: 2.3e+03, Audio duration (s): 5e+03, Real time factor (RTF) = 2.3e+03/5e+03 = 0.45
rescore+LODR: Number of threads: 2, Elapsed seconds: 2.3e+03, Audio duration (s): 5e+03, Real time factor (RTF) = 2.3e+03/5e+03 = 0.47
SF (shallow fusion): Number of threads: 2, Elapsed seconds: 6.2e+03, Audio duration (s): 5e+03, Real time factor (RTF) = 6.2e+03/5e+03 = 1.2
SF+LODR: Number of threads: 2, Elapsed seconds: 6.3e+03, Audio duration (s): 5e+03, Real time factor (RTF) = 6.3e+03/5e+03 = 1.3
@csukuangfj Just wanted to kindly check in to see if there's anything else you'd like me to update on this PR.
btw, appreciate your time and all the work you do on the project. Is there any plan to have more maintainers/reviewers?
Thank you for sharing the test results.
> Is there any plan to have more maintainers/reviewers?

Yes, sherpa-onnx is an open-source project. Contributions of any form, e.g., pull requests and code reviews, are always welcome.
Hi again! Requested changes have been integrated into the PR. @csukuangfj
@csukuangfj backoff_id is now -1 by default and is inferred from the FST itself.
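For illustration, with this default the online shallow-fusion command from the description works without --lodr-backoff-id (same placeholder files as above):

# online LM shallow fusion, backoff ID inferred from 2gram.fst
sherpa-onnx --tokens=tokens.txt \
--encoder=encoder.onnx \
--decoder=decoder.onnx \
--joiner=joiner.onnx \
--decoding-method=modified_beam_search \
--lm=lm.onnx \
--lodr-fst=2gram.fst \
--lodr-scale=-0.5 \
--lm-shallow-fusion=true \
test.wav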
Can you add a CI test for it? It would be great if a Python example test and an example using the pre-built binary were available so that users can learn how to use the new feature through examples.
> Can you add a CI test for it? It would be great if a Python example test and an example using the pre-built binary were available so that users can learn how to use the new feature through examples.
Yes, I think I can add something like this. I will need to download models and the LODR FST during the CI test; I can probably use some public models, but what about the FST?
Also, what audio should I use in the test, and where is the best place to host it?
Can you upload the files to huggingface and download them from CI?
By the way, if you don't want to make your model and FST public, can you use the test model and FST files from icefall?
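A rough sketch of what such a CI step could look like (the huggingface repos and file names below are placeholders, not the assets the final test scripts use):

# fetch a public transducer model and LODR assets (placeholder URLs; git-lfs needed for the onnx files)
git clone https://huggingface.co/<org>/<transducer-model-repo> model
curl -SL -o 2gram.fst https://huggingface.co/<org>/<lm-repo>/resolve/main/2gram.fst
curl -SL -o lm.onnx https://huggingface.co/<org>/<lm-repo>/resolve/main/lm.onnx
# run an offline LODR decode on a test wav shipped with the model
sherpa-onnx-offline --tokens=model/tokens.txt \
--encoder=model/encoder.onnx \
--decoder=model/decoder.onnx \
--joiner=model/joiner.onnx \
--decoding-method=modified_beam_search \
--lm=lm.onnx \
--lodr-fst=2gram.fst \
--lodr-scale=-0.5 \
model/test_wavs/0.wav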
@csukuangfj I added some CI tests using Zipformer2 EN models, CLI and python
> @csukuangfj I added some CI tests using Zipformer2 EN models, CLI and python
Thanks! Will review it this week.
@csukuangfj is there anything you'd like me to update on this PR?
Walkthrough
This change introduces LODR (low-order density ratio) support across both the offline and online speech recognition pipelines. It adds new configuration options, command-line arguments, and an implementation of LODR FST-based rescoring in the C++ and Python APIs. Test scripts and example usage are updated to validate and demonstrate the new functionality, and supporting classes for FST-based rescoring are implemented.
Changes

| Files/Groups | Change Summary |
|---|---|
| .github/scripts/test-*.sh | Updated test scripts to download/prepare LODR FST and RNN-LM models, and run new tests with LODR and LM integration. |
| python-api-examples/offline-decode-files.py, python-api-examples/online-decode-files.py | Added command-line arguments for the LODR FST and LODR scale; passed these to the recognizer constructors; updated usage docs (see the usage sketch after this table). |
| sherpa-onnx/csrc/lodr-fst.h, sherpa-onnx/csrc/lodr-fst.cc | Introduced new classes for the LODR FST and state-cost management, enabling FST-based rescoring. |
| sherpa-onnx/csrc/CMakeLists.txt | Added lodr-fst.cc to the build. |
| sherpa-onnx/csrc/hypothesis.h | Added a lodr_state member to the Hypothesis struct for LODR state tracking. |
| sherpa-onnx/csrc/offline-lm-config.*, sherpa-onnx/csrc/online-lm-config.* | Added LODR FST path, scale, and backoff ID to the LM config structs, with registration and validation. |
| sherpa-onnx/csrc/offline-lm.h, sherpa-onnx/csrc/offline-lm.cc | Integrated LODR FST scoring into the offline LM scoring logic; added config-based LODR FST instantiation. |
| sherpa-onnx/csrc/offline-rnn-lm.cc | Updated constructors to call the base class with the full config (including LODR options). |
| sherpa-onnx/csrc/online-rnn-lm.cc | Integrated LODR FST scoring into the online RNN-LM scoring logic, supporting both shallow fusion and rescoring. |
| sherpa-onnx/python/csrc/offline-lm-config.cc, sherpa-onnx/python/csrc/online-lm-config.cc | Exposed the new LODR FST and scale (and backoff ID for online) in the Python bindings and constructors. |
| sherpa-onnx/python/sherpa_onnx/offline_recognizer.py, sherpa-onnx/python/sherpa_onnx/online_recognizer.py | Added LODR FST and scale parameters to the recognizer constructors and passed them to the config objects. |
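As a usage sketch for the updated Python example script (the LODR flags mirror the CLI flags above; whether the script exposes the other flags in exactly this form is an assumption, and all model files are placeholders):

# offline LM rescore with LODR via the Python API example
python3 ./python-api-examples/offline-decode-files.py \
--tokens=tokens.txt \
--encoder=encoder.onnx \
--decoder=decoder.onnx \
--joiner=joiner.onnx \
--decoding-method=modified_beam_search \
--lm=lm.onnx \
--lodr-fst=2gram.fst \
--lodr-scale=-0.5 \
test.wav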
Sequence Diagram(s)
sequenceDiagram
participant User
participant PythonScript
participant Recognizer
participant LM (RNN/NN)
participant LODR FST
User->>PythonScript: Run decode with --lm, --lodr-fst, --lodr-scale
PythonScript->>Recognizer: Construct with LM and LODR config
Recognizer->>LM (RNN/NN): Score hypothesis
Recognizer->>LODR FST: Rescore hypothesis with FST and scale
LODR FST-->>Recognizer: Return LODR score
LM (RNN/NN)-->>Recognizer: Return LM score
Recognizer-->>PythonScript: Final rescored hypothesis
PythonScript-->>User: Output results
Poem
🐇
I hopped through FSTs and lattices wide,
With LODR and language models by my side.
Now rescoring is clever, robust, and neat—
Our recognition’s accuracy hard to beat!
From scripts to configs, new options bloom,
Lattice magic brings results that zoom!
Hooray for the code—let’s celebrate and eat!
📜 Recent review details
Configuration used: CodeRabbit UI. Review profile: CHILL. Plan: Pro.
📥 Commits
Reviewing files that changed from the base of the PR and between 5dc574a6b8bb28f4aa750072056678dc37566181 and c761a7da9c420077fb3e5865cf1f9c8559f22361.
📒 Files selected for processing (1)
sherpa-onnx/csrc/lodr-fst.cc (1 hunk)
🚧 Files skipped from review as they are similar to previous changes (1)
- sherpa-onnx/csrc/lodr-fst.cc