sherpa-onnx
Add LODR support to online and offline recognizers
This PR adds LODR (low-order density ratio) support from Icefall to the offline and online recognizers, for both LM shallow fusion and LM rescoring (see https://k2-fsa.github.io/icefall/decoding-with-langugage-models/LODR.html).
Usage example:
# offline LM rescore
sherpa-onnx-offline --tokens=tokens.txt \
--encoder=encoder.onnx \
--decoder=decoder.onnx \
--joiner=joiner.onnx \
--decoding-method=modified_beam_search \
--lm=lm.onnx \
--lodr-fst=2gram.fst \
--lodr-scale=-0.5 \
test.wav
# online LM rescore
sherpa-onnx --tokens=tokens.txt \
--encoder=encoder.onnx \
--decoder=decoder.onnx \
--joiner=joiner.onnx \
--decoding-method=modified_beam_search \
--lm=lm.onnx \
--lodr-fst=2gram.fst \
--lodr-scale=-0.5 \
--lm-shallow-fusion=false \
test.wav
# online LM shallow fusion
sherpa-onnx --tokens=tokens.txt \
--encoder=encoder.onnx \
--decoder=decoder.onnx \
--joiner=joiner.onnx \
--decoding-method=modified_beam_search \
--lm=lm.onnx \
--lodr-fst=2gram.fst \
--lodr-scale=-0.5 \
--lodr-backoff-id=500 \
--lm-shallow-fusion=true \
test.wav
Where:
- 2gram.fst is the LODR n-gram model in binary FST format, e.g. created by Icefall using arpa2fst and then compiled to binary (fstcompile); see the sketch below
- "lodr-backoff-id" is the ID of the backoff symbol in the LODR FST (typically 0 or len(vocabulary))
Can you show how it improves the decoding result and also how it affects the RTF?
For the decoding results, you can check the LODR paper; in our experiments with private data we saw relative improvements of 3-7%.
Some performance numbers as reported by sherpa-onnx (non-optimized debug build on CPU):
LM rescore, no LODR:
Number of threads: 2, Elapsed seconds: 2.6e+03, Audio duration (s): 5e+03, Real time factor (RTF) = 2.6e+03/5e+03 = 0.52

LM rescore, with LODR:
Number of threads: 2, Elapsed seconds: 2.8e+03, Audio duration (s): 5e+03, Real time factor (RTF) = 2.8e+03/5e+03 = 0.56

LM shallow fusion, no LODR:
Number of threads: 2, Elapsed seconds: 6.8e+03, Audio duration (s): 5e+03, Real time factor (RTF) = 6.8e+03/5e+03 = 1.4

LM shallow fusion, with LODR:
Number of threads: 2, Elapsed seconds: 6.8e+03, Audio duration (s): 5e+03, Real time factor (RTF) = 6.8e+03/5e+03 = 1.4
> Some performance numbers as reported by sherpa-onnx (non-optimized debug build on CPU)

Can you test with a release build?
> Can you test with a release build?

On the same ~1.5h audio (release build):

rescore: Number of threads: 2, Elapsed seconds: 2.3e+03, Audio duration (s): 5e+03, Real time factor (RTF) = 2.3e+03/5e+03 = 0.45
rescore+LODR: Number of threads: 2, Elapsed seconds: 2.3e+03, Audio duration (s): 5e+03, Real time factor (RTF) = 2.3e+03/5e+03 = 0.47
SF (shallow fusion): Number of threads: 2, Elapsed seconds: 6.2e+03, Audio duration (s): 5e+03, Real time factor (RTF) = 6.2e+03/5e+03 = 1.2
SF+LODR: Number of threads: 2, Elapsed seconds: 6.3e+03, Audio duration (s): 5e+03, Real time factor (RTF) = 6.3e+03/5e+03 = 1.3
@csukuangfj Just wanted to kindly check in to see if there's anything else you'd like me to update on this PR.
btw, appreciate your time and all the work you do on the project. Is there any plan to have more maintainers/reviewers?
Thank you for sharing the test results.
> Is there any plan to have more maintainers/reviewers?

Yes, sherpa-onnx is an open-source project. Contributions of any form, e.g., pull requests and code reviews, are always welcome.
Hi again! Requested changes have been integrated into the PR. @csukuangfj
@csukuangfj backoff_id is now -1 by default and is inferred from the FST itself.
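For illustration, with this default the online shallow-fusion command from the description works without --lodr-backoff-id (same placeholder files as above):

# online LM shallow fusion, backoff ID inferred from 2gram.fst
sherpa-onnx --tokens=tokens.txt \
--encoder=encoder.onnx \
--decoder=decoder.onnx \
--joiner=joiner.onnx \
--decoding-method=modified_beam_search \
--lm=lm.onnx \
--lodr-fst=2gram.fst \
--lodr-scale=-0.5 \
--lm-shallow-fusion=true \
test.wav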
Can you add a CI test for it? It would be great if a Python example test and an example using the pre-built binary were available so that users can learn how to use the new feature through examples.
> Can you add a CI test for it? It would be great if a Python example test and an example using the pre-built binary were available so that users can learn how to use the new feature through examples.
Yes, I think I can add something like this. I will need to download models and the LODR FST during the CI test; I can probably use some public models, but what about the FST?
Also, what audio should I use in the test, and where is the best place to host it?
Can you upload the files to huggingface and download them from CI?
By the way, if you don't want to make your model and FST public, can you use the test model and FST files from icefall?
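A rough sketch of what such a CI step could look like (the huggingface repos and file names below are placeholders, not the assets the final test scripts use):

# fetch a public transducer model and LODR assets (placeholder URLs; git-lfs needed for the onnx files)
git clone https://huggingface.co/<org>/<transducer-model-repo> model
curl -SL -o 2gram.fst https://huggingface.co/<org>/<lm-repo>/resolve/main/2gram.fst
curl -SL -o lm.onnx https://huggingface.co/<org>/<lm-repo>/resolve/main/lm.onnx
# run an offline LODR decode on a test wav shipped with the model
sherpa-onnx-offline --tokens=model/tokens.txt \
--encoder=model/encoder.onnx \
--decoder=model/decoder.onnx \
--joiner=model/joiner.onnx \
--decoding-method=modified_beam_search \
--lm=lm.onnx \
--lodr-fst=2gram.fst \
--lodr-scale=-0.5 \
model/test_wavs/0.wav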
@csukuangfj I added some CI tests using Zipformer2 EN models, CLI and python
> @csukuangfj I added some CI tests using Zipformer2 EN models, CLI and python
Thanks! Will review it this week.
@csukuangfj is there anything you'd like me to update on this PR?
Walkthrough
This change introduces LODR (low-order density ratio) support across both the offline and online speech recognition pipelines. It adds new configuration options, command-line arguments, and an implementation of LODR FST-based rescoring in the C++ and Python APIs. Test scripts and example usage are updated to validate and demonstrate the new functionality, and supporting classes for FST-based rescoring are implemented.
Changes

| Files/Groups | Change Summary |
|---|---|
| .github/scripts/test-*.sh | Updated test scripts to download/prepare LODR FST and RNN-LM models, and run new tests with LODR and LM integration. |
| python-api-examples/offline-decode-files.py, python-api-examples/online-decode-files.py | Added command-line arguments for the LODR FST and LODR scale; passed these to the recognizer constructors; updated usage docs (see the usage sketch after this table). |
| sherpa-onnx/csrc/lodr-fst.h, sherpa-onnx/csrc/lodr-fst.cc | Introduced new classes for the LODR FST and state-cost management, enabling FST-based rescoring. |
| sherpa-onnx/csrc/CMakeLists.txt | Added lodr-fst.cc to the build. |
| sherpa-onnx/csrc/hypothesis.h | Added a lodr_state member to the Hypothesis struct for LODR state tracking. |
| sherpa-onnx/csrc/offline-lm-config.*, sherpa-onnx/csrc/online-lm-config.* | Added LODR FST path, scale, and backoff ID to the LM config structs, with registration and validation. |
| sherpa-onnx/csrc/offline-lm.h, sherpa-onnx/csrc/offline-lm.cc | Integrated LODR FST scoring into the offline LM scoring logic; added config-based LODR FST instantiation. |
| sherpa-onnx/csrc/offline-rnn-lm.cc | Updated constructors to call the base class with the full config (including LODR options). |
| sherpa-onnx/csrc/online-rnn-lm.cc | Integrated LODR FST scoring into the online RNN-LM scoring logic, supporting both shallow fusion and rescoring. |
| sherpa-onnx/python/csrc/offline-lm-config.cc, sherpa-onnx/python/csrc/online-lm-config.cc | Exposed the new LODR FST and scale (and backoff ID for online) in the Python bindings and constructors. |
| sherpa-onnx/python/sherpa_onnx/offline_recognizer.py, sherpa-onnx/python/sherpa_onnx/online_recognizer.py | Added LODR FST and scale parameters to the recognizer constructors and passed them to the config objects. |
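As a usage sketch for the updated Python example script (the LODR flags mirror the CLI flags above; whether the script exposes the other flags in exactly this form is an assumption, and all model files are placeholders):

# offline LM rescore with LODR via the Python API example
python3 ./python-api-examples/offline-decode-files.py \
--tokens=tokens.txt \
--encoder=encoder.onnx \
--decoder=decoder.onnx \
--joiner=joiner.onnx \
--decoding-method=modified_beam_search \
--lm=lm.onnx \
--lodr-fst=2gram.fst \
--lodr-scale=-0.5 \
test.wav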
Sequence Diagram(s)
sequenceDiagram
participant User
participant PythonScript
participant Recognizer
participant LM (RNN/NN)
participant LODR FST
User->>PythonScript: Run decode with --lm, --lodr-fst, --lodr-scale
PythonScript->>Recognizer: Construct with LM and LODR config
Recognizer->>LM (RNN/NN): Score hypothesis
Recognizer->>LODR FST: Rescore hypothesis with FST and scale
LODR FST-->>Recognizer: Return LODR score
LM (RNN/NN)-->>Recognizer: Return LM score
Recognizer-->>PythonScript: Final rescored hypothesis
PythonScript-->>User: Output results
Poem
🐇
I hopped through FSTs and lattices wide,
With LODR and language models by my side.
Now rescoring is clever, robust, and neat—
Our recognition’s accuracy hard to beat!
From scripts to configs, new options bloom,
Lattice magic brings results that zoom!
Hooray for the code—let’s celebrate and eat!
📜 Recent review details
Configuration used: CodeRabbit UI. Review profile: CHILL. Plan: Pro.
📥 Commits
Reviewing files that changed from the base of the PR and between 5dc574a6b8bb28f4aa750072056678dc37566181 and c761a7da9c420077fb3e5865cf1f9c8559f22361.
📒 Files selected for processing (1)
sherpa-onnx/csrc/lodr-fst.cc (1 hunk)
🚧 Files skipped from review as they are similar to previous changes (1)
- sherpa-onnx/csrc/lodr-fst.cc