sherpa-onnx
Offline Recognizer - Passing the Language for Multi-Language Models
If I want to extend the existing functionality of the Whisper recognizer and pass a language at runtime, what would be a recommended approach? I looked at the decode() API; adding a new parameter there would require a lot of changes. Instead, I was thinking of changing the recognizer's config at runtime. Any tips would be appreciated!
Please add two new methods after https://github.com/k2-fsa/sherpa-onnx/blob/117cd7bb8c262580718d87e17f01510c7ead3f92/sherpa-onnx/csrc/offline-recognizer.h#L120
OfflineRecognizerConfig GetConfig() const;
// Onnxruntime Session objects are not affected by this method.
// The exact behavior can be defined by a specific recognizer impl.
// For instance, for the whisper recognizer, you can retrieve the language and task from
// the config and ignore any remaining fields in `config`.
void SetConfig(const OfflineRecognizerConfig& config);
Note that you only need to care about the C++ API. If you want to add APIs for other programming languages, that is also fine.
Also note that you can provide default implementations for the above two newly added methods and you don't need to implement them for all recognizers. You can consider only the whisper recognizer at present.
If we want to specify an initial prompt for whisper, we can add some new fields to the recognizer config object and the interface can be kept the same after you add the two methods.
Thank you, @csukuangfj !
@csukuangfj I made the changes in this PR: https://github.com/k2-fsa/sherpa-onnx/pull/1124
The Whisper implementation only overrides the Whisper portion of the model config.
For iOS, SherpaOnnx can now have this:
class SherpaOnnxOfflineRecognizer {
  ...

  /// - config: config to overwrite the current one with
  func setConfig(config: UnsafePointer<SherpaOnnxOfflineRecognizerConfig>!) {
    SherpaOnnxOfflineRecognizerSetConfig(recognizer, config)
  }
Then in the SherpaOnnxViewModel:

...
var config = getNonStreamingWhisperTinyLangSpecificConfig(language: language)
self.offlineRecognizer.setConfig(config: &config)
startRecorder()
And in the Model:

...
func getNonStreamingWhisperTinyLangSpecificConfig(language: String) -> SherpaOnnxOfflineRecognizerConfig {
  let modelConfig = getNonStreamingWhisperTiny(language: language)
  let featConfig = sherpaOnnxFeatureConfig(
    sampleRate: 16000,
    featureDim: 80)
  return sherpaOnnxOfflineRecognizerConfig(
    featConfig: featConfig,
    modelConfig: modelConfig,
    decodingMethod: "greedy_search",
    maxActivePaths: 4
  )
}
func getNonStreamingWhisperTiny(language: String) -> SherpaOnnxOfflineModelConfig {
  let encoder = getResource("tiny-encoder.int8", "onnx")
  let decoder = getResource("tiny-decoder.int8", "onnx")
  let tokens = getResource("tiny-tokens", "txt")
  return sherpaOnnxOfflineModelConfig(
    tokens: tokens,
    whisper: sherpaOnnxOfflineWhisperModelConfig(
      encoder: encoder,
      decoder: decoder,
      language: language
    ),
    numThreads: 1,
    debug: 1,
    modelType: "whisper"
  )
}
Please let me know if I got this right; I have tested it locally on iOS and it works as expected.
I also exposed the recognized language back to the calling client in the result (for the automatic language detection use case).
I am still coming up to speed on this code base. I struggled to see how useful GetConfig() will be for my particular use case, but I added it per your request.
Thank you!
> I struggled to see how useful GetConfig() will be for my particular use case, but I added it per your request.
You can remove it if you feel it is not needed at present.
The changes to the Swift code also look good to me. Thanks!
Marking resolved.