How to use Whisper for multi-language STT or ASR on Android and iOS

Open KhoaNgo18 opened this issue 1 year ago • 11 comments

I want to use a Whisper model for STT, but looking through the code written for Android and iOS, I can't find the function needed to initialize the Whisper model. I can see that Whisper is supported for OfflineModel. Also, I don't understand the concept of 2Pass; it would be great to get to know it better.

KhoaNgo18 avatar Mar 28 '24 05:03 KhoaNgo18

please see https://github.com/k2-fsa/sherpa-onnx/blob/de655e838e7e1cc073275be119e7cdf0bd5d4108/android/SherpaOnnx2Pass/app/src/main/java/com/k2fsa/sherpa/onnx/SherpaOnnx.kt#L351-L373


https://github.com/k2-fsa/sherpa-onnx/blob/de655e838e7e1cc073275be119e7cdf0bd5d4108/android/SherpaOnnx2Pass/app/src/main/java/com/k2fsa/sherpa/onnx/MainActivity.kt#L209-L212

You can set secondType to either 2 or 3.

Remember to place the corresponding model files in assets.


You can find pre-built ASR APKs with Whisper at https://github.com/k2-fsa/sherpa-onnx/releases/tag/v1.9.14

csukuangfj avatar Mar 28 '24 06:03 csukuangfj

Similarly, for iOS, please see

https://github.com/k2-fsa/sherpa-onnx/blob/de655e838e7e1cc073275be119e7cdf0bd5d4108/ios-swiftui/SherpaOnnx2Pass/SherpaOnnx2Pass/Model.swift#L96

https://github.com/k2-fsa/sherpa-onnx/blob/de655e838e7e1cc073275be119e7cdf0bd5d4108/ios-swiftui/SherpaOnnx2Pass/SherpaOnnx2Pass/SherpaOnnxViewModel.swift#L94

You need to place the corresponding model files in your project.

csukuangfj avatar Mar 28 '24 06:03 csukuangfj

Can I use Whisper alone? When I test with 2Pass, it only recognizes English. As I understand the 2Pass code, I need two models, sherpa-onnx-streaming-zipformer-en-20M-2023-02-17 and sherpa-onnx-whisper-base.en, so I cannot use Whisper on its own.

KhoaNgo18 avatar Mar 28 '24 07:03 KhoaNgo18

Whisper is a non-streaming ASR model, so you cannot use it for real-time streaming ASR.

We don't provide an APK or an example that uses Whisper alone for non-streaming ASR on Android/iOS, but we do provide the APIs.

So the answer is yes; you can use Whisper alone in Android/iOS.

csukuangfj avatar Mar 28 '24 07:03 csukuangfj
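
Since the two-pass concept was asked about at the top of the thread, here is a minimal sketch of the idea, not the actual sherpa-onnx implementation. It assumes the usual arrangement: a streaming model produces partial text in real time, and once it reports an endpoint, the buffered audio for that utterance is decoded again by a non-streaming model such as Whisper. The protocols, `TwoPass`, and the mock recognizers are hypothetical stand-ins invented for illustration; the real types are in the linked two-pass examples.

```swift
import Foundation

// Hypothetical stand-ins; these are NOT sherpa-onnx types.
protocol StreamingRecognizer {
    /// Feeds a chunk of samples and returns the partial text decoded so far.
    mutating func accept(samples: [Float]) -> String
    /// True when the recognizer believes the current utterance has ended.
    func isEndpoint() -> Bool
    /// Clears internal state so the next utterance starts fresh.
    mutating func reset()
}

protocol NonStreamingRecognizer {
    /// Decodes one complete utterance in a single call.
    func decode(samples: [Float]) -> String
}

struct TwoPass<First: StreamingRecognizer, Second: NonStreamingRecognizer> {
    private var first: First
    private let second: Second
    private var utterance: [Float] = []  // samples buffered since the last endpoint

    init(first: First, second: Second) {
        self.first = first
        self.second = second
    }

    /// First pass runs on every chunk; second pass runs only at an endpoint.
    mutating func process(chunk: [Float]) -> (partialText: String, finalText: String?) {
        utterance.append(contentsOf: chunk)
        let partialText = first.accept(samples: chunk)
        guard first.isEndpoint() else { return (partialText, nil) }

        // Endpoint reached: re-decode the whole utterance with the second model.
        let refined = second.decode(samples: utterance)
        utterance.removeAll(keepingCapacity: true)
        first.reset()
        return (partialText, refined)
    }
}

// Mock recognizers so the sketch runs without any model files.
struct MockStreaming: StreamingRecognizer {
    var count = 0
    mutating func accept(samples: [Float]) -> String {
        count += samples.count
        return "partial(\(count) samples)"
    }
    func isEndpoint() -> Bool { count >= 3 }  // pretend 3 samples end an utterance
    mutating func reset() { count = 0 }
}

struct MockOffline: NonStreamingRecognizer {
    func decode(samples: [Float]) -> String { "final(\(samples.count) samples)" }
}

var pipeline = TwoPass(first: MockStreaming(), second: MockOffline())
for chunk in [[0.1, 0.2], [0.3], [0.4]] as [[Float]] {
    let result = pipeline.process(chunk: chunk)
    print(result.partialText, result.finalText ?? "-")
}
```

Under this arrangement, using Whisper alone means dropping the first pass and finding utterance boundaries some other way, for example with a VAD, before calling the non-streaming decoder.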

Can you guide me on how to use the APIs, or at least tell me where they are? I'm new to mobile and AI, so I'd appreciate your help a lot.

KhoaNgo18 avatar Mar 28 '24 07:03 KhoaNgo18

You can find all the required APIs in our two-pass example, which I have already posted in the first comment.

If you are new to Android and iOS and are also new to Kotlin and Swift, then it may be difficult for you.

csukuangfj avatar Mar 28 '24 07:03 csukuangfj

I was able to get the Whisper small multilingual model working with this 2Pass code for Romanian:

func getNonStreamingWhisperSmall() -> SherpaOnnxOfflineModelConfig {
  let encoder = getResource("small-encoder.int8", "onnx")
  let decoder = getResource("small-decoder.int8", "onnx")
  let tokens = getResource("small-tokens", "txt")

  return sherpaOnnxOfflineModelConfig(
    tokens: tokens,
    whisper: sherpaOnnxOfflineWhisperModelConfig(
      encoder: encoder,
      decoder: decoder,
      language: "ro"
    ),
    numThreads: 1,
    modelType: "whisper"
  )
}

Then in the SherpaOnnxViewModel:initOfflineRecognizer()

change

let modelConfig = getNonStreamingWhisperTinyEn()

to

let modelConfig = getNonStreamingWhisperSmall()

I think it would make more sense for my use case to use VAD instead (as in SherpaOnnxVadAsr). I will try that next.

@csukuangfj how hard is it to make the code changes for the whisper model to take the language param at runtime, not while the model is loaded? Could you please point me to the general code area?

Thank you!

iprovalo avatar Jul 03 '24 17:07 iprovalo

@csukuangfj I noticed that if I just set the language to en, whisper will switch from transcribe to translate mode.

iprovalo avatar Jul 03 '24 18:07 iprovalo

@csukuangfj I think this is what I am looking for if I want to pass the language to decoder at runtime:

https://github.com/k2-fsa/sherpa-onnx/blob/master/sherpa-onnx/csrc/offline-whisper-greedy-search-decoder.cc#L30

iprovalo avatar Jul 04 '24 15:07 iprovalo

> how hard is it to make the code changes for the whisper model to take the language param at runtime

I'm sorry; unfortunately, we don't provide an API for users to do that.

csukuangfj avatar Jul 05 '24 14:07 csukuangfj

@csukuangfj my bad, I misread the code. The Whisper multilingual model config with an empty language works perfectly. I tested it with VAD in the iOS SherpaOnnx2Pass example. After initializing the VAD, I do something similar to the Android version:

let array = convertedBuffer.array()
if !array.isEmpty {
    // Feed the new samples to the VAD.
    self.vad.acceptWaveform(samples: [Float](array))
    // Drain every speech segment the VAD has finished detecting.
    while !self.vad.isEmpty() {
        let s = self.vad.front()
        self.vad.pop()
        // Decode the whole segment with the non-streaming (Whisper) recognizer.
        let lastSentence = self.offlineRecognizer.decode(samples: s.samples).text
        self.sentences.append(lastSentence)
        self.updateLabel()
    }
}

iprovalo avatar Jul 05 '24 14:07 iprovalo
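
For reference, the empty-language configuration described in the last comment might look like the fragment below. It is a sketch adapted from the `getNonStreamingWhisperSmall()` function posted earlier in this thread, with only the `language` value changed. The function name `getNonStreamingWhisperSmallAnyLanguage` is made up for illustration, and the file names are the ones used in that earlier comment. It is meant to sit in the same file as the other model getters in the iOS SherpaOnnx2Pass example, so it relies on `getResource` and the config helpers from that project rather than being standalone.

```swift
// Hypothetical variant of getNonStreamingWhisperSmall() from earlier in the thread.
func getNonStreamingWhisperSmallAnyLanguage() -> SherpaOnnxOfflineModelConfig {
  let encoder = getResource("small-encoder.int8", "onnx")
  let decoder = getResource("small-decoder.int8", "onnx")
  let tokens = getResource("small-tokens", "txt")

  return sherpaOnnxOfflineModelConfig(
    tokens: tokens,
    whisper: sherpaOnnxOfflineWhisperModelConfig(
      encoder: encoder,
      decoder: decoder,
      language: ""  // left empty, as reported to work with the multilingual model
    ),
    numThreads: 1,
    modelType: "whisper"
  )
}
```

As with the Romanian example, it would be selected by assigning its result to `modelConfig` in `initOfflineRecognizer()`.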