sherpa-onnx
Is it possible to have real-time keyword spotting in Flutter?
Hi, is it possible to use sherpa-onnx for keyword spotting on a microphone stream in a Flutter app?
I tried to adapt the Dart file-based example but could not get it to work with streaming. I get the audio from the mic via https://pub.dev/packages/flutter_sound, but it never detects any keyword.
Could you show your changes?
Model setup should be okay (ModelLoader just loads sherpa-onnx-kws-zipformer-gigaspeech-3.3M-2024-01-01 and unpacks it so the app can use it):
```dart
// Inside the class that owns _spotter, _stream, _sampleRate and _modelLoader.
Future<void> initialize({
  required int sampleRate,
  required String language,
}) async {
  // Load the native sherpa-onnx library before any other call.
  sherpa_onnx.initBindings();

  // Streaming (online) zipformer transducer: encoder, decoder, joiner.
  final transducer = sherpa_onnx.OnlineTransducerModelConfig(
    encoder: await _modelLoader.encoderPath(_modelName(language)),
    decoder: await _modelLoader.decoderPath(_modelName(language)),
    joiner: await _modelLoader.joinerPath(_modelName(language)),
  );
  final modelConfig = sherpa_onnx.OnlineModelConfig(
    transducer: transducer,
    tokens: await _modelLoader.tokensPath(_modelName(language)),
  );
  // The keywords file lists the phrases the spotter listens for.
  final config = sherpa_onnx.KeywordSpotterConfig(
    model: modelConfig,
    keywordsFile: await _modelLoader.keywordsPath(_modelName(language)),
  );
  _spotter = sherpa_onnx.KeywordSpotter(config);
  _stream = _spotter.createStream();
  _sampleRate = sampleRate;
}
```
Now, when I call predict with the samples emitted by the flutter_sound stream, it returns null every time.
```dart
String? predict(Uint8List samples) {
  final samplesFloat32 = _convertBytesToFloat32(samples);
  _stream.acceptWaveform(
    samples: samplesFloat32,
    sampleRate: _sampleRate,
  );
  while (_spotter.isReady(_stream)) {
    _spotter.decode(_stream);
  }
  final keyword = _spotter.getResult(_stream).keyword;
  if (keyword.isNotEmpty) {
    print('Detected: $keyword');
    return keyword;
  }
  return null;
}
} // end of the class containing initialize() and predict()

// Converts 16-bit PCM bytes to Float32 samples in [-1, 1).
Float32List _convertBytesToFloat32(
  Uint8List bytes, [
  Endian endian = Endian.little,
]) {
  final values = Float32List(bytes.length ~/ 2);
  // sublistView respects offsetInBytes, in case bytes is a slice of a larger buffer.
  final data = ByteData.sublistView(bytes);
  for (var i = 0; i + 1 < bytes.length; i += 2) {
    final short = data.getInt16(i, endian);
    values[i ~/ 2] = short / 32768.0; // divide by 2^15
  }
  return values;
}
```
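The converter above assumes every chunk holds a whole number of 16-bit samples. If a chunk could ever end in the middle of a sample (an assumption; it is not established in this thread whether flutter_sound guarantees even-length chunks), the trailing byte would be dropped and every later sample would be misaligned, which would turn the audio into noise. A minimal sketch of a stateful converter that carries a leftover byte into the next chunk; Pcm16ChunkDecoder is a hypothetical helper, not part of sherpa-onnx or flutter_sound, and it assumes little-endian PCM16 as in the config here.

```dart
import 'dart:typed_data';

/// Hypothetical helper: converts a sequence of little-endian PCM16 byte
/// chunks to Float32 samples, carrying an odd trailing byte to the next call.
class Pcm16ChunkDecoder {
  int? _pending; // low byte of a sample that was split across two chunks

  Float32List convert(Uint8List chunk) {
    var bytes = chunk;
    final pending = _pending;
    if (pending != null) {
      // Prepend the byte left over from the previous chunk.
      bytes = Uint8List(chunk.length + 1)
        ..[0] = pending
        ..setRange(1, chunk.length + 1, chunk);
      _pending = null;
    }

    final sampleCount = bytes.length ~/ 2;
    // An odd trailing byte is half of a sample; keep it for the next call.
    if (bytes.length.isOdd) {
      _pending = bytes[bytes.length - 1];
    }

    final data = ByteData.sublistView(bytes);
    final out = Float32List(sampleCount);
    for (var i = 0; i < sampleCount; i++) {
      // Scale int16 to [-1, 1) by dividing by 2^15.
      out[i] = data.getInt16(2 * i, Endian.little) / 32768.0;
    }
    return out;
  }
}
```

One decoder instance would live next to _stream and replace the per-chunk call to _convertBytesToFloat32, since it has to remember state between chunks.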
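The flutter_sound config looks like this (16-bit PCM, 16000 Hz sample rate):
```dart
await _recorder.startRecorder(
  toStream: _audioController.sink,
  codec: base.Codec.pcm16,
  // Explicit mono 16 kHz, matching what the KWS model expects.
  sampleRate: 16000,
  numChannels: 1,
);
```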
Please check whether _spotter.ptr and _stream.ptr are null (address 0).
I suspect that model initialization failed.
Make sure you read the logs carefully.
Note that you can pass debug: true to the model config (OnlineModelConfig) to get more logs.
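A minimal sketch of the null check suggested above, using only dart:ffi. It assumes, as the suggestion implies, that _spotter.ptr and _stream.ptr are non-nullable dart:ffi Pointer values; such a pointer cannot be Dart null, so a failed native constructor would show up as address 0 (the value of nullptr). isNativeHandleValid and checkHandles are hypothetical helper names.

```dart
import 'dart:ffi';

/// A non-nullable dart:ffi Pointer cannot be Dart null; a failed native
/// constructor would typically hand back address 0 (the value of nullptr).
bool isNativeHandleValid(Pointer<NativeType> handle) => handle.address != 0;

/// Hypothetical helper: throws a readable error if any named handle is 0.
void checkHandles(Map<String, Pointer<NativeType>> handles) {
  for (final entry in handles.entries) {
    if (!isNativeHandleValid(entry.value)) {
      throw StateError('${entry.key} is nullptr; model initialization '
          'probably failed, check the native logs.');
    }
  }
}
```

For example, right after _stream = _spotter.createStream(); in initialize, a call like checkHandles({'_spotter.ptr': _spotter.ptr, '_stream.ptr': _stream.ptr}); would fail fast instead of silently returning empty results.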
By the way, please change this:
```dart
while (_spotter.isReady(_stream)) {
  _spotter.decode(_stream);
}
final keyword = _spotter.getResult(_stream).keyword;
if (keyword.isNotEmpty) {
  print('Detected: $keyword');
  return keyword;
}
```
You need to call _spotter.getResult inside the while loop.
If you have not read our KWS Dart example yet, please read it now: https://github.com/k2-fsa/sherpa-onnx/blob/a7dc6c2c165de16c68daaf78490d159f51c54d44/dart-api-examples/keyword-spotter/bin/zipformer-transducer.dart#L72-L78
It shows exactly how to do that.
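The point of the change is that the keyword has to be read after every decode call, not once after the loop has drained the stream; presumably the result only carries the keyword for the decode step that triggered it, so a later decode can overwrite it with an empty string. A self-contained sketch of that loop shape, with the spotter calls passed in as callbacks so it can be exercised without a model. drainKeywords and its parameter names are hypothetical; with sherpa-onnx you would pass () => _spotter.isReady(_stream), () => _spotter.decode(_stream) and () => _spotter.getResult(_stream).keyword.

```dart
/// Runs the decode loop and collects every non-empty keyword.
/// The callbacks stand in for KeywordSpotter.isReady / decode /
/// getResult(...).keyword on a single stream.
List<String> drainKeywords({
  required bool Function() isReady,
  required void Function() decode,
  required String Function() currentKeyword,
}) {
  final detected = <String>[];
  while (isReady()) {
    decode();
    // Read the result after each decode step, inside the loop.
    final keyword = currentKeyword();
    if (keyword.isNotEmpty) {
      detected.add(keyword);
    }
  }
  return detected;
}
```

In predict, this would become return detected.isEmpty ? null : detected.first; (or all of them, if several keywords can fire within one chunk).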
Hi, sorry to reopen this old conversation, but I'm facing exactly the same problem (flutter_sound + sherpa-onnx for keyword spotting), and the documentation doesn't explain how to handle real-time audio for KWS. I tried several things based on the recommendations and code above, without success. Is there something we should know to make it work properly? Thank you for your consideration.
Please see our ASR example.