transformers.js
WhisperTextStreamer token_ids must be a non-empty array of integers
System Info
@huggingface/transformers 3.4.2
Environment/Platform
- [ ] Website/web-app
- [x] Browser extension
- [ ] Server-side (e.g., Node.js, Deno, Bun)
- [ ] Desktop app (e.g., Electron)
- [ ] Other (e.g., VSCode extension)
Description
I am using AutomaticSpeechRecognitionPipeline (automatic-speech-recognition). When I try to define a new WhisperTextStreamer using the tokenizer from this pipeline, I get the error: "token_ids must be a non-empty array of integers"
This problem did not occur in versions before 3.4.0.
Reproduction
Define pipeline:
const transcriber = pipeline('automatic-speech-recognition', 'onnx-community/whisper-small', {
dtype: {
encoder_model:
this.model === "onnx-community/whisper-large-v3-turbo"
? "fp16"
: "fp32",
decoder_model_merged: 'q4',
},
device: 'webgpu',
progress_callback,
});
And then try to define a WhisperTextStreamer:
const streamer = new WhisperTextStreamer(transcriber.tokenizer, {
time_precision,
on_chunk_start: (x) => {
const offset = (chunk_length_s - stride_length_s) * chunk_count;
chunks.push({
text: "",
timestamp: [offset + x, null],
finalised: false,
offset,
});
},
token_callback_function: () => {
start_time = start_time || performance.now();
if (num_tokens++ > 0) {
tps = (num_tokens / (performance.now() - start_time)) * 1000;
}
},
callback_function: (x) => {
if (chunks.length === 0) return;
chunks.at(-1).text += x;
console.log('chunk', chunks.at(-1).text);
chrome.runtime.sendMessage({
status: 'update',
data: {chunks, tps},
});
},
on_chunk_end: (x) => {
const current = chunks.at(-1);
current.timestamp[1] = x + current.offset;
current.finalised = true;
},
on_finalize: () => {
start_time = null;
num_tokens = 0;
chunk_count++;
},
});
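For readers who want to check the chunk bookkeeping in these callbacks without loading a model, here is a minimal sketch. `createChunkTracker` is a hypothetical helper (not part of transformers.js) that wraps the same callback logic in a closure, and the values 30 and 5 for `chunk_length_s` and `stride_length_s` are assumptions, since the report does not state them. The callbacks are driven by hand here; in real use the streamer would call them.

```javascript
// Hypothetical helper: same callback logic as the snippet above, with state
// kept in a closure so the callbacks can be invoked manually.
function createChunkTracker({ chunk_length_s, stride_length_s }) {
  const chunks = [];
  let chunk_count = 0;

  return {
    chunks,
    on_chunk_start(x) {
      // Each finished window shifts later timestamps by (chunk - stride) seconds.
      const offset = (chunk_length_s - stride_length_s) * chunk_count;
      chunks.push({ text: "", timestamp: [offset + x, null], finalised: false, offset });
    },
    callback_function(x) {
      if (chunks.length === 0) return;
      chunks.at(-1).text += x;
    },
    on_chunk_end(x) {
      const current = chunks.at(-1);
      current.timestamp[1] = x + current.offset;
      current.finalised = true;
    },
    on_finalize() {
      chunk_count++;
    },
  };
}

// Simulate two audio windows (assumed: 30 s chunks with a 5 s stride).
const tracker = createChunkTracker({ chunk_length_s: 30, stride_length_s: 5 });
tracker.on_chunk_start(0);
tracker.callback_function(" Hello");
tracker.on_chunk_end(2.5);
tracker.on_finalize();
tracker.on_chunk_start(1);
tracker.callback_function(" world");
tracker.on_chunk_end(3);
```

With these assumed values, the second chunk gets an offset of 25 seconds, so its timestamps are reported relative to the start of the whole audio rather than the start of its window.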
Hi there 👋
and when I try to define new WhisperTextStreamer using tokenizer from this pipeline I get error: "token_ids must be a non-empty array of integers"
Does this mean the error occurs at construction, or when running the pipeline the first time?
There may be an edge case where the model stops generating but we still attempt to decode (leading to an empty input), which I can try to investigate. Do you have a sample input or input file that causes this error?
One (unlikely) possibility is that you are not awaiting the creation of the pipeline. The error message would be a bit strange in this case though 🤔
- const transcriber = pipeline('automatic-speech-recognition', 'onnx-community/whisper-small', {
+ const transcriber = await pipeline('automatic-speech-recognition', 'onnx-community/whisper-small', {
// ...
const streamer = new WhisperTextStreamer(transcriber.tokenizer, {
// ...
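To illustrate why a missing `await` matters, here is a small sketch using a stand-in. `fakePipeline` is hypothetical and only mimics the fact that the pipeline factory is asynchronous; it is not the library function. Without `await`, the variable holds a Promise, so reading `.tokenizer` from it yields `undefined`.

```javascript
// Hypothetical stand-in for an async pipeline factory (not the real library call).
async function fakePipeline() {
  return { tokenizer: { name: "stub-tokenizer" } };
}

// Without await: the result is a pending Promise, not the pipeline object.
const notAwaited = fakePipeline();
const isPromise = notAwaited instanceof Promise;
const tokenizerOnPromise = notAwaited.tokenizer; // a Promise has no `tokenizer` property

// With the Promise resolved, the tokenizer is available.
fakePipeline().then((transcriber) => {
  console.log("tokenizer:", transcriber.tokenizer.name);
});
```

In that situation a streamer would be constructed with `undefined` instead of a tokenizer.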
It reproduces while running the pipeline.
You can reproduce it using your repository (https://github.com/xenova/whisper-web/tree/experimental-webgpu)
Just update "@huggingface/transformers" to version "3.5.0" and use any file. I used this one:
https://github.com/user-attachments/assets/7a78b8b3-e206-402f-8029-5e612de6f40e
Update: I just checked with 3.5.1 and the problem is still not solved.
@xenova, @fs-eire, @guschmue if you want to reproduce, you can try with my fork of Whisper-web and upgrade from 3.3.3 to any newer version. Any audio file and Whisper model will cause the problem.
Related issue: xenova/whisper-web#60
Hi, I get the same error:
Uncaught Error: token_ids must be a non-empty array of integers.
with the WhisperTextStreamer()
Best
Hi again,
I tested WhisperTextStreamer() with different versions. It works in versions <=3.3.3 but fails in any version >=3.4.0. It does not depend on the callbacks; it fails in any configuration.
The code does nothing else; it only prints the results of the two callbacks.
const streamer = new WhisperTextStreamer(transcriber.tokenizer, {
on_chunk_start: (chunkIndex) => {
console.log("chunkIndex-->:",chunkIndex);
},
callback_function: (text) => {
console.log("text:------->", text);
}
});
and the model used is: Xenova/whisper-small
Any ideas?
Best
Strange, I am unable to reproduce myself when loading it in this way: https://jsfiddle.net/q2cbvk73/1/ or https://jsfiddle.net/q2cbvk73/2/.
If someone can make a jsfiddle reproduction, that would help a ton.
Ah okay, I see... it seems to happen when return_timestamps: true, which wasn't included in any of the above code snippets.
Investigating further now :)
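For anyone trying to reproduce, a call of roughly the following shape combines a streamer with `return_timestamps: true`. This is only a sketch under assumptions: `transcribeWithTimestamps` is a hypothetical wrapper, `lib` is expected to be the module object (for example the result of `await import('@huggingface/transformers')`), and the chunk and stride lengths are assumed values that were not given in the thread.

```javascript
// Hypothetical wrapper. `lib` must expose `pipeline` and `WhisperTextStreamer`;
// it is passed in so the function can also be exercised with stubs.
async function transcribeWithTimestamps(lib, audio, modelId = "onnx-community/whisper-small") {
  // Await the pipeline so that `transcriber.tokenizer` is defined.
  const transcriber = await lib.pipeline("automatic-speech-recognition", modelId);

  const streamer = new lib.WhisperTextStreamer(transcriber.tokenizer, {
    callback_function: (text) => console.log("text:", text),
  });

  return transcriber(audio, {
    chunk_length_s: 30, // assumed value
    stride_length_s: 5, // assumed value
    return_timestamps: true, // the option identified above as the trigger
    streamer,
  });
}
```

On the affected versions reported in this thread (3.4.0 through 3.5.1), a call like this is expected to hit the error; without `return_timestamps: true` the maintainer could not reproduce it.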
https://github.com/huggingface/transformers.js/pull/1327 will fix it 👍
Amazing! Thank you @xenova and everyone who reported!
I'll test on my Whisper Web fork this weekend! Looking forward to the next release with this fix!
Looking forward to the next release with this fix!
https://www.npmjs.com/package/@huggingface/transformers/v/3.5.2 is out :)
Thank you a lot!
@xenova you are amazing! And thanks for this great project. We are super impressed by transformers.js. In our project we have «moral limitations» on using «commercial» server APIs, because the Dédalo projects manage sensitive information such as interviews with victims of Nazi camps or the Franco dictatorship... and transformers.js opens the way to implementing AI processes in a safe space...
Thank you again.
Best.
I found this issue because on latest 3.7.6, on_chunk_start and on_chunk_end are never called for a WhisperTextStreamer passed into AutomaticSpeechRecognitionPipeline. Is this a regression?
I found this issue because on latest 3.7.6, on_chunk_start and on_chunk_end are never called for a WhisperTextStreamer passed into AutomaticSpeechRecognitionPipeline. Is this a regression?
If you can open a new issue, we'll be able to track it :) If you can also provide some sample code, that will be helpful. My guess is that it's incorrect usage, but let's see!