WhisperTextStreamer token_ids must be a non-empty array of integers

Open SpeedyGonzaless opened this issue 8 months ago • 2 comments

System Info

@huggingface/transformers 3.4.2

Environment/Platform

  • [ ] Website/web-app
  • [x] Browser extension
  • [ ] Server-side (e.g., Node.js, Deno, Bun)
  • [ ] Desktop app (e.g., Electron)
  • [ ] Other (e.g., VSCode extension)

Description

I am using AutomaticSpeechRecognitionPipeline (automatic-speech-recognition), and when I try to create a new WhisperTextStreamer using the tokenizer from this pipeline, I get the error: "token_ids must be a non-empty array of integers"

This problem did not occur in versions before 3.4.0.

Reproduction

Define pipeline:

const transcriber = pipeline('automatic-speech-recognition', 'onnx-community/whisper-small', {
    dtype: {
        encoder_model:
            this.model === "onnx-community/whisper-large-v3-turbo"
                ? "fp16"
                : "fp32",
        decoder_model_merged: 'q4',
    },
    device: 'webgpu',
    progress_callback,
});

And then try to define the WhisperTextStreamer:

const streamer = new WhisperTextStreamer(transcriber.tokenizer, {
        time_precision,
        on_chunk_start: (x) => {
            const offset = (chunk_length_s - stride_length_s) * chunk_count;
            chunks.push({
                text: "",
                timestamp: [offset + x, null],
                finalised: false,
                offset,
            });
        },
        token_callback_function: () => {
            start_time = start_time || performance.now();
            if (num_tokens++ > 0) {
                tps = (num_tokens / (performance.now() - start_time)) * 1000;
            }
        },
        callback_function: (x) => {
            if (chunks.length === 0) return;
            chunks.at(-1).text += x;
            console.log('chunk', chunks.at(-1).text);
            chrome.runtime.sendMessage({
                status: 'update',
                data: {chunks, tps},
            });
        },
        on_chunk_end: (x) => {
            const current = chunks.at(-1);
            current.timestamp[1] = x + current.offset;
            current.finalised = true;
        },
        on_finalize: () => {
            start_time = null;
            num_tokens = 0;
            chunk_count++;
        },
    });
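
The callbacks above read and write several outer variables (chunks, chunk_count, num_tokens, start_time, tps, chunk_length_s, stride_length_s) that are not shown in the snippet. A minimal sketch of how that state could be bundled so the callbacks can be exercised without loading a model; createChunkTracker and the injected now() clock are hypothetical names, not part of transformers.js, and the defaults of 30 s and 5 s are assumptions:

```javascript
// Hypothetical helper (not a transformers.js API): holds the state that the
// streamer callbacks in the reproduction rely on.
function createChunkTracker({
  chunk_length_s = 30, // assumed default, not taken from the report
  stride_length_s = 5, // assumed default, not taken from the report
  now = () => performance.now(),
} = {}) {
  const state = { chunks: [], chunk_count: 0, num_tokens: 0, start_time: null, tps: undefined };

  const callbacks = {
    // Start a new chunk; x is the chunk-relative start timestamp.
    on_chunk_start: (x) => {
      const offset = (chunk_length_s - stride_length_s) * state.chunk_count;
      state.chunks.push({ text: '', timestamp: [offset + x, null], finalised: false, offset });
    },
    // Track tokens per second from the second token onwards.
    token_callback_function: () => {
      state.start_time ??= now();
      if (state.num_tokens++ > 0) {
        state.tps = (state.num_tokens / (now() - state.start_time)) * 1000;
      }
    },
    // Append decoded text to the most recent chunk, if there is one.
    callback_function: (x) => {
      if (state.chunks.length === 0) return;
      state.chunks.at(-1).text += x;
    },
    // Close the most recent chunk; x is the chunk-relative end timestamp.
    on_chunk_end: (x) => {
      const current = state.chunks.at(-1);
      if (!current) return;
      current.timestamp[1] = x + current.offset;
      current.finalised = true;
    },
    // Reset per-window counters and advance to the next audio window.
    on_finalize: () => {
      state.start_time = null;
      state.num_tokens = 0;
      state.chunk_count++;
    },
  };

  return { state, callbacks };
}
```

The callbacks could then be spread into the streamer options, for example new WhisperTextStreamer(transcriber.tokenizer, { time_precision, ...callbacks }), with any messaging (such as chrome.runtime.sendMessage) added on top.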

SpeedyGonzaless avatar Apr 05 '25 23:04 SpeedyGonzaless

Hi there 👋

and when I try to create a new WhisperTextStreamer using the tokenizer from this pipeline, I get the error: "token_ids must be a non-empty array of integers"

Does this mean the error occurs at construction, or when running the pipeline the first time?

There may be an edge case where the model stops generating but we still attempt to decode (leading to an empty input), which I can try to investigate. Do you have a sample input or input file that causes this error?

One possibility (unlikely) is that you are not awaiting the creation of the pipeline. The error message would be a bit strange in this case though 🤔

- const transcriber = pipeline('automatic-speech-recognition', 'onnx-community/whisper-small', {
+ const transcriber = await pipeline('automatic-speech-recognition', 'onnx-community/whisper-small', {
// ...
const streamer = new WhisperTextStreamer(transcriber.tokenizer, {
// ...
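
For context on why the missing await matters: a Promise has no tokenizer property, so transcriber.tokenizer would be undefined if pipeline() is not awaited. A small sketch of a guard that makes this failure explicit; requireTokenizer is a hypothetical helper, not a transformers.js API:

```javascript
// Hypothetical guard (not part of transformers.js): fails early with a clear
// message when the pipeline has not been awaited.
function requireTokenizer(transcriber) {
  // A thenable here means pipeline(...) was called without `await`.
  if (transcriber && typeof transcriber.then === 'function') {
    throw new TypeError('pipeline() returned a Promise; await it before reading .tokenizer');
  }
  if (!transcriber || !transcriber.tokenizer) {
    throw new TypeError('transcriber has no tokenizer');
  }
  return transcriber.tokenizer;
}
```

With this in place, the streamer could be built as new WhisperTextStreamer(requireTokenizer(transcriber), { /* callbacks */ }).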

xenova avatar Apr 22 '25 14:04 xenova

It reproduces while running the pipeline. You can reproduce it using your repository (https://github.com/xenova/whisper-web/tree/experimental-webgpu). Just update "@huggingface/transformers" to version "3.5.0" and use any file. I used this one:

https://github.com/user-attachments/assets/7a78b8b3-e206-402f-8029-5e612de6f40e

SpeedyGonzaless avatar Apr 23 '25 18:04 SpeedyGonzaless

Update: I just checked with 3.5.1 and the problem is still not solved.

@xenova, @fs-eire, @guschmue if you want to reproduce, you can try with my fork of Whisper-web and upgrade from 3.3.3 to any newer version. Any audio file and Whisper model will cause the problem.

Related issue: xenova/whisper-web#60

PierreMesure avatar May 08 '25 12:05 PierreMesure

Hi, I get the same error:

[email protected]:1 Uncaught Error: token_ids must be a non-empty array of integers.

with the WhisperTextStreamer()

Best

renderpci avatar May 26 '25 16:05 renderpci

Hi again,

I tested WhisperTextStreamer() with different versions. It works in versions <=3.3.3 but fails in any version >=3.4.0. It does not depend on the callbacks; it fails in any configuration.

The code does nothing else; it only prints the values passed to the two callbacks.

const streamer = new WhisperTextStreamer(transcriber.tokenizer, {
    on_chunk_start: (chunkIndex) => {
        console.log("chunkIndex-->:", chunkIndex);
    },
    callback_function: (text) => {
        console.log("text:------->", text);
    },
});

The model used is Xenova/whisper-small.

Any ideas?

Best

renderpci avatar May 29 '25 20:05 renderpci

Strange, I am unable to reproduce it myself when loading it this way: https://jsfiddle.net/q2cbvk73/1/ or https://jsfiddle.net/q2cbvk73/2/.

If someone can make a jsfiddle reproduction, that would help a ton.

xenova avatar May 29 '25 22:05 xenova

Ah okay, I see... it seems to happen when return_timestamps: true, which wasn't included in any of the above code snippets. Investigating further now :)
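
Since none of the snippets above show the pipeline call itself, here is a rough sketch of the kind of call that would combine a streamer with return_timestamps: true. transcribeWithTimestamps is a hypothetical wrapper, the chunk and stride lengths are assumed values, and transcriber is expected to be an already awaited automatic-speech-recognition pipeline:

```javascript
// Hypothetical wrapper (not a transformers.js API): runs an already created
// ASR pipeline with timestamps enabled and a streamer attached.
function transcribeWithTimestamps(transcriber, audio, streamer) {
  return transcriber(audio, {
    return_timestamps: true, // the option that reportedly triggers the error
    chunk_length_s: 30,      // assumed value
    stride_length_s: 5,      // assumed value
    streamer,
  });
}
```

A caller would await the result, for example const output = await transcribeWithTimestamps(transcriber, audio, streamer).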

xenova avatar May 29 '25 22:05 xenova

https://github.com/huggingface/transformers.js/pull/1327 will fix it 👍

xenova avatar May 29 '25 22:05 xenova

Amazing! Thank you @xenova and everyone who reported!

I'll test on my Whisper Web fork this weekend! Looking forward to the next release with this fix!

PierreMesure avatar May 30 '25 04:05 PierreMesure

Looking forward to the next release with this fix!

https://www.npmjs.com/package/@huggingface/transformers/v/3.5.2 is out :)
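
Based only on the reports in this thread (works up to 3.3.3, fails from 3.4.0 through 3.5.1, fix released in 3.5.2), a project could check whether its installed version falls in the affected range. This is a sketch under that assumption; isAffectedVersion and compareVersions are hypothetical helpers and only handle plain MAJOR.MINOR.PATCH strings:

```javascript
// Compare two "MAJOR.MINOR.PATCH" strings; returns -1, 0 or 1.
function compareVersions(a, b) {
  const pa = a.split('.').map(Number);
  const pb = b.split('.').map(Number);
  for (let i = 0; i < 3; i++) {
    const diff = (pa[i] ?? 0) - (pb[i] ?? 0);
    if (diff !== 0) return Math.sign(diff);
  }
  return 0;
}

// Assumption taken from this thread: the error appears in 3.4.0 and later,
// and is fixed as of 3.5.2.
function isAffectedVersion(version) {
  return compareVersions(version, '3.4.0') >= 0 && compareVersions(version, '3.5.2') < 0;
}
```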

xenova avatar May 31 '25 00:05 xenova

Thank you a lot!

@xenova you are amazing! And thanks for this great project; we are super impressed by transformers.js. In our project we have «moral limitations» on using «commercial» server APIs, because the Dédalo project manages sensitive information such as interviews with victims of Nazi camps or the Franco dictatorship... and transformers.js opens the way to implementing AI processes in a safe space...

Thank you again.

Best.

renderpci avatar May 31 '25 08:05 renderpci

I found this issue because on latest 3.7.6, on_chunk_start and on_chunk_end are never called for a WhisperTextStreamer passed into AutomaticSpeechRecognitionPipeline. Is this a regression?

hybridherbst avatar Nov 14 '25 11:11 hybridherbst

If you can open a new issue, we'll be able to track it :) If you can also provide some sample code, that will be helpful. My guess is that it's incorrect usage, but let's see!

xenova avatar Nov 15 '25 04:11 xenova