Inference results for the same model are inconsistent between WebGPU and WASM
System Info
"@huggingface/transformers": "^3.0.0-alpha.5"
Environment/Platform
- [X] Website/web-app
- [X] Browser extension
- [ ] Server-side (e.g., Node.js, Deno, Bun)
- [ ] Desktop app (e.g., Electron)
- [ ] Other (e.g., VSCode extension)
Description
When I perform NER inference using WebGPU, the results vary across different users' computers, and the WebGPU results differ from the WASM results. The only code change between the two runs is the `device` setting (switching between `"webgpu"` and `"wasm"`). For some models there is no difference, e.g. Xenova/bert-base-multilingual-cased-ner-hrl.
Reproduction
First, I converted the model to ONNX format as follows: on the v3 branch of transformers.js, I ran

```bash
python -m scripts.convert --quantize --model_id Isotonic/distilbert_finetuned_ai4privacy_v2
```

Then I used the following code for model loading and inference:
```js
import { pipeline, env } from '@huggingface/transformers';

env.allowLocalModels = true;
env.backends.onnx.wasm.numThreads = 1;

export class PipelineSingleton {
    static task = 'token-classification';
    static model = 'Isotonic/distilbert_finetuned_ai4privacy_v2';
    static instance = null;

    static async getInstance(progress_callback = null) {
        if (this.instance === null) {
            this.instance = pipeline(this.task, this.model, {
                progress_callback,
                dtype: "fp16",
                device: "webgpu", // switching this to "wasm" is the only change between runs
            });
        }
        return this.instance;
    }
}
```
When performing inference with WebGPU on the text "Anuj Joshi - Founder (May 2020) Over 22+ experience in channel space building various Route To Markets for global giants like Amazon, IBM & Autodesk," no entities are extracted. However, if the device is changed to "wasm", entities are extracted.
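To pin down exactly where the two backends disagree, it helps to run the same text through both and diff the outputs. Below is a minimal plain-JavaScript sketch of such a comparison (the `diffEntities` helper and the sample outputs are illustrative, not part of transformers.js); the real `resultsA`/`resultsB` would come from two pipeline instances created with `device: "wasm"` and `device: "webgpu"` respectively.

```javascript
// Compare two token-classification outputs (arrays of { entity, word, score })
// and report entities found by one backend but not the other.
function diffEntities(resultsA, resultsB) {
  const key = (e) => `${e.entity}:${e.word}`;
  const setA = new Set(resultsA.map(key));
  const setB = new Set(resultsB.map(key));
  return {
    onlyInA: resultsA.filter((e) => !setB.has(key(e))),
    onlyInB: resultsB.filter((e) => !setA.has(key(e))),
  };
}

// Illustrative data matching the behaviour reported above:
// WASM extracts an entity, WebGPU (fp16) returns none.
const wasmOut = [{ entity: 'B-PER', word: 'Anuj', score: 0.98 }];
const webgpuOut = [];
const diff = diffEntities(wasmOut, webgpuOut);
console.log(diff.onlyInA.length); // 1, i.e. one entity found only by the WASM backend
```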
I gave it a try, and it seems the fp32 model on WebGPU (i.e. `dtype: "fp32"`) produces results similar to WASM's.
Hello, I seem to also be experiencing discrepancies between the two:

```js
const modelIdentifier = 'Xenova/roberta-large-mnli';
const device = "webgpu"; // or "wasm"; this is the only change between runs

const classifier = await pipeline('zero-shot-classification', modelIdentifier, {
    device,
    dtype: 'fp16',
});
```
with the following classes: `let classes = ['text', 'image'];`. I am getting inconsistent results between WebGPU and WASM:
wasm:

```
labels: ['image', 'text'] scores: [0.9860802693459424, 0.013919730654057532] sequence: "generate a photo of a duck"
```

webgpu:

```
labels: ['image', 'text'] scores: [0.5622966798924082, 0.43770332010759183] sequence: "generate a photo of a duck"
```