Inference results for the same model are inconsistent between WebGPU and WASM
System Info
"@huggingface/transformers": "^3.0.0-alpha.5"
Environment/Platform
- [X] Website/web-app
- [X] Browser extension
- [ ] Server-side (e.g., Node.js, Deno, Bun)
- [ ] Desktop app (e.g., Electron)
- [ ] Other (e.g., VSCode extension)
Description
When I perform NER inference using WebGPU, the results vary across different users' computers, and the WebGPU results differ from the WASM results. The only code change between the two runs is the `device` setting (switching between `"webgpu"` and `"wasm"`). For some models there is no difference, e.g. Xenova/bert-base-multilingual-cased-ner-hrl.
Reproduction
First, I converted the model to ONNX format as follows: on the v3 branch of transformers.js, I ran

```bash
python -m scripts.convert --quantize --model_id Isotonic/distilbert_finetuned_ai4privacy_v2
```

Then I used the following code for model loading and inference:
```js
import { pipeline, env } from '@huggingface/transformers';

env.allowLocalModels = true;
env.backends.onnx.wasm.numThreads = 1;

export class PipelineSingleton {
    static task = 'token-classification';
    static model = 'Isotonic/distilbert_finetuned_ai4privacy_v2';
    static instance = null;

    static async getInstance(progress_callback = null) {
        if (this.instance === null) {
            this.instance = pipeline(this.task, this.model, {
                progress_callback,
                dtype: "fp16",
                device: "webgpu", // switching this to "wasm" is the only change between runs
            });
        }
        return this.instance;
    }
}
```
When performing inference with WebGPU on the text "Anuj Joshi - Founder (May 2020) Over 22+ experience in channel space building various Route To Markets for global giants like Amazon, IBM & Autodesk," no entities are extracted. However, if the device is changed to "wasm", entities are extracted.
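To pin down exactly where the two backends disagree, it helps to run the same text through both and diff the outputs. Below is a minimal plain-JavaScript sketch of such a comparison (the `diffEntities` helper and the sample outputs are illustrative, not part of transformers.js); the real `resultsA`/`resultsB` would come from two pipeline instances created with `device: "wasm"` and `device: "webgpu"` respectively.

```javascript
// Compare two token-classification outputs (arrays of { entity, word, score })
// and report entities found by one backend but not the other.
function diffEntities(resultsA, resultsB) {
  const key = (e) => `${e.entity}:${e.word}`;
  const setA = new Set(resultsA.map(key));
  const setB = new Set(resultsB.map(key));
  return {
    onlyInA: resultsA.filter((e) => !setB.has(key(e))),
    onlyInB: resultsB.filter((e) => !setA.has(key(e))),
  };
}

// Illustrative data matching the behaviour reported above:
// WASM extracts an entity, WebGPU (fp16) returns none.
const wasmOut = [{ entity: 'B-PER', word: 'Anuj', score: 0.98 }];
const webgpuOut = [];
const diff = diffEntities(wasmOut, webgpuOut);
console.log(diff.onlyInA.length); // 1, i.e. one entity found only by the WASM backend
```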
I gave it a try, and it seems the fp32 model on WebGPU (i.e. `dtype: "fp32"`) produces results similar to WASM's.
Hello, I seem to also be experiencing discrepancies between the two:

```js
const modelIdentifier = 'Xenova/roberta-large-mnli';
const device = "webgpu"; // or "wasm"; this is the only change between runs

const classifier = await pipeline('zero-shot-classification', modelIdentifier, {
    device,
    dtype: 'fp16',
});
```
with the following classes: `let classes = ['text', 'image'];`. I am getting inconsistent results between WebGPU and WASM:
wasm:

```
labels: ['image', 'text'] scores: [0.9860802693459424, 0.013919730654057532] sequence: "generate a photo of a duck"
```

webgpu:

```
labels: ['image', 'text'] scores: [0.5622966798924082, 0.43770332010759183] sequence: "generate a photo of a duck"
```