
[WebGPU] Error running Xenova/musicgen-small

Open • vonner04 opened this issue 8 months ago • 1 comment

System Info

"@huggingface/transformers": "^3.5.1", "react": "^18.2.0"

Environment/Platform

  • [x] Website/web-app
  • [ ] Browser extension
  • [ ] Server-side (e.g., Node.js, Deno, Bun)
  • [ ] Desktop app (e.g., Electron)
  • [ ] Other (e.g., VSCode extension)

Description

I am using the musicgen-web example from the transformers.js repository.

I updated the imports to match the @huggingface/transformers version listed under System Info.

When I change the device from wasm to webgpu, the generated music is just a loud electrical hum.

https://github.com/user-attachments/assets/658f1caf-1105-4edf-9bd0-4baf1a62d9e6

Reproduction

Steps to reproduce the behaviour:

  1. Clone the repository.
  2. Open the musicgen-web example.
  3. Update the dependency via npm install @huggingface/transformers.
  4. Change the device option from wasm to webgpu.
  5. Run npm install and start the example.
  6. Generate music with any configuration.
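A minimal sketch of step 4, assuming the only change needed is the device option passed to from_pretrained. The helper buildModelOptions is hypothetical, and the dtype map is borrowed from the maintainer's example code in this thread; the musicgen-web example may use different settings.

```javascript
// Hypothetical helper (not part of the musicgen-web example): builds the
// options object for MusicgenForConditionalGeneration.from_pretrained so that
// the working and broken runs differ only in `device`.
// Assumption: dtype values copied from the maintainer's example in this thread.
function buildModelOptions(device) {
  if (device !== 'wasm' && device !== 'webgpu') {
    throw new Error(`Unsupported device: ${device}`);
  }
  return {
    dtype: {
      text_encoder: 'q4',
      decoder_model_merged: 'q4',
      encodec_decode: 'fp32',
    },
    device, // 'wasm' works; 'webgpu' reportedly produces the loud hum
  };
}

const wasmOptions = buildModelOptions('wasm');     // original example
const webgpuOptions = buildModelOptions('webgpu'); // step 4 of the reproduction
```

With the library installed, either object would be passed as the second argument, e.g. from_pretrained('Xenova/musicgen-small', webgpuOptions).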

vonner04 • May 03 '25 23:05

Hi there 👋 Indeed, I've run into this issue myself when attempting to upgrade: there appear to be precision issues with decoder_model_merged.onnx (and only that file) on WebGPU.

@fs-eire I've been able to reproduce this on both JSEP and the native WebGPU EP.
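One way to confirm a suspected precision issue is to run the same decoder step on wasm and webgpu and compare the raw outputs element by element. The sketch below assumes both outputs have already been captured as numeric arrays (how to extract them from the model is not covered here); compareOutputs and its default tolerances are hypothetical choices, not a transformers.js API.

```javascript
// Hypothetical diagnostic: compares a reference array (e.g. wasm logits)
// against a candidate (e.g. webgpu logits) with an allclose-style check:
//   |ref - cand| <= atol + rtol * |ref|
// Returns the largest absolute difference, where it occurs, and whether
// every element is within tolerance.
function compareOutputs(reference, candidate, { atol = 1e-4, rtol = 1e-3 } = {}) {
  if (reference.length !== candidate.length) {
    throw new Error(`Length mismatch: ${reference.length} vs ${candidate.length}`);
  }
  let maxAbsDiff = 0;
  let worstIndex = -1;
  let allClose = true;
  for (let i = 0; i < reference.length; i++) {
    const diff = Math.abs(reference[i] - candidate[i]);
    // Comparisons with NaN are false, so a NaN on either side fails the check.
    if (!(diff <= atol + rtol * Math.abs(reference[i]))) allClose = false;
    if (diff > maxAbsDiff) {
      maxAbsDiff = diff;
      worstIndex = i;
    }
  }
  return { maxAbsDiff, worstIndex, allClose };
}

// Toy data standing in for captured outputs.
const ref = new Float32Array([0.5, -1.25, 3.0]);
const close = new Float32Array([0.5, -1.2501, 3.0]); // small drift: passes
const far = new Float32Array([0.5, -1.25, 2.0]);     // large error: fails
console.log(compareOutputs(ref, close), compareOutputs(ref, far));
```

A small, stable maxAbsDiff points to ordinary float noise; a large or growing one at a specific step would support the precision hypothesis.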

Example code
import { AutoTokenizer, MusicgenForConditionalGeneration, RawAudio } from '@huggingface/transformers';

// Load tokenizer and model
const tokenizer = await AutoTokenizer.from_pretrained('Xenova/musicgen-small');
const model = await MusicgenForConditionalGeneration.from_pretrained('Xenova/musicgen-small', {
  dtype: {
    text_encoder: 'q4',
    decoder_model_merged: 'q4',
    encodec_decode: 'fp32',
  },
  // device: "webgpu", // <-- uncomment this to enable WebGPU (currently broken)
});

// Prepare text input
const prompt = 'a light and cheerly EDM track, with syncopated drums, aery pads, and strong emotions bpm: 130';
const inputs = tokenizer(prompt);

// Generate audio
const audio_values = await model.generate({
  ...inputs,
  max_new_tokens: 500,
  do_sample: true,
  guidance_scale: 3,
});

// (Optional) Write the output to a WAV file
const audio = new RawAudio(audio_values.data, model.config.audio_encoder.sampling_rate);
audio.save('musicgen.wav');
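To check that decoder_model_merged alone is responsible, one could run every module on WebGPU except the decoder. This sketch assumes, without having been tested here, that the installed transformers.js 3.x version accepts a per-module map for device in the same way it does for dtype; the helper buildMixedDeviceOptions and the MUSICGEN_MODULES list are hypothetical.

```javascript
// Module names taken from the dtype map in the example above.
const MUSICGEN_MODULES = ['text_encoder', 'decoder_model_merged', 'encodec_decode'];

// Hypothetical helper: every module runs on WebGPU except those listed in
// `wasmModules`. Assumption: from_pretrained accepts a per-module device map.
function buildMixedDeviceOptions(wasmModules = ['decoder_model_merged']) {
  const device = {};
  for (const name of MUSICGEN_MODULES) {
    device[name] = wasmModules.includes(name) ? 'wasm' : 'webgpu';
  }
  return {
    dtype: {
      text_encoder: 'q4',
      decoder_model_merged: 'q4',
      encodec_decode: 'fp32',
    },
    device,
  };
}

// Usage (requires @huggingface/transformers and a WebGPU-capable browser):
// const model = await MusicgenForConditionalGeneration.from_pretrained(
//   'Xenova/musicgen-small', buildMixedDeviceOptions());
```

If the hum disappears with only the decoder on wasm, that would be consistent with the precision issue being confined to decoder_model_merged.onnx.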

xenova • May 06 '25 01:05