[experimental-webgpu] - Configuring Encoder/Decoder Precision with dtype for Local Models

Open kostia-ilani opened this issue 1 year ago • 2 comments

Hello,

I’m using whisper-web (experimental-webgpu branch) with local models (env.allowLocalModels = true and env.localModelPath = "./models"), and I’m having trouble setting distinct dtype values for encoder_model and decoder_model_merged with a small model.

The error I see:

Uncaught (in promise) Error: Can't create a session. ERROR_CODE: 7, ERROR_MESSAGE: Failed to load model because protobuf parsing failed.

Is there a specific convention for the key names or values when setting dtype for encoder/decoder precision levels (for example, one that has to match the names of the model's ONNX files)?

const transcriber = await pipeline(
  "automatic-speech-recognition",
  "my-whisper-model",
  {
    dtype: {
      encoder_model: "fp32",
      decoder_model_merged: "q4"
    },
    device: "webgpu"
  }
);
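For reference, here is a rough sketch of the file names I would expect these settings to resolve to. It is only an assumption on my part: I am guessing that each dtype maps to a file-name suffix (fp32 to no suffix, q4 to "_q4", and so on) and that the files live in an onnx/ subfolder under the local model path. The DTYPE_SUFFIX table and the expectedOnnxFiles helper are hypothetical names for illustration, not part of the library.

```javascript
// Assumed convention (not confirmed): each dtype value maps to a suffix that is
// appended to the ONNX base file name. This table is illustrative only.
const DTYPE_SUFFIX = {
  fp32: "",
  fp16: "_fp16",
  q8: "_quantized",
  int8: "_int8",
  uint8: "_uint8",
  q4: "_q4",
  q4f16: "_q4f16",
  bnb4: "_bnb4",
};

// Hypothetical helper: lists the paths a per-component dtype map would point to,
// assuming the layout <localModelPath>/<modelId>/onnx/<name><suffix>.onnx.
function expectedOnnxFiles(localModelPath, modelId, dtype) {
  const root = localModelPath.replace(/\/+$/, ""); // drop trailing slashes
  return Object.entries(dtype).map(([name, precision]) => {
    if (!Object.prototype.hasOwnProperty.call(DTYPE_SUFFIX, precision)) {
      throw new Error(`Unknown dtype "${precision}" for "${name}"`);
    }
    return `${root}/${modelId}/onnx/${name}${DTYPE_SUFFIX[precision]}.onnx`;
  });
}

const files = expectedOnnxFiles("./models", "my-whisper-model", {
  encoder_model: "fp32",
  decoder_model_merged: "q4",
});
console.log(files);
```

If that guess is right, a file that is missing at one of these paths might be answered by the dev server with an HTML fallback page, which could plausibly surface as the protobuf parsing failure above. I have not verified this.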

kostia-ilani, Nov 17 '24 12:11