Where is `apply_penalty_inplace_kernel` used?

Open reeselevine opened this issue 9 months ago • 1 comment

Hi,

I'm doing some work looking into the performance and safety of WebGPU under different workloads, and I was doing some characterization of WebLLM (btw, this is awesome work and a great resource, thanks for building it!).

I had a question about one particular kernel, `apply_penalty_inplace_kernel`. I see that it's being compiled when the model loads, but I don't see it being run during inference. Is there a setting I'm missing that causes this kernel to run, or does it only run for certain models and not others?

Hopefully this is the right place to ask, and thanks in advance for any info!

Mar 04 '25 19:03 reeselevine

Thanks for your interest in WebLLM!

This kernel is indeed not used in WebLLM yet, as WebLLM currently applies penalties on the CPU (i.e. not with a GPU kernel).

This is something we want to optimize.

May 05 '25 06:05 CharlieFRuan
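
For readers wondering what "applying penalties on the CPU" can look like in practice, here is a minimal TypeScript sketch. It is not WebLLM's actual code: the function name `applyPenaltiesOnCpu`, the `PenaltyOptions` shape, and the choice to combine a multiplicative repetition penalty with additive presence and frequency penalties in one pass are all assumptions made for illustration. The idea is simply that the logits are available on the CPU as a `Float32Array` and are adjusted in place for every token that has already been generated.

```typescript
// Hypothetical sketch of CPU-side logit penalties; names are illustrative
// and do not come from the WebLLM codebase.
interface PenaltyOptions {
  repetitionPenalty: number; // 1.0 leaves logits unchanged
  presencePenalty: number;   // 0.0 leaves logits unchanged
  frequencyPenalty: number;  // 0.0 leaves logits unchanged
}

function applyPenaltiesOnCpu(
  logits: Float32Array,
  generatedTokenIds: number[],
  opts: PenaltyOptions,
): void {
  // Count how many times each token has already been generated.
  const counts = new Map<number, number>();
  for (const id of generatedTokenIds) {
    counts.set(id, (counts.get(id) || 0) + 1);
  }

  counts.forEach((count, id) => {
    // Ignore ids that fall outside the vocabulary.
    if (id < 0 || id >= logits.length) {
      return;
    }
    let logit = logits[id];
    // Repetition penalty: shrink positive logits, push negative ones lower.
    logit = logit > 0
      ? logit / opts.repetitionPenalty
      : logit * opts.repetitionPenalty;
    // Frequency penalty scales with the count; presence penalty is flat.
    logit -= opts.frequencyPenalty * count + opts.presencePenalty;
    logits[id] = logit;
  });
}

// Usage: token 0 was generated twice, token 1 once.
const exampleLogits = new Float32Array([2.0, -1.0, 0.5, 4.0]);
applyPenaltiesOnCpu(exampleLogits, [0, 0, 1], {
  repetitionPenalty: 2.0,
  presencePenalty: 0.5,
  frequencyPenalty: 0.25,
});
console.log(Array.from(exampleLogits));
```

A loop like this only touches tokens that have already appeared, so the arithmetic itself is cheap. If the logits were produced on the GPU, though, they have to be copied back to the CPU before it can run, and that is presumably part of what a GPU kernel would help avoid.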