delphiRo comments

Results: 18 comments of delphiRo

170hx（any cmp hx card）can run higher and higher fp32 flops than before

Could you please share your modified g_vgpu_chip_flags.h (https://github.com/NVIDIA/open-gpu-kernel-modules/blob/81fe4fb417c8ac3b9bdcc1d56827d116743892a5/src/common/shared/inc/g_vgpu_chip_flags.h)? Did you measure llama bench results before and after the nvidia-driver patch?

170hx（any cmp hx card）can run higher and higher fp32 flops than before

> > Could you please share modified https://github.com/NVIDIA/open-gpu-kernel-modules/blob/81fe4fb417c8ac3b9bdcc1d56827d116743892a5/src/common/shared/inc/g_vgpu_chip_flags.h
> > Did you measure llama bench before nvidia-driver patch and after?
> >
> > In fact, I can't use this card on...

170hx（any cmp hx card）can run higher and higher fp32 flops than before

Do I understand correctly that this patch didn't affect llama performance on Windows either? Did you build the llama source with FMA disabled for Windows with the official driver and...

170hx（any cmp hx card）can run higher and higher fp32 flops than before

> > Do I understand correctly that this patch didn't affect llama performance on Windows either? Did you build the llama source with FMA disabled for Windows with the official...

170hx（any cmp hx card）can run higher and higher fp32 flops than before

> > > [@delphiRo](https://github.com/delphiRo) I can tell you my email address: [email protected]. But I'm just an amateur and may not be of much help. > > > > > >...

170hx（any cmp hx card）can run higher and higher fp32 flops than before

> Nouveau

It seems that Nouveau doesn't even load as a module for non-VGA devices, does it?

[Bug]: Enabling fp8 KV cache quantization and prefix caching at the same time on Radeon (W7900/RDNA3) crashes the process

How do I disable prefix prefill?

[Bug]: Enabling fp8 KV cache quantization and prefix caching at the same time on Radeon (W7900/RDNA3) crashes the process

It seems that export VLLM_V1_USE_PREFILL_DECODE_ATTENTION=1 is not a working solution in my case. I checked on an AMD Instinct MI50 with ROCm 6.3.4, and it logs that all fp8 formats are not...

[Bug] qwen3-14b_q4f16 on GPU cause 100% of CPU when input request become more then 1500 tokens. Inference cause to become forever. (tested on CUDA,ROCM)

I also checked on an AMD Instinct MI50 with ROCm 6.2.4. The same problem occurs there too when a 32-40K context is enabled.

[Bug] qwen3-14b_q4f16 on GPU cause 100% of CPU when input request become more then 1500 tokens. Inference cause to become forever. (tested on CUDA,ROCM)

@simonw @jeethu @Sing-Li @philippgille