sugunav14
> @sugunav14, what's the issue with fp8 weight loading? DeepseekV3 weights are in fp8 on huggingface. Since we have the load_state_dict() patch in place now it loads the weights in...
Merged in this [MR](https://github.com/nv-auto-deploy/TensorRT-LLM/pull/10)
num_heads_q is 32 and num_heads_kv is 8
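For context, these head counts look like a grouped-query attention (GQA) setup, where several query heads share one KV head. A minimal sketch, assuming that interpretation (the variable names here are illustrative, not from the codebase):

```python
# Sketch: how 32 query heads would map onto 8 KV heads under GQA.
num_heads_q = 32
num_heads_kv = 8

# Query heads must divide evenly into KV-head groups.
assert num_heads_q % num_heads_kv == 0
group_size = num_heads_q // num_heads_kv  # 4 query heads per KV head

# Map each query head index to the KV head it shares.
q_to_kv = [q // group_size for q in range(num_heads_q)]
print(group_size)   # 4
print(q_to_kv[:8])  # [0, 0, 0, 0, 1, 1, 1, 1]
```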
Another observation: I hit this error only when I set `fuse_rope = True`.