Issues by sugunav14 (2 results)

I have `k_cache` and `v_cache` as `torch.float8_e4m3fn` tensors and am calling `run` (decode attention), and I get this error: `ValueError: FlashInfer Internal Error: Invalid configuration : NUM_FRAGS_Q=1 NUM_FRAGS_D=8 NUM_FRAGS_KV=1 NUM_WARPS_Q=1...`
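For reference, a minimal sketch of the call pattern described above, assuming the flashinfer `BatchDecodeWithPagedKVCacheWrapper` API; the shapes, page layout, and the `q_data_type`/`kv_data_type` keyword names are illustrative assumptions and may differ across flashinfer versions:

```python
# Sketch of a decode-attention call with an fp8 KV cache (assumed shapes/kwargs).
import torch
import flashinfer

num_kv_heads, num_qo_heads, head_dim, page_size = 4, 32, 128, 16
max_pages = 64

workspace = torch.empty(128 * 1024 * 1024, dtype=torch.uint8, device="cuda")
wrapper = flashinfer.BatchDecodeWithPagedKVCacheWrapper(workspace, kv_layout="NHD")

# fp8 KV cache, as in the issue (torch.randn cannot emit fp8, so cast from fp16).
k_cache = torch.randn(max_pages, page_size, num_kv_heads, head_dim,
                      device="cuda", dtype=torch.float16).to(torch.float8_e4m3fn)
v_cache = torch.randn(max_pages, page_size, num_kv_heads, head_dim,
                      device="cuda", dtype=torch.float16).to(torch.float8_e4m3fn)

# One request occupying 4 full pages.
kv_indptr = torch.tensor([0, 4], dtype=torch.int32, device="cuda")
kv_indices = torch.arange(4, dtype=torch.int32, device="cuda")
kv_last_page_len = torch.tensor([page_size], dtype=torch.int32, device="cuda")

wrapper.plan(kv_indptr, kv_indices, kv_last_page_len,
             num_qo_heads, num_kv_heads, head_dim, page_size,
             q_data_type=torch.float16, kv_data_type=torch.float8_e4m3fn)

q = torch.randn(1, num_qo_heads, head_dim, device="cuda", dtype=torch.float16)
out = wrapper.run(q, (k_cache, v_cache))  # raises the ValueError quoted above
```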

Support deepseekv3 e2e example without attention forward patch

- [x] Modify "TritonWithFlattenedInputs" backend to support sdpa-style attention with different head dims for v_head_dim and qk_head_dim (see the sketch after this entry)
- [x] Add unit tests...

Label: AutoDeploy
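For context on the first checklist item: DeepSeek-V3's attention uses a larger query/key head dim than value head dim, and PyTorch's `scaled_dot_product_attention` already accepts that mismatch (the output inherits the value head dim). A minimal sketch; the 192/128 dims and shapes are assumptions for illustration, not taken from the issue:

```python
import torch
import torch.nn.functional as F

batch, n_heads, seq = 2, 16, 128
qk_head_dim, v_head_dim = 192, 128  # assumed MLA-style dims

q = torch.randn(batch, n_heads, seq, qk_head_dim)
k = torch.randn(batch, n_heads, seq, qk_head_dim)
v = torch.randn(batch, n_heads, seq, v_head_dim)

# SDPA allows v's head dim to differ from q/k's; the output takes v_head_dim.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
assert out.shape == (batch, n_heads, seq, v_head_dim)
```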