Issues by sugunav14 (2 results)

I have `k_cache` and `v_cache` as `torch.float8_e4m3fn` tensors and am calling `run` (decode attention), and I get this error: `ValueError: FlashInfer Internal Error: Invalid configuration : NUM_FRAGS_Q=1 NUM_FRAGS_D=8 NUM_FRAGS_KV=1 NUM_WARPS_Q=1...`
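For reference, a minimal sketch of the call pattern described above, assuming the flashinfer `BatchDecodeWithPagedKVCacheWrapper` API; the shapes, page layout, and the `q_data_type`/`kv_data_type` keyword names are illustrative assumptions and may differ across flashinfer versions:

```python
# Sketch of a decode-attention call with an fp8 KV cache (assumed shapes/kwargs).
import torch
import flashinfer

num_kv_heads, num_qo_heads, head_dim, page_size = 4, 32, 128, 16
max_pages = 64

workspace = torch.empty(128 * 1024 * 1024, dtype=torch.uint8, device="cuda")
wrapper = flashinfer.BatchDecodeWithPagedKVCacheWrapper(workspace, kv_layout="NHD")

# fp8 KV cache, as in the issue (torch.randn cannot emit fp8, so cast from fp16).
k_cache = torch.randn(max_pages, page_size, num_kv_heads, head_dim,
                      device="cuda", dtype=torch.float16).to(torch.float8_e4m3fn)
v_cache = torch.randn(max_pages, page_size, num_kv_heads, head_dim,
                      device="cuda", dtype=torch.float16).to(torch.float8_e4m3fn)

# One request occupying 4 full pages.
kv_indptr = torch.tensor([0, 4], dtype=torch.int32, device="cuda")
kv_indices = torch.arange(4, dtype=torch.int32, device="cuda")
kv_last_page_len = torch.tensor([page_size], dtype=torch.int32, device="cuda")

wrapper.plan(kv_indptr, kv_indices, kv_last_page_len,
             num_qo_heads, num_kv_heads, head_dim, page_size,
             q_data_type=torch.float16, kv_data_type=torch.float8_e4m3fn)

q = torch.randn(1, num_qo_heads, head_dim, device="cuda", dtype=torch.float16)
out = wrapper.run(q, (k_cache, v_cache))  # raises the ValueError quoted above
```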

Support deepseekv3 e2e example without attention forward patch

- [x] Modify "TritonWithFlattenedInputs" backend to support sdpa-style attention with different head dims for v_head_dim and qk_head_dim (see the sketch after this entry)
- [x] Add unit tests...

Label: AutoDeploy
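For context on the first checklist item: DeepSeek-V3's attention uses a larger query/key head dim than value head dim, and PyTorch's `scaled_dot_product_attention` already accepts that mismatch (the output inherits the value head dim). A minimal sketch; the 192/128 dims and shapes are assumptions for illustration, not taken from the issue:

```python
import torch
import torch.nn.functional as F

batch, n_heads, seq = 2, 16, 128
qk_head_dim, v_head_dim = 192, 128  # assumed MLA-style dims

q = torch.randn(batch, n_heads, seq, qk_head_dim)
k = torch.randn(batch, n_heads, seq, qk_head_dim)
v = torch.randn(batch, n_heads, seq, v_head_dim)

# SDPA allows v's head dim to differ from q/k's; the output takes v_head_dim.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
assert out.shape == (batch, n_heads, seq, v_head_dim)
```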