Brayden Zhong comments

Results: 46 comments of Brayden Zhong

mm_fp4 regression (need 0.2s cpu time per run)

Q: @nvmbreughe It seems https://github.com/sgl-project/sglang/pull/11813 may be the same issue (the cutlass backend is fine). Also, in sgl we assert that alpha is not None, so I don't think it's...

[Feature] support mistral small vlm

Completed by https://github.com/sgl-project/sglang/pull/5099

tiny remove deprecated endpoint call

/tag-and-rerun-ci

tiny remove deprecated endpoint call

/tag-and-rerun-ci

direct register custom op for mm_fp4

/rerun-stage unit-test-backend-4-gpu-b200

fix trtllm mla spec

/tag-and-rerun-ci

Add Llama4 attention backend auto-selection

@janbernloehr Can you fix the lint? Thanks.

[Feature]: Add KV Cache Metrics to Usage Object

@ArjunBhalla98 What do you think of these metrics?

```diff
 class PromptTokenUsageInfo(OpenAIBaseModel):
     cached_tokens: Optional[int] = None
+    cache_hit_ratio: Optional[float] = None
```
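For illustration, here is a minimal sketch of how such a `cache_hit_ratio` field could be populated. It assumes the ratio means cached prompt tokens divided by total prompt tokens, which the comment does not spell out. `PromptTokenUsage` and `build_prompt_usage` are hypothetical stand-ins written as a plain dataclass, not the project's actual `PromptTokenUsageInfo` or any existing helper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PromptTokenUsage:
    # Hypothetical stand-in for PromptTokenUsageInfo (not the real class).
    cached_tokens: Optional[int] = None
    cache_hit_ratio: Optional[float] = None


def build_prompt_usage(prompt_tokens: int,
                       cached_tokens: Optional[int]) -> PromptTokenUsage:
    """Assumed definition: cache_hit_ratio = cached_tokens / prompt_tokens."""
    # Leave the ratio unset when the cache count is unknown or the prompt is empty.
    if cached_tokens is None or prompt_tokens <= 0:
        return PromptTokenUsage(cached_tokens=cached_tokens)
    return PromptTokenUsage(
        cached_tokens=cached_tokens,
        cache_hit_ratio=cached_tokens / prompt_tokens,
    )


usage = build_prompt_usage(prompt_tokens=200, cached_tokens=50)
print(usage)
```

Keeping both fields optional means clients that only read `cached_tokens` are unaffected when the ratio is absent.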

[sgl-kernel] Support PDL for activations

/tag-and-rerun-ci again

enable flashinfer fp8 gemm if deepgemm disabled

@Fridge003 No, it still works. You can still enable it manually through `CUTLASS_BLOCK_FP8_SUPPORTED` (like before).