Brayden Zhong

46 comments by Brayden Zhong

@nvmbreughe It seems https://github.com/sgl-project/sglang/pull/11813 may be the same issue (the cutlass backend is fine). Also, in sgl we assert that alpha is not None, so I don't think it's...

Completed by https://github.com/sgl-project/sglang/pull/5099

/rerun-stage unit-test-backend-4-gpu-b200

/tag-and-rerun-ci

@janbernloehr Can you fix the lint? Thanks.

@ArjunBhalla98 What do you think of these metrics?

```diff
 class PromptTokenUsageInfo(OpenAIBaseModel):
     cached_tokens: Optional[int] = None
+    cache_hit_ratio: Optional[float] = None
```
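A minimal sketch of how a `cache_hit_ratio` field like the one proposed could be populated. It assumes the ratio is defined as `cached_tokens / prompt_tokens`, which the comment does not state. `OpenAIBaseModel` is swapped for a plain dataclass so the example runs on its own, and `build_prompt_token_usage` is a hypothetical helper, not an sglang API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PromptTokenUsageInfo:
    # Stand-in for the pydantic model in the diff; field names mirror it.
    cached_tokens: Optional[int] = None
    cache_hit_ratio: Optional[float] = None


def build_prompt_token_usage(
    prompt_tokens: int, cached_tokens: Optional[int]
) -> PromptTokenUsageInfo:
    """Hypothetical helper: set cache_hit_ratio = cached_tokens / prompt_tokens."""
    if cached_tokens is None or prompt_tokens <= 0:
        # No cache info (or an empty prompt): leave the ratio unset.
        return PromptTokenUsageInfo(cached_tokens=cached_tokens)
    return PromptTokenUsageInfo(
        cached_tokens=cached_tokens,
        cache_hit_ratio=cached_tokens / prompt_tokens,
    )


usage = build_prompt_token_usage(prompt_tokens=200, cached_tokens=150)
print(usage.cache_hit_ratio)  # 0.75
```

Returning `None` instead of `0.0` when cache information is missing keeps "no data" distinguishable from "no hits", consistent with the `Optional` typing of the existing `cached_tokens` field.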

@Fridge003 No, it still works: you can still enable it manually through `CUTLASS_BLOCK_FP8_SUPPORTED`, as before.