Brayden Zhong
Q: @nvmbreughe It seems https://github.com/sgl-project/sglang/pull/11813 may be the same issue (the cutlass backend is fine). Also, in sgl we assert that alpha is not None, so I don't think it's...
Completed by https://github.com/sgl-project/sglang/pull/5099
/tag-and-rerun-ci
/tag-and-rerun-ci
/rerun-stage unit-test-backend-4-gpu-b200
/tag-and-rerun-ci
@janbernloehr Can you fix the lint? Thanks.
@ArjunBhalla98 What do you think of these metrics?

```diff
 class PromptTokenUsageInfo(OpenAIBaseModel):
     cached_tokens: Optional[int] = None
+    cache_hit_ratio: Optional[float] = None
```
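For discussion, here is a minimal sketch of how the proposed field might be populated. It assumes the ratio is defined as cached tokens divided by prompt tokens. That definition and the `build_usage_info` helper are hypothetical, not taken from the PR, and a plain dataclass stands in for `OpenAIBaseModel` so the snippet runs on its own.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PromptTokenUsageInfo:
    # Stand-in for the OpenAIBaseModel subclass in the diff above;
    # a dataclass keeps this sketch dependency-free.
    cached_tokens: Optional[int] = None
    cache_hit_ratio: Optional[float] = None


def build_usage_info(
    prompt_tokens: int, cached_tokens: Optional[int]
) -> PromptTokenUsageInfo:
    """Hypothetical helper: derive cache_hit_ratio from token counts.

    Assumption: ratio = cached_tokens / prompt_tokens. The ratio is left
    as None when it is undefined (no cache info or an empty prompt).
    """
    if cached_tokens is None or prompt_tokens <= 0:
        return PromptTokenUsageInfo(cached_tokens=cached_tokens)
    return PromptTokenUsageInfo(
        cached_tokens=cached_tokens,
        cache_hit_ratio=cached_tokens / prompt_tokens,
    )
```

Leaving the ratio as `None` when it cannot be computed matches the `Optional[float] = None` default in the diff and avoids reporting a misleading `0.0`.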
/tag-and-rerun-ci again
@Fridge003 No, it still works. You can still enable it manually through `CUTLASS_BLOCK_FP8_SUPPORTED`, like before.