zhengjia
> Hi @pcmoritz, you can now directly convert between the two data types in the latest ROCm release.

By the way, can you provide the public image link for the latest ROCm build...
Another thing: if you need `__float2bfloat16()`, `__hip_bfloat16` has built-in APIs for it, while `hip_bfloat16` doesn't.
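For context, here is a minimal host-side sketch of what a float-to-bfloat16 conversion typically does: keep the upper 16 bits of the float32 pattern, rounding to nearest even. This is plain C++ for illustration only, not the ROCm implementation. The names `float_to_bf16_bits` and `bf16_bits_to_float` are hypothetical, and the rounding mode is an assumption.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: convert a float32 to raw bfloat16 bits,
// assuming round-to-nearest-even on the discarded lower 16 bits.
uint16_t float_to_bf16_bits(float f) {
    uint32_t u;
    std::memcpy(&u, &f, sizeof(u));  // reinterpret the float's bit pattern

    // NaN: keep the sign and exponent, force a quiet-NaN mantissa bit.
    if ((u & 0x7fffffffu) > 0x7f800000u) {
        return static_cast<uint16_t>((u >> 16) | 0x0040u);
    }

    // Add 0x7fff plus the lowest kept bit, so exact ties round to even.
    uint32_t rounding_bias = 0x7fffu + ((u >> 16) & 1u);
    u += rounding_bias;
    return static_cast<uint16_t>(u >> 16);
}

// Hypothetical helper: widen raw bfloat16 bits back to float32.
// This direction is exact, since bfloat16 is the top half of float32.
float bf16_bits_to_float(uint16_t b) {
    uint32_t u = static_cast<uint32_t>(b) << 16;
    float f;
    std::memcpy(&f, &u, sizeof(f));
    return f;
}
```

On device, a type with built-in conversion APIs would be used instead of hand-rolled bit manipulation like this.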
Serving bench results update:

## baseline without use_dp_linear

```yml
# prefill
============ Serving Benchmark Result ============
Backend: sglang
Traffic request rate: inf
Max reqeuest concurrency: 32
Successful requests: 200
Benchmark...
```
> Hi, please let me know when it's ready for review. Thanks!

Hi @zhyncs, many thanks for the review. The bench test covered just a few isl/osl/num_prompts combinations on H20/MI30x.
Any recent update on CUTLASS support?
It's better to read the code architecture at https://github.com/FasterDecoding/Medusa/tree/main first; it's much clearer.