Raayan Dhar
I'm happy to try to tackle this as an open-source contributor, at least using the CUTLASS backend. Judging by what I already see for FP8 and FP4, it does not seem very difficult....
Going to give a BF16 CUTLASS GEMM a shot and see how it does at small batch sizes.
> Just wondering, is there a progress with this issue?

Hi, yes, sorry, I was a bit busy. I have a PR out, https://github.com/flashinfer-ai/flashinfer/pull/2070, that is 90% done. I'm planning on...
@vadiklyutiy now the PR is awaiting review. The numbers at batch=64 are better than all the others, but elsewhere we are slightly worse. I'm a total CUTLASS newbie, so it...
Will be tackling this
@b8zhong I saw you self-assigned, am I still good to work on this?
I will continue working on this; there are more improvements to be made.
Hi experts, at this point, looking at the profiling, there's been some pretty good improvement in times. Looking at `time python -X importtime -c "from sglang.srt.managers.scheduler import Scheduler" 2> import_sglang.log`,...
If it's acceptable, I'm happy to continue applying the "free lunch" changes I mentioned in the bigger comment above. But I defer to the experts.
> The drawback is that it introduces some extra cognitive load. May you fix the conflicts first? Thanks!

Sure, will do.