Baizhou Zhang

Results: 79 comments by Baizhou Zhang

> It's in the above, before the PR. I mean the visualization of the kernel timeline.

> The kernel runtime varies a lot between calls, but with PDL there's no inter-kernel gap, because it's able to launch as soon as some blocks in the prev kernel...

Can you please post some accuracy tests for the kernels you improved?

@JackChuang Please update this doc https://github.com/sgl-project/sglang/blob/main/docs/advanced_features/attention_backend.md?plain=1#L22

@JackChuang Do you have any examples of accuracy benchmarking when enabling the fp8 KV cache with the torch native backend?

@pratcooper Hi, we just updated the code for LoRA. The misalignment bug should be fixed. Would you please test it?

> [@pratcooper](https://github.com/pratcooper) Hi, we just updated the code for LoRA. The misalignment bug should be fixed. Would you please test it? Also, the LoRA paths of the adapters should be passed...

sglang 0.4.2.post4, sgl-kernel 0.0.3.post3, flashinfer 0.2.0.post2

Hi @pratcooper, is this issue fixed?