Baizhou Zhang
> It's shown above. By "before PR" I mean the visualization of the kernel timeline
> The kernel runtime varies a lot between calls, but with PDL there's no inter-kernel gap, because it's able to launch as soon as some blocks in the prev kernel...
Can you please post some accuracy tests for the kernels you improved?
@JackChuang Please update this doc https://github.com/sgl-project/sglang/blob/main/docs/advanced_features/attention_backend.md?plain=1#L22
@JackChuang Please fix the conflict
@JackChuang Do you have an example of accuracy benchmarking with the fp8 KV cache enabled on the torch native backend?
@pratcooper Hi, we just updated the code for LoRA. The misalignment bug should be fixed. Could you please test it?
> [@pratcooper](https://github.com/pratcooper) Hi, we just updated the code for LoRA. The misalignment bug should be fixed. Could you please test it? Also, the LoRA paths of the adapters should be passed...
sglang 0.4.2.post4, sgl-kernel 0.0.3.post3, flashinfer 0.2.0.post2
Hi @pratcooper, is this issue fixed?