Baizhou Zhang comments

Results: 79 comments of Baizhou Zhang

[sgl-kernel] Support PDL for activatons

> It's in above, before PR I mean visualization of kernel timeline

[sgl-kernel] Support PDL for activatons

> The kernel runtime varies a lot between calls, but with PDL there's no inter-kernel gap, because it's able to launch as soon as some blocks in the prev kernel...

[sgl-kernel] Support PDL for activatons

Can you please post some accuracy tests for the kernels you improved?

Support kv8 (FP8) with torch_native attention backend

@JackChuang Please update this doc https://github.com/sgl-project/sglang/blob/main/docs/advanced_features/attention_backend.md?plain=1#L22

Support kv8 (FP8) with torch_native attention backend

@JackChuang Please fix the merge conflict.

Support kv8 (FP8) with torch_native attention backend

@JackChuang Do you have any examples of accuracy benchmarking with the fp8 kv cache enabled on the torch native backend?

[Bug] HuggingFace and SGLang inference don't match

@pratcooper Hi, we just updated the code for LoRA. The misalignment bug should be fixed. Would you please test it?

[Bug] HuggingFace and SGLang inference don't match

> [@pratcooper](https://github.com/pratcooper) Hi, we just updated the code for LoRA. The misalignment bug should be fixed. Would you please test it? Also, the LoRA paths of adapters should be passed...

[Bug] HuggingFace and SGLang inference don't match

sglang 0.4.2.post4, sgl-kernel 0.0.3.post3, flashinfer 0.2.0.post2

[Bug] HuggingFace and SGLang inference don't match

Hi @pratcooper, is this issue fixed?