Chanjun

Results 15 issues of Chanjun

https://github.com/casper-hansen/AutoAWQ/blob/5f3785dcaa107ca76f5fa5355f459370c86f82d6/awq/quantize/quantizer.py#L332 This superimposes the previous scale on every iteration. Should the initial weight be used each time instead?
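The concern can be illustrated with a minimal, self-contained sketch. It is not AutoAWQ's code: the function names and the toy weight and scale values are hypothetical, chosen only to show how candidate scales stack when each one is applied to an already-scaled weight, and how restoring the initial weight avoids that.

```python
# Hypothetical toy example; not taken from AutoAWQ.

def search_compounding(weight, scales):
    """Apply each candidate scale on top of the previously scaled weight."""
    w = list(weight)
    seen = []
    for s in scales:
        w = [x * s for x in w]  # the new scale stacks on the earlier ones
        seen.append(list(w))
    return seen


def search_with_restore(weight, scales):
    """Apply each candidate scale to a fresh copy of the initial weight."""
    original = list(weight)
    seen = []
    for s in scales:
        seen.append([x * s for x in original])  # always start from the original
    return seen


weight = [1.0, 2.0]
scales = [2.0, 4.0]

compounded = search_compounding(weight, scales)
restored = search_with_restore(weight, scales)

print(compounded)  # second candidate is effectively scaled by 2.0 * 4.0
print(restored)    # second candidate is scaled by 4.0 only
```

If the linked loop behaves like the first function, each candidate would be evaluated against a weight that already carries the earlier scales. Whether it actually does depends on how the surrounding code restores state, which this sketch does not model.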

"Marlin internally uses locks to synchronize the threads. This can result in very slight nondeterminism for Marlin." Why?
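One plausible explanation, offered here as an assumption rather than something the quoted statement confirms, is that floating-point addition is not associative. If the order in which threads acquire a lock decides the order in which partial sums are accumulated, the result can differ in the last bits from run to run. The sketch below only demonstrates the arithmetic effect; it does not model Marlin's kernel, and `sum_in_order` is a hypothetical helper.

```python
# Hypothetical illustration of order-dependent floating-point sums.

def sum_in_order(values, order):
    """Accumulate values in the given index order, like a serialized reduction."""
    total = 0.0
    for i in order:
        total += values[i]
    return total


partials = [0.1, 0.2, 0.3]

forward = sum_in_order(partials, [0, 1, 2])   # (0.1 + 0.2) + 0.3
backward = sum_in_order(partials, [2, 1, 0])  # (0.3 + 0.2) + 0.1

print(forward, backward, forward == backward)
```

The two totals agree to about 15 significant digits but are not bit-identical, which matches the "very slight" scale of difference described above.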

### Checklist - [x] 1. If the issue you raised is not a feature but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed. - [x]...

help wanted
high priority

LoRA inference is very slow. I ran a Gemma LoRA and found that the qkv proj takes 0.0003 s, versus only 0.0001 s without LoRA, so the result is a token decode time...

performance
lora