Chanjun issues

Results: 15 issues by Chanjun

about grid search

https://github.com/casper-hansen/AutoAWQ/blob/5f3785dcaa107ca76f5fa5355f459370c86f82d6/awq/quantize/quantizer.py#L332 This appears to apply each new scale on top of the previous one on every iteration. Should the initial weight be used every time instead?
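To make the question concrete, here is a minimal pure-Python sketch of a scale grid search that starts from the initial weight on every iteration, so that scales never compound. It is not AutoAWQ's code: `pseudo_quantize`, `search_best_ratio`, and all parameters are hypothetical names chosen for illustration, and whether the linked line already restores the weights between iterations is not confirmed here.

```python
def pseudo_quantize(values, n_bits=4):
    """Round values to a symmetric n-bit grid and map them back to floats."""
    max_abs = max(abs(v) for v in values) or 1.0
    q_max = 2 ** (n_bits - 1) - 1
    step = max_abs / q_max
    return [round(v / step) * step for v in values]


def search_best_ratio(init_weight, act_mean, x, n_grid=20):
    """Grid-search a scaling exponent, starting from init_weight every time."""
    # Reference output of the unquantized weights (hypothetical 1-row layer).
    ref_out = sum(w * xi for w, xi in zip(init_weight, x))
    best_ratio, best_err = None, float("inf")
    for i in range(n_grid):
        ratio = i / n_grid
        scales = [max(a, 1e-4) ** ratio for a in act_mean]
        # Each candidate is applied to the untouched initial weights, so the
        # scale from a previous iteration is never compounded.
        scaled = [w * s for w, s in zip(init_weight, scales)]
        deq = [q / s for q, s in zip(pseudo_quantize(scaled), scales)]
        out = sum(w * xi for w, xi in zip(deq, x))
        err = (out - ref_out) ** 2
        if err < best_err:
            best_ratio, best_err = ratio, err
    return best_ratio, best_err
```

Because `scaled` is rebuilt from `init_weight` inside the loop, the input list is never mutated. An in-place variant would instead need to restore a saved copy of the weights at the end of each iteration.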

slight nondeterminism

Marlin internally uses locks to synchronize the threads. This can result in very slight nondeterminism for Marlin. Why?
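One plausible explanation, stated here as an assumption rather than something the source confirms, is that lock-based synchronization lets threads accumulate partial sums in an order that varies from run to run, and floating-point addition is not associative. The sketch below shows only the arithmetic effect, not Marlin's kernel:

```python
# The same three values summed in two different groupings.
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # one possible accumulation order
right = a + (b + c)  # another possible accumulation order

# The two results differ in the last bits even though the inputs are identical.
print(left == right)  # False
print(left - right)
```

If partial results arrive in a different order on different runs, the final value can differ by a rounding error of this size, which would match the "very slight" nondeterminism described.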

[Feature] Faster torch.compile

### Checklist
- [x] 1. If the issue you raised is not a feature but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed.
- [x]...

Labels: help wanted, high priority

[Feature] lora serving performance

LoRA inference speed is very slow. I ran a LoRA adapter for Gemma and found that the qkv proj takes 0.0003s, but only 0.0001s without LoRA, so the result is a token decode time...

Labels: performance, lora

after torch compile with 0.2.0, speed becomes very slow