Make Marlin incompatible AWQ models work

Open bjmsong opened this issue 3 days ago • 0 comments

Motivation

Related to https://github.com/sgl-project/sglang/issues/3571: some AWQ models are incompatible with the Marlin kernels.

Modifications

Fall back to the unoptimized kernel when a model is incompatible with the Marlin kernels.
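The fallback can be pictured roughly as follows. This is a minimal sketch, not the code in this PR: `LayerShape`, `is_marlin_compatible`, `select_kernel`, the tile constants, and the returned labels are all hypothetical names chosen for illustration, and the real compatibility conditions depend on the Marlin kernel implementation.

```python
from dataclasses import dataclass
from typing import Iterable

# Assumed constraints, for illustration only. The actual limits are
# defined by the Marlin kernel implementation, not by this sketch.
MIN_TILE_N = 64    # assumed: output features must be a multiple of this
MIN_TILE_K = 128   # assumed: input features must be a multiple of this
SUPPORTED_GROUP_SIZES = (-1, 32, 64, 128)  # assumed set of group sizes


@dataclass
class LayerShape:
    """Hypothetical description of one quantized linear layer."""
    in_features: int
    out_features: int
    group_size: int


def is_marlin_compatible(shape: LayerShape) -> bool:
    """Return True if the layer satisfies the assumed Marlin constraints."""
    if shape.group_size not in SUPPORTED_GROUP_SIZES:
        return False
    if shape.out_features % MIN_TILE_N != 0:
        return False
    if shape.in_features % MIN_TILE_K != 0:
        return False
    return True


def select_kernel(shapes: Iterable[LayerShape]) -> str:
    """Pick the optimized path only if every layer is compatible."""
    if all(is_marlin_compatible(s) for s in shapes):
        return "awq_marlin"  # illustrative label for the optimized path
    return "awq"             # illustrative label for the unoptimized fallback
```

With these assumed constraints, a model in which every layer has aligned dimensions selects the optimized path, while a single misaligned layer is enough to switch the whole model to the fallback.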

Test script

python examples/runtime/engine/offline_batch_inference.py --model=${DeepSeek-V2-Lite-Chat-AWQ} --trust-remote-code

Refer to this PR.

Checklist

  • [ ] Format your code according to the Code Formatting with Pre-Commit.
  • [ ] Add unit tests as outlined in the Running Unit Tests.
  • [ ] Update documentation / docstrings / example tutorials as needed, according to Writing Documentation.
  • [ ] Provide throughput / latency benchmark results and accuracy evaluation results as needed, according to Benchmark and Profiling and Accuracy Results.
  • [ ] For reviewers: If you haven't made any contributions to this PR and are only assisting with merging the main branch, please remove yourself as a co-author when merging the PR.
  • [ ] Please feel free to join our Slack channel at https://slack.sglang.ai to discuss your PR.

bjmsong, Feb 21 '25 12:02