Casper
Excellent work @SuibhneOFoighil
This is with Qwen 2 7B
Today with 4 nodes and Qwen 2.5 7B, I logged 6.7 minutes until step 1. @vermouth1992 Do you have any idea which process in the init is taking so...
@winglian @djsaunde This would be a super handy datasets feature! +1 from me
If you have a normal FP16/BF16 model, this does not happen. As a first step, I would suggest you check whether the model can run inference with the Hugging Face libraries.
I want to make it more flexible, but PyPI only allows one wheel, so I cannot upload multiple versions. One fix could be to implement a flag in...
@qinxuye would it help with a flag like this? https://github.com/casper-hansen/AutoAWQ/pull/582
@WoosukKwon I have used the same shapes as referenced in the original implementation, yet it does not load in vLLM for reasons I am unsure how to fix. If I...
@shiqingzhangCSU Currently there is no progress. If you have suggestions or fixes, please open a PR to my fork. I am hoping to have this feature in vLLM soon, but...
> @robertgshaw2-neuralmagic any luck with this patch? I benchmarked and those kernels are really something. Great boost on my internal tests! @bratao I believe rob has a branch over in...