Sparsh Tewatia
I support it too
Has anyone found a solution? I am trying to use it with accelerate but get the same error.
Convert it to AWQ if you want to use vLLM; otherwise use Unsloth inference for 4-bit models.
This would be great.
Hi, I'm getting an error for longer-context models like Microsoft Phi-3 Medium with respect to RoPE scaling factors in exl2 format.
I think it is related to this; maybe not much needs to be done here, just implement this code. I will try to test that it doesn't break...
https://github.com/vllm-project/vllm/pull/4298 — vLLM has implemented rotary scaled embeddings like this.
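For context, the simplest RoPE scaling variant (linear scaling) just divides the position index by a scaling factor before computing the rotary angles. A minimal sketch of that idea, with illustrative function names that are assumptions and not vLLM's actual API:

```python
# Minimal sketch of linear RoPE position scaling; function names are
# illustrative assumptions, not vLLM's actual implementation.
def rope_inv_freqs(dim: int, base: float = 10000.0) -> list[float]:
    # Standard RoPE inverse frequencies for a head dimension `dim`.
    return [base ** (-2.0 * i / dim) for i in range(dim // 2)]

def scaled_rope_angles(position: int, inv_freqs: list[float],
                       scaling_factor: float = 1.0) -> list[float]:
    # Linear scaling: divide the position by the factor, so a model trained
    # on N positions can address roughly N * scaling_factor positions.
    pos = position / scaling_factor
    return [pos * f for f in inv_freqs]
```

Phi-3's "longrope" scheme is more involved (it uses per-dimension long/short factor arrays rather than a single scalar), which is presumably why exl2 needs something closer to that PR rather than a single factor.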
Any initial benchmarks for models like Gemma 2 9B and 27B on TPU v5e or v4? I'm considering switching; Hex-LLM, the container from Google, achieves around 4000 tok/s on TPU...
I think it has to do with Ray workers not having access to metadata in GCP. A temporary workaround is this: change it from line 112 in pallas.py ` #if...
I am also thinking of implementing an Online DPO trainer with EasyDel, with a little bit of your support if you are interested. It can be comparable with PPO as per...