Sparsh Tewatia

Results 13 comments of Sparsh Tewatia

I support it too

Has anyone found a solution? I am trying to use it with accelerate but getting the same error.

Convert it to AWQ if you want to use vLLM; otherwise, use Unsloth inference for 4-bit models.

Hi, I am getting an error related to rope scaling factors for larger-context models like Microsoft Phi 3 medium in exl2 format.

I think it is related to this. Maybe not much needs to be done here, just implement this code. I will try to test if it doesn't break...

https://github.com/vllm-project/vllm/pull/4298 vLLM has implemented scaled rotary embeddings like this.

Are there any initial benchmarks for models like Gemma2 9b and 27b on TPU v5e or v4? I am considering switching; Hex LLM, the container from Google, achieves around 4000 tok/s on TPU...

I think it has to do with Ray workers not having access to metadata in GCP. A temporary workaround is this: change it from line 112 in pallas.py ` #if...

I am also thinking of implementing an Online DPO trainer with EasyDel, with a little bit of your support if you are interested. It can be comparable to PPO as per...