Sparsh Tewatia
I support it too
Has anyone found a solution? I am trying to use it with accelerate but get the same error.
Convert it to AWQ if you want to use vLLM; otherwise use Unsloth inference for 4-bit models.
This would be great.
Hi, I'm getting an error for longer-context models like Microsoft Phi-3 Medium with respect to RoPE scaling factors in exl2 format.
I think it is related to this; maybe not much needs to be done here, just implement this code. I will try to test that it doesn't break...
https://github.com/vllm-project/vllm/pull/4298 — vLLM has implemented rotary scaled embeddings like this.
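For context, the simplest RoPE scaling variant (linear scaling) just divides the position index by a scaling factor before computing the rotary angles. A minimal sketch of that idea, with illustrative function names that are assumptions and not vLLM's actual API:

```python
# Minimal sketch of linear RoPE position scaling; function names are
# illustrative assumptions, not vLLM's actual implementation.
def rope_inv_freqs(dim: int, base: float = 10000.0) -> list[float]:
    # Standard RoPE inverse frequencies for a head dimension `dim`.
    return [base ** (-2.0 * i / dim) for i in range(dim // 2)]

def scaled_rope_angles(position: int, inv_freqs: list[float],
                       scaling_factor: float = 1.0) -> list[float]:
    # Linear scaling: divide the position by the factor, so a model trained
    # on N positions can address roughly N * scaling_factor positions.
    pos = position / scaling_factor
    return [pos * f for f in inv_freqs]
```

Phi-3's "longrope" scheme is more involved (it uses per-dimension long/short factor arrays rather than a single scalar), which is presumably why exl2 needs something closer to that PR rather than a single factor.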
Any initial benchmarks for models like Gemma 2 9B and 27B on TPU v5e or v4? I'm considering switching; Hex-LLM, the container from Google, achieves around 4000 tok/s on TPU...
I think it has to do with Ray workers not having access to metadata in GCP. A temporary workaround is this: change it from line 112 in pallas.py ` #if...
I am also thinking of implementing an Online DPO trainer with EasyDel, with a little bit of your support if you are interested. It can be comparable with PPO as per...