Results: 41 comments of Srinivas Billa

Hey, I tried to do this, but when the model is loaded using Ray it doesn't work. I get this error:

```
AttributeError                            Traceback (most recent call last)
File...
```

@matankley try using S-LoRA. It looks interesting, and they even compare its performance against vLLM. I haven't tried it, but it looks pretty good for a multi-adapter setup.

The XL version isn't that good, however. I've been trying the SD v2 version and it's much better. Could this be used for just the refiner step? https://colab.research.google.com/drive/1rD69EGCsBh22sHBsvBX-k2NOPTSjLQoi?usp=sharing

Hi @kartikayk, I'm not that experienced in writing custom training loops. I'm mainly a Hugging Face user, haha. I'd be no better than Llama 3 70B attempting it 🤣

They show sparsity even in dense models like Falcon, but I guess Mixtral's MoE is a better candidate.

> For training / finetuning:

@danielhanchen Obligatory request for multi-GPU XD

Hey, I see that this only works for q and v LoRAs. However, most QLoRA fine-tunes target all of the k, q, v, o, up, and down projection layers for the Llama architecture. Is there a...
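
For context, here is a minimal sketch of what targeting that fuller set of projection layers looks like with Hugging Face PEFT's `LoraConfig`; the rank, alpha, and dropout values are placeholder assumptions, not recommendations.

```python
from peft import LoraConfig

# Sketch: a LoRA config that targets all of the projection layers named above,
# not just q_proj/v_proj. The r / lora_alpha / lora_dropout values are
# illustrative placeholders.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "up_proj", "down_proj",                  # MLP projections
    ],
)
```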

You need to run it inside a folder where there is a subfolder called `images`.
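
As a rough illustration of the expected layout (this check is just a sketch, not part of the tool itself):

```python
from pathlib import Path

# Sketch: the tool expects to be launched from a directory that contains an
# "images/" subfolder holding the input files.
images_dir = Path.cwd() / "images"
if not images_dir.is_dir():
    raise SystemExit(f"Expected an 'images' subfolder under {Path.cwd()}")
```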

I assume it would be useful if we want to host the models and have an interface like chat.openai.com?

If we do end up building this for server use (and I think that would be a good idea), then this paging system would be very useful.