Results: 41 comments of Srinivas Billa

Hey, I tried to do this, but when the model is loaded using Ray it doesn't work. I get this error:

```
AttributeError                            Traceback (most recent call last)
File...
```

@matankley try using S-LoRA. It looks interesting, and they even compare its performance against vLLM. I haven't tried it, but it looks pretty good for a multi-adapter setup.

The XL version isn't that good, however. I've been trying the SD v2 version and it's much better. Could this be used for just the refiner step? https://colab.research.google.com/drive/1rD69EGCsBh22sHBsvBX-k2NOPTSjLQoi?usp=sharing

Hi @kartikayk, I'm not that experienced in writing custom training loops. I'm mainly a Hugging Face user, haha. I'd be no better than Llama 3 70B attempting it 🤣

They show sparsity even in dense models like Falcon, but I guess Mixtral's MoE is a better candidate.

> For training / finetuning:

@danielhanchen Obligatory request for multi-GPU XD

Hey, I see that this only works for q and v LoRAs. However, most QLoRA fine-tunes target all of the k, q, v, o, up, and down projection layers for the Llama architecture. Is there a...
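
For context, here is a minimal sketch of what targeting that fuller set of projection layers looks like with Hugging Face PEFT's `LoraConfig`; the rank, alpha, and dropout values are placeholder assumptions, not recommendations.

```python
from peft import LoraConfig

# Sketch: a LoRA config that targets all of the projection layers named above,
# not just q_proj/v_proj. The r / lora_alpha / lora_dropout values are
# illustrative placeholders.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "up_proj", "down_proj",                  # MLP projections
    ],
)
```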

You need to run it inside a folder where there is a subfolder called `images`.
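
As a rough illustration of the expected layout (this check is just a sketch, not part of the tool itself):

```python
from pathlib import Path

# Sketch: the tool expects to be launched from a directory that contains an
# "images/" subfolder holding the input files.
images_dir = Path.cwd() / "images"
if not images_dir.is_dir():
    raise SystemExit(f"Expected an 'images' subfolder under {Path.cwd()}")
```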

I assume it would be useful if we want to host the models and have an interface like chat.openai.com?

If we do end up building this for server use (and I think that would be a good idea), then this paging system would be very useful.