PHUDGE Phudge Ollama version

Hi there,

I recently stumbled upon your paper, and Phudge looks great! I was wondering if you considered adding it to ollama so that it can be used in an efficient manner (ollama is nice because you can run the eval even super conveniently on CPU).

Here'd be an instruction for how to convert it: https://github.com/ollama/ollama/blob/main/docs/import.md#publishing-your-model-optional--early-alpha.

PS: I am not affiliated with ollama in any way, shape, or form, I just like it :). Here's and example on using a Llama 3 model via ollama for model eval. I think doing something similar with Phudge would be cool!

Jun 08 '24 13:06 rasbt

Hey @rasbt so happy to see that it reached to you somehow and thank you for the kind words. Would definitely add it to ollama. As I haven't worked personally in the past, will try my best to finish it ASAP.

Jun 10 '24 08:06 deshwalmahesh

One question to that: https://huggingface.co/vicgalle/Phudge-3/blob/b66fb03609f66097b098e2ede782170082de56cd/config.json#L4

This seems to be a modified PHUDGE model as it uses a Phi3ForSequenceClassification architecture, so having a have a linear layer head for sequence classification on top, instead of a Phi3ForCausalLM architecture.

Is there any way to download your vanilla PHUDGE model having a Phi3ForCausalLM architecture?

I have seen your notebook on Kaggle. I was able to set up a CausalLM, but there is no PEFT model for CausalLM. For LORA_PATH = "./input/lora_128_REG" it says:

The model 'PeftModelForSequenceClassification' is not supported for text-generation.

Can you please provide a LoRA model for CausalLM (PeftModelForCausalLM) here, like lora_128_CAUSAL?

Jun 12 '24 14:06 d-kleine

Yes @d-kleine the PHUDGE model, which gives the best results uses a Classification Layer. If you read the paper you'll know that the Causal head makes it not only slower but has around ~4% worse results. I trained it on Causal also but need to check for weights.

In the meantime, you can use the training script to train your Causal model end to end by creating data from this notebook and then training using this script

Jun 13 '24 15:06 deshwalmahesh

@deshwalmahesh Any update on this?

Aug 07 '24 09:08 d-kleine