Sequence Classification
Hey, out of my own need I added a feature to support `LlamaForSequenceClassification`. I wonder whether it would be a good fit for this project.
- I added initialization of a new sequence classification model from a language model, such as Llama 3 8B.
- I modified the Llama code to support sequence classification. However, I use `torch.nn.CrossEntropyLoss` rather than `fast_cross_entropy_loss`; should I switch to `fast_cross_entropy_loss`?
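For concreteness, the loss computation the second bullet describes (pool the last token's hidden state, apply a linear classification head, score with plain cross-entropy) can be sketched as below. This is a toy numpy sketch with made-up names and shapes, not the actual fork's code:

```python
import numpy as np

# Hypothetical sketch of sequence classification on a decoder-only LM.
# hidden: last-layer hidden states, shape (seq_len, hidden_size).
# W, b: a freshly initialized linear head with num_labels outputs.
def classification_loss(hidden, W, b, label):
    # Pool by taking the last token's hidden state (common for causal LMs,
    # since only the last position attends to the full sequence).
    pooled = hidden[-1]                       # (hidden_size,)
    logits = pooled @ W + b                   # (num_labels,)
    # Standard cross-entropy, as torch.nn.CrossEntropyLoss would compute it.
    logits = logits - logits.max()            # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())
    return -log_probs[label]

rng = np.random.default_rng(0)
hidden = rng.normal(size=(16, 8))             # toy hidden states, seq_len=16
W = rng.normal(size=(8, 3))                   # 3 labels
b = np.zeros(3)
loss = classification_loss(hidden, W, b, label=2)
```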
I'm open to feedback and suggestions on this proposal. Please let me know whether this feature aligns with the project's goals and if there are any improvements or changes I should consider. If there are any specific contribution guidelines, I would very much appreciate a pointer.
Thank you
@user074 Oh interesting! We welcome new contributions but for now Unsloth supports general LLM heads - more custom heads will require manual coding (eg as in ur case)
@user074 Hey can you make a branch with your classification-enabled code? I'm doing lots of classification but HF trainer eats up so much VRAM :( I'd love to try it out!
@sigjhl their fork is public here: https://github.com/user074/unsloth
Yeah, basically you can pass the arguments `sequence_classification = True, num_labels = YOUR_NUMBER_OF_LABELS` when you initialize unsloth. An example:

```python
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = YOUR_MODEL,
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    sequence_classification = True,
    num_labels = 3,
)
```
It runs, but the performance is not what I expected. There might be bugs, so I need to figure them out. I found another way to use language modeling to resolve my task, but I will try to update the code and fix the classification issue.
> I find another way to use language modeling to resolve my task
Yeah w/ LLMs it might be better to do SFT w/ a prompt like:
```python
prompt = f"""
Instructions: ... Categorize the text as one of these classes:
class1
class2
...
classk
Text: {text}
Class: """
```
That way, we can stay in unsloth land.
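To make the template concrete, here's a tiny hypothetical helper that assembles such a prompt from a label list (function and parameter names are made up for illustration):

```python
def build_prompt(text, classes,
                 instructions="Categorize the text as one of these classes:"):
    # Mirrors the prompt template above: instructions, one class per line,
    # then the text, ending with "Class: " so the model completes the label.
    class_list = "\n".join(classes)
    return f"Instructions: {instructions}\n{class_list}\nText: {text}\nClass: "

p = build_prompt("great movie!", ["positive", "negative", "neutral"])
```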
And then for inference, constrain token generation to be one of the classes using something like outlines. Or, if you need probabilities, use my package CAPPr :-)
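The probability idea can be sketched in plain Python: restrict the model's next-token logits to the candidate class tokens and renormalize with a softmax. This is a simplified illustration of the general approach, not how CAPPr or outlines is actually implemented:

```python
import math

def class_probs(logits, class_tokens):
    # logits: mapping from vocabulary token -> next-token logit after "Class: ".
    # Keep only the allowed class tokens and renormalize (softmax).
    selected = [logits[t] for t in class_tokens]
    m = max(selected)                          # subtract max for stability
    exps = [math.exp(x - m) for x in selected]
    total = sum(exps)
    return {t: e / total for t, e in zip(class_tokens, exps)}

# Toy logits over a tiny "vocabulary"; a real LM would produce these.
logits = {"positive": 2.0, "negative": 0.5, "neutral": -1.0, "the": 5.0}
probs = class_probs(logits, ["positive", "negative", "neutral"])
# Mass is redistributed over the allowed classes only; "the" is excluded.
```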
The experiment here demonstrates that Llama 2 7B and Mistral 7B are both clearly beaten by RoBERTa (0.355B parameters) when using sequence classification, i.e., adding a linear layer. It's just one dataset, but that's surprising.
Oh fantastic work @user074 - I shall check this out!
@kddubey @user074 Thank you for the tips! On my dataset, QLoRA on e5-mistral (with a classification head) did better than a fully finetuned BERT variant (ALBERT-xxl) and, surprisingly, better than QLoRA on llama-3-70b (trained with unsloth, with the prompt structured as a classification problem like your example, but with a chain of thought before the answer), so I'm exploring LLM-based embedding models for classification atm.
@kddubey I took a look at CAPPr and like it very much! Getting the probabilities is a real plus, and the focus on text classification makes it much more approachable than guidance, LMQL, etc. I look forward to trying it out. The "A note on workflow" section of the documentation is really great too; it's a perfect how-to for LLM research for people from other fields like myself.
@danielhanchen Thank you for unsloth! I'd never have dreamed of being able to finetune on my own machine... very grateful! How can I cite this repo if I reference it in a future publication?
Any update on this? I'm curious.