Sequence Classification
Hey, out of my own need I added a feature to support `LlamaForSequenceClassification`. I wonder whether it would be a good fit for this project.
- I added initialization of a new sequence classification model from a language model, such as Llama 3 8B.
- I modified the Llama code to support sequence classification. However, I use `torch.nn.CrossEntropyLoss` rather than `fast_cross_entropy_loss`; should I switch to `fast_cross_entropy_loss`?
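For concreteness, the loss computation the second bullet describes (pool the last token's hidden state, apply a linear classification head, score with plain cross-entropy) can be sketched as below. This is a toy numpy sketch with made-up names and shapes, not the actual fork's code:

```python
import numpy as np

# Hypothetical sketch of sequence classification on a decoder-only LM.
# hidden: last-layer hidden states, shape (seq_len, hidden_size).
# W, b: a freshly initialized linear head with num_labels outputs.
def classification_loss(hidden, W, b, label):
    # Pool by taking the last token's hidden state (common for causal LMs,
    # since only the last position attends to the full sequence).
    pooled = hidden[-1]                       # (hidden_size,)
    logits = pooled @ W + b                   # (num_labels,)
    # Standard cross-entropy, as torch.nn.CrossEntropyLoss would compute it.
    logits = logits - logits.max()            # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())
    return -log_probs[label]

rng = np.random.default_rng(0)
hidden = rng.normal(size=(16, 8))             # toy hidden states, seq_len=16
W = rng.normal(size=(8, 3))                   # 3 labels
b = np.zeros(3)
loss = classification_loss(hidden, W, b, label=2)
```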
I'm open to feedback and suggestions on this proposal. Please let me know whether this feature aligns with the project's goals and if there are any improvements or changes I should consider. If there are any specific contribution guidelines, I would very much appreciate a pointer.
Thank you
@user074 Oh interesting! We welcome new contributions but for now Unsloth supports general LLM heads - more custom heads will require manual coding (eg as in ur case)
@user074 Hey can you make a branch with your classification-enabled code? I'm doing lots of classification but HF trainer eats up so much VRAM :( I'd love to try it out!
@sigjhl their fork is public here: https://github.com/user074/unsloth
Yeah, basically you can pass the arguments `sequence_classification = True, num_labels = YOUR_NUMBER_OF_LABELS` when you initialize unsloth. An example:

```python
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = YOUR_MODEL,
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    sequence_classification = True,
    num_labels = 3,
)
```
It runs, but the performance is not what I expected. There might be bugs, so I need to figure them out. I found another way to use language modeling to resolve my task, but I will try to update the code and fix the classification issue.
> I find another way to use language modeling to resolve my task
Yeah w/ LLMs it might be better to do SFT w/ a prompt like:
```python
prompt = f"""
Instructions: ... Categorize the text as one of these classes:
class1
class2
...
classk
Text: {text}
Class: """
```
That way, we can stay in unsloth land.
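To make the template concrete, here's a tiny hypothetical helper that assembles such a prompt from a label list (function and parameter names are made up for illustration):

```python
def build_prompt(text, classes,
                 instructions="Categorize the text as one of these classes:"):
    # Mirrors the prompt template above: instructions, one class per line,
    # then the text, ending with "Class: " so the model completes the label.
    class_list = "\n".join(classes)
    return f"Instructions: {instructions}\n{class_list}\nText: {text}\nClass: "

p = build_prompt("great movie!", ["positive", "negative", "neutral"])
```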
And then for inference, constrain token generation to be one of the classes using something like outlines. Or, if you need probabilities, use my package CAPPr :-)
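The probability idea can be sketched in plain Python: restrict the model's next-token logits to the candidate class tokens and renormalize with a softmax. This is a simplified illustration of the general approach, not how CAPPr or outlines is actually implemented:

```python
import math

def class_probs(logits, class_tokens):
    # logits: mapping from vocabulary token -> next-token logit after "Class: ".
    # Keep only the allowed class tokens and renormalize (softmax).
    selected = [logits[t] for t in class_tokens]
    m = max(selected)                          # subtract max for stability
    exps = [math.exp(x - m) for x in selected]
    total = sum(exps)
    return {t: e / total for t, e in zip(class_tokens, exps)}

# Toy logits over a tiny "vocabulary"; a real LM would produce these.
logits = {"positive": 2.0, "negative": 0.5, "neutral": -1.0, "the": 5.0}
probs = class_probs(logits, ["positive", "negative", "neutral"])
# Mass is redistributed over the allowed classes only; "the" is excluded.
```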
The experiment here demonstrates that Llama 2 7B and Mistral 7B are both clearly beaten by RoBERTa (0.355B parameters) when using sequence classification, i.e., adding a linear layer. It's just one dataset, but that's surprising.
Oh fantastic work @user074 - I shall check this out!
@kddubey @user074 Thank you for the tips! On my dataset, QLoRA on e5-mistral (with a classification head) did better than a fully finetuned BERT variant (ALBERT-xxl) and, surprisingly, better than QLoRA on llama-3-70b (trained with unsloth, with the prompt structured as a classification problem like your example, but with a chain of thought before the answer), so I'm exploring LLM-based embedding models for classification atm.
@kddubey I took a look at CAPPr and like it very much! Getting the probabilities is a real plus, and the focus on text classification makes it much more approachable than guidance, LMQL, etc. I look forward to trying it out. The "A note on workflow" section of the documentation is really great too; it's a perfect how-to for LLM research for people from other fields like myself.
@danielhanchen Thank you for unsloth! I'd never have dreamed of being able to finetune on my own machine... very grateful! How can I cite this repo if I reference it in a future publication?
Any update on this? I'm curious.