
Can I use LoRA to fine-tune text embeddings?

zhang373 opened this issue 3 months ago · 3 comments

Thank you for your kind contribution! Can I use LoRA, as provided by Hugging Face, to fine-tune text embeddings for a downstream task? Can you give me some guidance?

zhang373 · May 02 '24

Yes, definitely. Our scripts provide an example of how to use LoRA fine-tuning for masked next token prediction (MNTP) and supervised contrastive learning. You can similarly use LoRA fine-tuning for any other downstream task. Let me know if you have any more questions/specifics about the training.
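To make the mechanism concrete, here is a minimal NumPy sketch of what a LoRA adapter does to a single frozen linear layer. This is an illustration of the low-rank update, not the repository's actual training code; the dimensions, rank, and scaling are arbitrary choices for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen pretrained weight of one linear layer (d_out x d_in).
d_in, d_out, r, alpha = 8, 6, 2, 16
W = rng.normal(size=(d_out, d_in))

# LoRA adds a trainable low-rank update B @ A, scaled by alpha / r.
# During fine-tuning only A and B receive gradients; W stays frozen.
A = rng.normal(scale=0.01, size=(r, d_in))  # small random init
B = np.zeros((d_out, r))                    # zero init => no change at start

def lora_forward(x):
    """Frozen path plus the scaled low-rank adapter path."""
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

x = rng.normal(size=(4, d_in))
# Because B starts at zero, the adapted layer initially matches the
# frozen layer exactly -- training then moves only A and B.
assert np.allclose(lora_forward(x), x @ W.T)
```

In practice you would not write this by hand: Hugging Face PEFT wraps the model's existing linear layers with adapters of this form, which is what the repository's training scripts rely on.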

vaibhavad · May 03 '24

Thanks a lot for your kind contribution and answer! I have another question: can I add a prompt to make the model focus on a specific area? For example, if I want the model to focus more on the finance domain, can I add a prompt describing the system role, like "you are an expert in finance"?

zhang373 · May 05 '24

It is generally recommended to keep the instructions in a style similar to the ones used in training. You can check Table 10 in our paper to see the instructions we used for different datasets on MTEB. One of the datasets is FiQA2018, for which our instruction is "Given a financial question, retrieve user replies that best answer the question".

vaibhavad · May 07 '24

Feel free to re-open if you have any more questions regarding this issue.

vaibhavad · May 09 '24