
about supervised training dataset

Open Hanser14Forever opened this issue 1 year ago • 2 comments

Hi, I downloaded the data for supervised training from here. But when I run your training code, allnli_split1.jsonl, allnli_split2.jsonl, quora_duplicates_split1.jsonl, and quora_duplicates_split2.jsonl are missing. Where can I find them?

Hanser14Forever avatar May 08 '24 03:05 Hanser14Forever

My apologies, I forgot to include a post-processing step. I'll change the code and fix it right away.

vaibhavad avatar May 08 '24 15:05 vaibhavad

The issue is fixed now. You should be able to run the code with the data that you downloaded. If you are building llm2vec from source, please pull the latest changes; otherwise, make sure the llm2vec version installed from pip is >=0.1.6.

Apologies again for the error, and let me know if you face any more issues.
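
For anyone unsure which version they have, the pip version requirement mentioned above can be checked with a short script. This is only a sketch and not part of llm2vec: the helper names `parse_release`, `is_at_least`, and `check_llm2vec` are made up for illustration, and the comparison only looks at the leading numeric parts of the version string, so a pre-release such as 0.1.6.dev0 would be treated as 0.1.6.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_release(v: str) -> tuple:
    """Turn '0.1.6' into (0, 1, 6); stops at the first non-numeric part."""
    parts = []
    for piece in v.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def is_at_least(installed: str, minimum: str = "0.1.6") -> bool:
    """Compare two dotted version strings by their numeric parts."""
    return parse_release(installed) >= parse_release(minimum)


def check_llm2vec(minimum: str = "0.1.6") -> str:
    """Report whether the pip-installed llm2vec meets the minimum version."""
    try:
        installed = version("llm2vec")
    except PackageNotFoundError:
        # Covers the case where the package metadata is not found at all.
        return "llm2vec is not installed in this environment"
    if is_at_least(installed, minimum):
        return f"llm2vec {installed} is new enough (>= {minimum})"
    # 'pip install -U llm2vec' upgrades to the latest release on PyPI.
    return f"llm2vec {installed} is older than {minimum}; try: pip install -U llm2vec"


if __name__ == "__main__":
    print(check_llm2vec())
```

If you installed from source instead, this check may not reflect your checkout, so pulling the latest changes is the safer route.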

vaibhavad avatar May 08 '24 16:05 vaibhavad

Feel free to re-open if you have any more questions about this issue.

vaibhavad avatar May 09 '24 16:05 vaibhavad