About the supervised training dataset
Hi, I downloaded the data for supervised training from here. But when I run your training code, the files allnli_split1.jsonl, allnli_split2.jsonl, quora_duplicates_split1.jsonl, and quora_duplicates_split2.jsonl are missing. Where can I find them?
My apologies, I forgot to include a post-processing step. I'll update the code and fix it right away.
The issue is fixed now. You should be able to run the code with the data you downloaded. Please pull the latest changes if you are building llm2vec from source; otherwise, make sure the llm2vec version installed from pip is >=0.1.6.
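If you installed from pip and aren't sure which release you have, you can query the package metadata. Below is a minimal sketch using only the Python standard library (`importlib.metadata`). The helper names `release_tuple`, `is_at_least`, and `check_llm2vec` are hypothetical and not part of llm2vec. The comparison looks only at the numeric release segment, so suffixes such as `rc1` or `.post1` are ignored. If you need full PEP 440 ordering, use the `packaging` library's `Version` class instead.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum llm2vec release that includes the post-processing fix, per this thread.
MIN_VERSION = "0.1.6"


def release_tuple(v: str) -> tuple[int, ...]:
    """Parse the leading numeric release part, e.g. '0.1.6.post1' -> (0, 1, 6).

    Simplification (assumption): pre/post/dev suffixes are ignored, so this
    is not a full PEP 440 comparison.
    """
    m = re.match(r"\d+(?:\.\d+)*", v.strip())
    if m is None:
        raise ValueError(f"unrecognized version string: {v!r}")
    return tuple(int(x) for x in m.group(0).split("."))


def is_at_least(installed: str, required: str = MIN_VERSION) -> bool:
    """Return True if `installed` >= `required`, comparing release numbers only."""
    a, b = release_tuple(installed), release_tuple(required)
    # Pad with zeros so that e.g. '0.1' and '0.1.0' compare as equal.
    n = max(len(a), len(b))
    a += (0,) * (n - len(a))
    b += (0,) * (n - len(b))
    return a >= b


def check_llm2vec() -> bool:
    """Report whether the installed llm2vec meets MIN_VERSION."""
    try:
        installed = version("llm2vec")
    except PackageNotFoundError:
        print("llm2vec is not installed")
        return False
    ok = is_at_least(installed)
    status = "OK" if ok else f"please upgrade to >= {MIN_VERSION}"
    print(f"llm2vec {installed}: {status}")
    return ok


if __name__ == "__main__":
    check_llm2vec()
```

If the check reports an older release, `pip install --upgrade llm2vec` should install a newer one.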
Apologies again for the error, and let me know if you run into any more issues.
Feel free to re-open if you have any more questions about this issue.