LMOps
[llm_retriever] Questions about the dataset
Hi, thanks for your great work. I ran the `download_data.sh` script and obtained the dataset successfully, but I have some questions about what exactly each file means:
- What is the difference between `passages.jsonl.gz` and `train.jsonl.gz`?
- Which BM25 algorithm was used to obtain `bm25_train.jsonl`? Can you provide the code, or a link to the specific implementation?