[llm_retriever] Questions about the dataset

Open OStars opened this issue 11 months ago • 0 comments

Hi, thanks for your great work. I ran the download_data.sh script and obtained the dataset successfully, but I have some questions about what exactly each file contains:

  1. What is the difference between passages.jsonl.gz and train.jsonl.gz?
  2. Which BM25 implementation was used to produce bm25_train.jsonl? Could you share the code, or a link to it?

OStars, Mar 15 '24 03:03