Tevatron: Where can I obtain the text version of Wiki-SS?

Where can I obtain the text version of Wiki-SS?

Dec 21 '24 03:12 zxy1728

And which encoder is used for the text version?

Dec 21 '24 03:12 zxy1728

Hi @zxy1728,

Thank you for your interest in our work. Both the text version and the image version of the corpus can be found here: https://huggingface.co/datasets/Tevatron/wiki-ss-corpus

The training data can be found at https://huggingface.co/datasets/Tevatron/wiki-ss-nq
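A minimal sketch of pulling both datasets with the Hugging Face `datasets` library, assuming network access. The helper names `text_only` and `load_wiki_ss` are hypothetical, and the `image` column name is a guess; check `column_names` on the loaded splits before relying on it.

```python
# Minimal sketch: fetch the Wiki-SS corpus and the wiki-ss-nq training data
# from the Hugging Face Hub, and drop the screenshot field to keep text only.
# ASSUMPTION: the column name "image" is hypothetical; inspect the real schema
# with `dataset.column_names` first.


def text_only(record, image_key="image"):
    """Return a copy of one corpus record without its (hypothetical) image field."""
    return {key: value for key, value in record.items() if key != image_key}


def load_wiki_ss():
    """Download both datasets; needs `pip install datasets` and network access."""
    # Imported lazily so text_only() works even without the datasets package.
    from datasets import load_dataset

    corpus = load_dataset("Tevatron/wiki-ss-corpus")  # DatasetDict keyed by split
    train = load_dataset("Tevatron/wiki-ss-nq")
    return corpus, train
```

For example, `corpus, train = load_wiki_ss()` followed by `print(corpus)` lists the available splits and columns. Applying `text_only` while iterating over a split yields plain dicts; for a whole split, `Dataset.remove_columns` removes a column in one call.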

For the text encoder, you can use BERT, E5, or Phi-3-vision (skipping its image encoder) to train on the text version of the data.
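A rough sketch of what training on the text version could look like at the scoring level, assuming a dual encoder trained with a generic contrastive (InfoNCE) loss over one positive and several negative passages per query. This is not Tevatron's actual code: the function names are hypothetical, the `query: ` / `passage: ` prefixes follow the E5 model cards, and in practice the embeddings would come from BERT, E5, or the language model of Phi-3-vision.

```python
import math

# Minimal sketch of contrastive (InfoNCE) scoring for a text-only dual encoder.
# ASSUMPTIONS: "query: " / "passage: " prefixes follow the E5 convention
# (plain BERT needs none); embeddings are plain lists of floats;
# temperature=1.0 is illustrative, not a tuned value.


def format_inputs(query, passages, use_e5_prefix=True):
    """Prepend E5-style prefixes to a query and its candidate passages."""
    if use_e5_prefix:
        return "query: " + query, ["passage: " + p for p in passages]
    return query, list(passages)


def dot(u, v):
    """Dot-product similarity between two embedding vectors."""
    return sum(a * b for a, b in zip(u, v))


def contrastive_loss(query_emb, passage_embs, positive_index=0, temperature=1.0):
    """Cross-entropy of a softmax over query-passage scores (one positive, rest negatives)."""
    scores = [dot(query_emb, p) / temperature for p in passage_embs]
    m = max(scores)  # subtract the max for a numerically stable log-sum-exp
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum_exp - scores[positive_index]
```

For instance, `contrastive_loss([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])` equals log(1 + e^-1), about 0.313, and the loss grows if the orthogonal passage is marked as the positive instead.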

Dec 22 '24 15:12 MXueguang

Hello, how many positive and negative samples does each query in the Tevatron/wiki-ss-nq dataset have, and how many positives and negatives are used per query during training? Thanks!

May 12 '25 06:05 zxy1728

Hello @MXueguang, may I ask whether the queries in the wiki-ss-nq dataset are human-annotated or synthetically generated? Thanks.

Jul 01 '25 17:07 rnyak