tevatron

Where can I obtain the text version of Wiki-SS?

Open zxy1728 opened this issue 1 year ago • 4 comments

Where can I obtain the text version of Wiki-SS?

zxy1728 avatar Dec 21 '24 03:12 zxy1728

And what is the encoder for the text version?

zxy1728 avatar Dec 21 '24 03:12 zxy1728

Hi @zxy1728,

Thank you for your interest in our work. Both the text version and the image version of the corpus can be found here: https://huggingface.co/datasets/Tevatron/wiki-ss-corpus

The training data can be found at https://huggingface.co/datasets/Tevatron/wiki-ss-nq

For the text encoder, you can use BERT, E5, or Phi-3-vision (skipping the image encoder) to train on the text version of the data.
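
A minimal sketch of pulling both datasets, assuming the Hugging Face `datasets` library is installed. The `train` split name and the helper names `load_wiki_ss` and `peek_fields` are assumptions for illustration, not something confirmed in this thread; check the dataset cards for the actual splits and fields.

```python
from itertools import islice
from typing import Any, Dict, Iterable, List

# Repository IDs taken from the two links above.
CORPUS_REPO = "Tevatron/wiki-ss-corpus"
TRAIN_REPO = "Tevatron/wiki-ss-nq"


def load_wiki_ss(repo_id: str, split: str = "train", streaming: bool = True) -> Any:
    """Load one of the Wiki-SS datasets from the Hugging Face Hub.

    The default split name "train" is an assumption. Streaming avoids
    downloading the whole dataset before looking at the first records.
    """
    # Imported lazily so this module can be imported without `datasets`.
    from datasets import load_dataset

    return load_dataset(repo_id, split=split, streaming=streaming)


def peek_fields(rows: Iterable[Dict[str, Any]], n: int = 1) -> List[str]:
    """Return the sorted union of field names found in the first n records."""
    fields = set()
    for row in islice(rows, n):
        fields.update(row.keys())
    return sorted(fields)


# Example usage (needs network access, so it is left commented out):
# corpus = load_wiki_ss(CORPUS_REPO)
# print(peek_fields(corpus))
# train = load_wiki_ss(TRAIN_REPO)
# print(peek_fields(train))
```

Printing the field names first shows which column holds the text and which holds the image, so a text-only encoder can be trained on the text column alone.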

MXueguang avatar Dec 22 '24 15:12 MXueguang

Hello, I would like to ask how many positive and negative samples there are for each query in the Tevatron/wiki-ss-nq dataset, and how many positive and negative samples are used during training. Thanks!

zxy1728 avatar May 12 '25 06:05 zxy1728

Hello @MXueguang. May I ask whether the queries in the wiki-ss-nq dataset are human-annotated or synthetically generated? Thanks.

rnyak avatar Jul 01 '25 17:07 rnyak