Where can I obtain the text version of Wiki-SS?
And which encoder is used for the text version?
Hi @zxy1728,
Thank you for your interest in our work. Both the text and image versions of the corpus are available here: https://huggingface.co/datasets/Tevatron/wiki-ss-corpus
The training data is available at https://huggingface.co/datasets/Tevatron/wiki-ss-nq
For the text encoder, you can train BERT, E5, or Phi-3-vision (skipping its image encoder) on the text version of the data.
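As a rough illustration of what training a text encoder on query/passage data usually involves, here is a minimal pure-Python sketch of an InfoNCE-style contrastive loss: a query embedding is scored against one positive passage and several negative passages, and the loss is the softmax cross-entropy of the positive. This is a generic technique, not Tevatron's implementation. The function names (`dot`, `info_nce_loss`), the default temperature of 1.0, and the toy vectors are all hypothetical placeholders, and the number of negatives per query is left to the caller.

```python
import math
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product between two embedding vectors (hypothetical helper)."""
    return sum(x * y for x, y in zip(a, b))


def info_nce_loss(
    query: Sequence[float],
    positive: Sequence[float],
    negatives: List[Sequence[float]],
    temperature: float = 1.0,  # assumed placeholder, not a Tevatron default
) -> float:
    """Softmax cross-entropy of the positive passage against the negatives.

    Index 0 of the score list is the positive, so the loss is
    -log(exp(s_pos / t) / sum_j exp(s_j / t)).
    """
    scores = [dot(query, positive)] + [dot(query, n) for n in negatives]
    scaled = [s / temperature for s in scores]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(scaled)
    log_denominator = m + math.log(sum(math.exp(s - m) for s in scaled))
    return log_denominator - scaled[0]


if __name__ == "__main__":
    # Toy 2-d embeddings standing in for encoder outputs.
    q = [1.0, 0.0]
    pos = [0.9, 0.1]
    negs = [[0.0, 1.0], [-1.0, 0.0]]
    print(round(info_nce_loss(q, pos, negs), 4))
```

The loss shrinks as the positive passage scores higher than the negatives; in a real setup the vectors would come from the encoder (BERT, E5, etc.) and the loss would be computed over a batch with an autodiff framework.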
Hello, I would like to ask how many positive and negative samples there are for each query in the Tevatron/wiki-ss-nq dataset, and how many positives and negatives are used per query during training. Thanks!
Hello @MXueguang, may I ask whether the queries in the wiki-ss-nq dataset are human-annotated or synthetically generated? Thanks.