UniIR
About data upsampling
First, many thanks for your contribution to the community; the datasets are very helpful.
Can you explain the meaning of "upsampled" in https://huggingface.co/datasets/TIGER-Lab/M-BEIR? How did you upsample the smaller datasets?
mbeir_union_up_train.jsonl: This file is the default training data for in-batch contrastive training specifically designed for UniIR models. It aggregates all the data from the train directory and datasets with relatively smaller sizes have been upsampled to balance the training process.
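To make the question concrete, here is a minimal sketch of what I would guess "upsampled" means: repeating the examples of each smaller dataset until it reaches some minimum size, then merging everything. This is only my assumption, not the actual UniIR procedure. The helper names (`upsample`, `build_union`), the `min_size` parameter, and the toy records are all hypothetical.

```python
import random
from collections import Counter


def upsample(examples, target_size, seed=0):
    """Hypothetical helper: repeat a dataset until it has target_size examples.

    Datasets that are empty or already at least target_size are returned unchanged.
    """
    if not examples or len(examples) >= target_size:
        return list(examples)
    rng = random.Random(seed)
    full_copies, remainder = divmod(target_size, len(examples))
    # Whole copies first, then a random subset (without replacement) to fill the gap.
    return list(examples) * full_copies + rng.sample(examples, remainder)


def build_union(datasets, min_size, seed=0):
    """Hypothetical helper: upsample every dataset to min_size, merge, and shuffle."""
    union = []
    for name in sorted(datasets):
        union.extend(upsample(datasets[name], min_size, seed))
    random.Random(seed).shuffle(union)
    return union


# Toy data standing in for the per-dataset training files (made up for illustration).
toy = {
    "large": [{"qid": f"large:{i}"} for i in range(10)],
    "small": [{"qid": f"small:{i}"} for i in range(3)],
}
union = build_union(toy, min_size=8)
print(Counter(row["qid"].split(":")[0] for row in union))
```

If the real procedure differs (for example, sampling with per-dataset weights instead of duplicating rows, or a different target size per dataset), it would be great to know the details.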