
About data upsampling

Open Raion-Shin opened this issue 1 year ago • 0 comments

First, many thanks for your contribution to the community; the datasets are very helpful.

Can you explain the meaning of "upsampled" in https://huggingface.co/datasets/TIGER-Lab/M-BEIR? How did you upsample the smaller datasets?

mbeir_union_up_train.jsonl: This file is the default training data for in-batch contrastive training specifically designed for UniIR models. It aggregates all the data from the train directory and datasets with relatively smaller sizes have been upsampled to balance the training process.
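To make the question concrete: one common reading of "upsampled" is that examples from the smaller datasets are repeated (sampled with replacement) until each dataset reaches some minimum size. The sketch below illustrates only that reading. It is an assumption, not the procedure used to build `mbeir_union_up_train.jsonl`; the `dataset` field name, the `min_size` threshold, and the function `upsample_to_min_size` are all hypothetical.

```python
import random
from collections import defaultdict


def upsample_to_min_size(records, key, min_size, seed=0):
    """Repeat examples from small groups until each group has at least min_size.

    Hypothetical helper: `key` names the field that identifies the source
    dataset of each record. Nothing here is taken from the M-BEIR code.
    """
    rng = random.Random(seed)

    # Group records by their source dataset.
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec)

    out = []
    for items in groups.values():
        # Keep every original example once.
        out.extend(items)
        # Fill any shortfall by sampling with replacement from the same group.
        shortfall = min_size - len(items)
        if shortfall > 0:
            out.extend(rng.choices(items, k=shortfall))

    # Shuffle so the duplicated examples are not clustered together.
    rng.shuffle(out)
    return out
```

For example, with a 10-example dataset and a 3-example dataset, calling `upsample_to_min_size(records, "dataset", 8)` leaves the larger one unchanged and grows the smaller one to 8 examples by duplication. Whether M-BEIR uses a fixed threshold, a sampling ratio, or something else is exactly what this issue is asking.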

Raion-Shin, Aug 27 '24 12:08