Shitao Xiao


Hi, thanks for your interest in our work! We use the entire corpus for both pre-training and fine-tuning.

> Based on the number of samples I think CMedQAv2 is the same data as CmedqaRetrieval. Maybe @staoxiao can confirm

Yes, CmedqaRetrieval is v2.

Thanks for your interest in our work! The code for RetroMAE-2 has been released; you can find it at https://github.com/staoxiao/RetroMAE/tree/master/examples/pretrain#pre-train (`dupmae`).

Hi, thanks for your interest in our work! We use the `_whole_word_mask` function to mask tokens, which will not mask the [CLS] token. You can refer to https://github.com/huggingface/transformers/blob/v4.34.1/src/transformers/data/data_collator.py#L845.
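
A quick way to see this behaviour, as a minimal sketch (assuming the Hugging Face `DataCollatorForWholeWordMask` and a BERT tokenizer; the sentence and masking probability here are arbitrary): `_whole_word_mask` skips special tokens when collecting mask candidates, so position 0 ([CLS]) is never selected.

```python
from transformers import BertTokenizerFast, DataCollatorForWholeWordMask

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
collator = DataCollatorForWholeWordMask(tokenizer=tokenizer, mlm_probability=0.3)

# Token strings including the special [CLS]/[SEP] markers.
tokens = tokenizer.convert_ids_to_tokens(
    tokenizer("retromae masks whole words during pre-training")["input_ids"]
)

# _whole_word_mask returns a 0/1 list marking which positions get masked;
# special tokens such as [CLS] and [SEP] are never chosen as candidates.
mask_labels = collator._whole_word_mask(tokens)
assert mask_labels[0] == 0  # [CLS] is never masked
print(list(zip(tokens, mask_labels)))
```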

> > Hi, thanks for your interest in our work! We use the _whole_word_mask function to mask tokens, which will not mask the CLS token. You can refer to https://github.com/huggingface/transformers/blob/v4.34.1/src/transformers/data/data_collator.py#L845....

> Actually, while you're here, a question about DupMAE. It looks like the modeling_duplex file you included doesn't have the actual pooling operation you describe in the paper - how...

> Thank you - the rest I can get from the paper. Is there a reason you didn’t train BGE with DupMAE instead of RetroMAE? > > > > Actually,...

Hi, thanks for your interest in our work! Actually, we didn't test the MLM accuracy of RetroMAE on any data. We view the retrieval performance after fine-tuning as the quality...

The CUDA version does not affect training; the problem is more likely caused by the transformers version. You can try downgrading to 4.18, or use...
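
For anyone who wants to check their environment first, a tiny sketch; the 4.18 pin just mirrors the suggestion above, and the exact working range may differ.

```python
# Sanity check: warn if the installed transformers release is newer than the
# version suggested above. `packaging` ships as a transformers dependency.
from importlib.metadata import version
from packaging.version import Version

installed = Version(version("transformers"))
if installed > Version("4.18.0"):
    print(f"transformers {installed} detected; if training fails, "
          "try `pip install transformers==4.18.0`.")
```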

Thanks for your interest in our work! We use 8 GPUs to fine-tune the model, which increases the number of in-batch negatives because we share the negatives across GPUs. Using...
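
To illustrate the idea, here is a minimal PyTorch sketch of sharing in-batch negatives across GPUs with `all_gather`. It is not the exact FlagEmbedding implementation; the helper names (`gather_across_gpus`, `contrastive_loss`) and the temperature value are made up for the example, and it assumes `torch.distributed` has already been initialized by the launcher.

```python
import torch
import torch.distributed as dist
import torch.nn.functional as F


def gather_across_gpus(t: torch.Tensor) -> torch.Tensor:
    """Gather a tensor from every rank, keeping gradients for the local shard."""
    gathered = [torch.empty_like(t) for _ in range(dist.get_world_size())]
    dist.all_gather(gathered, t.contiguous())
    # all_gather returns tensors without gradient history, so put the local
    # tensor back in place to keep its autograd path intact.
    gathered[dist.get_rank()] = t
    return torch.cat(gathered, dim=0)


def contrastive_loss(q: torch.Tensor, p: torch.Tensor, temperature: float = 0.05):
    # q: (local_batch, dim) query embeddings, p: (local_batch, dim) positives.
    q_all = gather_across_gpus(q)            # (global_batch, dim)
    p_all = gather_across_gpus(p)            # (global_batch, dim)
    scores = q_all @ p_all.T / temperature   # every other passage is a negative
    labels = torch.arange(scores.size(0), device=scores.device)
    return F.cross_entropy(scores, labels)
```

With 8 GPUs and a per-device batch of B, each query is contrasted against 8*B passages instead of B, which is the effect described above.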