Liang Wang comments

Results: 58 comments of Liang Wang


E5 model finetuning code

For MSMARCO, we use the same data from https://github.com/microsoft/unilm/tree/master/simlm#download-our-pre-processed-data. For NQ, we did not release the data, but you can use public versions with mined hard negatives.

E5 model finetuning code

We use the questions and positive passages from DPR.

E5 model finetuning code

@theejung Thanks for the questions. 1. Yes, we use hard negatives to train the cross-encoder, but the hard negatives are contradictory sentences rather than BM25-mined ones. This strategy is borrowed...

E5 CCPairs data

Thanks for your interest in our work! Currently we have no plan to release the CCPairs data, but I'll let you know if things change. The paper includes most dataset...

Error while fine-tuning the E5 model?

Sorry, but I was unable to reproduce your issue. Following the instructions at https://github.com/microsoft/unilm/tree/master/simlm, I executed the following commands without changing any code: ``` bash scripts/download_msmarco_data.sh export DATA_DIR=./data/msmarco_bm25_official/ export OUTPUT_DIR=./checkpoint/biencoder/...

Error while fine-tuning the E5 model?

@abhishekverma1997 The `positives` are human-annotated relevant passages, and the `negatives` are BM25-retrieved passages that are not annotated as relevant. Only the scores at `simlm/data/msmarco_distillation/kd_train.jsonl` are used for knowledge...

The difference between multilingual-e5-base and e5-base

For your questions:

* Do both models have the same two-stage training? Yes, the techniques are the same, but the data is different. The first stage is contrastive pre-training, and...

The difference between multilingual-e5-base and e5-base

We'll release the multilingual-e5-large checkpoint, but it will take some time, perhaps weeks.

Large batch size when pretraining E5 models

Sure, the common techniques are: 1. Use gradient checkpointing; it saves at least half of the GPU memory while being ~30% slower. 2. Use the [DeepSpeed](https://github.com/microsoft/DeepSpeed) launcher if possible, its ZeRO stage...
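
To illustrate the idea behind gradient checkpointing, here is a minimal pure-Python sketch. It is a toy model only, not how PyTorch or DeepSpeed implement it: the "network" is assumed to be a chain of scalar `tanh` layers, and all function names and the `segment` parameter are hypothetical. The forward pass keeps the input of every `segment`-th layer, and the backward pass recomputes the missing inputs one segment at a time, trading extra compute for fewer stored values.

```python
import math


def layer_forward(w, x):
    """One toy layer: y = tanh(w * x)."""
    return math.tanh(w * x)


def layer_backward(w, x, grad_out):
    """Return (grad w.r.t. x, grad w.r.t. w) for y = tanh(w * x)."""
    y = math.tanh(w * x)
    grad_pre = grad_out * (1.0 - y * y)
    return grad_pre * w, grad_pre * x


def forward_full(weights, x):
    """Standard forward pass: store the input of every layer."""
    inputs = []
    for w in weights:
        inputs.append(x)
        x = layer_forward(w, x)
    return x, inputs


def backward_full(weights, inputs, grad_out=1.0):
    """Standard backward pass using all stored layer inputs."""
    grads = [0.0] * len(weights)
    for i in reversed(range(len(weights))):
        grad_out, grads[i] = layer_backward(weights[i], inputs[i], grad_out)
    return grads


def forward_checkpointed(weights, x, segment):
    """Forward pass that stores only the input of every `segment`-th layer."""
    checkpoints = {}
    for i, w in enumerate(weights):
        if i % segment == 0:
            checkpoints[i] = x
        x = layer_forward(w, x)
    return x, checkpoints


def backward_checkpointed(weights, checkpoints, segment, grad_out=1.0):
    """Backward pass that recomputes each segment's inputs from its checkpoint."""
    grads = [0.0] * len(weights)
    for start in sorted(checkpoints, reverse=True):
        end = min(start + segment, len(weights))
        # Recompute the layer inputs of this segment (the extra forward work).
        x = checkpoints[start]
        inputs = []
        for i in range(start, end):
            inputs.append(x)
            x = layer_forward(weights[i], x)
        # Backpropagate through the segment with the recomputed inputs.
        for i in reversed(range(start, end)):
            grad_out, grads[i] = layer_backward(weights[i], inputs[i - start], grad_out)
    return grads


weights = [0.5, -1.2, 0.8, 1.5, -0.3, 0.9, 1.1, -0.7]
out_full, inputs = forward_full(weights, 0.4)
out_ckpt, checkpoints = forward_checkpointed(weights, 0.4, segment=4)
grads_full = backward_full(weights, inputs)
grads_ckpt = backward_checkpointed(weights, checkpoints, segment=4)
```

Both passes produce the same gradients, but the checkpointed version holds 2 saved inputs after the forward pass instead of 8, plus at most one segment of recomputed inputs during the backward pass. Actual savings and slowdown on a real model depend on the architecture and the framework.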

Large batch size when pretraining E5 models

If you are using the Trainer from the Hugging Face transformers library, [gradient checkpointing](https://huggingface.co/docs/transformers/v4.30.0/en/perf_train_gpu_one#gradient-checkpointing) is enabled by simply passing the argument `--gradient_checkpointing True`. Shorter inputs mean fewer activation values to store in...
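
As a rough illustration of why shorter inputs help, the sketch below counts two of the larger activation tensors in a single Transformer layer. It is a back-of-the-envelope estimate under stated assumptions, not a measurement: it assumes a standard attention implementation that materializes the full score matrix, it ignores every other intermediate tensor, and the function name and the BERT-base-like sizes are hypothetical choices made for this example.

```python
def activation_values_per_layer(batch_size, seq_len, hidden_size, num_heads):
    """Very rough count of stored values for one Transformer layer.

    Counts only one hidden-state tensor and one set of attention score
    matrices; real layers keep more intermediates than this.
    """
    hidden_states = batch_size * seq_len * hidden_size
    attention_scores = batch_size * num_heads * seq_len * seq_len
    return hidden_states + attention_scores


# Hypothetical sizes, chosen only for illustration.
long_inputs = activation_values_per_layer(
    batch_size=32, seq_len=512, hidden_size=768, num_heads=12
)
short_inputs = activation_values_per_layer(
    batch_size=32, seq_len=128, hidden_size=768, num_heads=12
)
ratio = long_inputs / short_inputs
print(long_inputs, short_inputs, ratio)
```

Under these assumptions, cutting the sequence length from 512 to 128 shrinks the count by a factor of 12, more than the 4x reduction in length, because the attention score term grows quadratically with sequence length.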