Results 58 comments of Liang Wang

For MSMARCO, we use the same pre-processed data as SimLM (https://github.com/microsoft/unilm/tree/master/simlm#download-our-pre-processed-data). For NQ, we did not release the data, but you can use public versions with mined hard negatives.

We use the questions and positive passages from DPR.

@theejung Thanks for the questions. 1. Yes, we use hard negatives to train the cross-encoder, but the hard negatives are contradictory sentences rather than BM25-mined ones. This strategy is borrowed...

Thanks for your interest in our work! Currently we have no plan to release the CCPairs data, but I'll let you know if things change. The paper includes most dataset...

Sorry, but I was unable to reproduce your issue. Following the instructions at https://github.com/microsoft/unilm/tree/master/simlm , I executed the following commands without changing any code:

```
bash scripts/download_msmarco_data.sh
export DATA_DIR=./data/msmarco_bm25_official/
export OUTPUT_DIR=./checkpoint/biencoder/...
```

@abhishekverma1997 The `positives` are human-annotated relevant passages, and the `negatives` are the BM25 retrieved passages that are not annotated as relevant. Only the scores at `simlm/data/msmarco_distillation/kd_train.jsonl` are used for knowledge...
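To make the `positives` / `negatives` distinction concrete, here is a minimal sketch of how such a record could be assembled. It is not the repository's preprocessing code: the function name, the `query_id` field, and the passage ids are hypothetical, and the released files may carry extra fields (such as scores) that are not shown here.

```python
import json


def build_training_example(query_id, annotated_positive_ids, bm25_ranked_ids):
    """Assemble one record (hypothetical layout, for illustration only).

    positives: passages that human annotators marked as relevant.
    negatives: BM25-retrieved passages that are NOT annotated as relevant.
    """
    positive_set = set(annotated_positive_ids)
    # Keep the BM25 ranking order, dropping anything annotated as relevant.
    negatives = [pid for pid in bm25_ranked_ids if pid not in positive_set]
    return {
        "query_id": query_id,
        "positives": list(annotated_positive_ids),
        "negatives": negatives,
    }


# Made-up ids: BM25 retrieved p3, p7, p9, and annotators marked p7 as relevant.
example = build_training_example("q1", ["p7"], ["p3", "p7", "p9"])
print(json.dumps(example))
```

Note that a negative built this way only means "not annotated as relevant"; some of these passages may still be relevant but unlabeled.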

For your questions: * Do both models have the same two-stage training? Yes, the techniques are the same, but the data is different. The first stage is contrastive pre-training, and...

We'll release the multilingual-e5-large checkpoint, but it will take some time, perhaps weeks.

Sure, the common techniques are:

1. Use gradient checkpointing; it saves at least half of the GPU memory while being ~30% slower.
2. Use the [DeepSpeed](https://github.com/microsoft/DeepSpeed) launcher if possible; its ZeRO stage...

If you are using the Trainer from the Hugging Face transformers library, [gradient checkpointing](https://huggingface.co/docs/transformers/v4.30.0/en/perf_train_gpu_one#gradient-checkpointing) is enabled by simply passing the argument `--gradient_checkpointing True`. Shorter inputs mean fewer activation values to store in...
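As a rough illustration of why both gradient checkpointing and shorter inputs reduce memory, the sketch below counts the activation values kept for the backward pass under a deliberately simplified model. The function and the `tensors_per_layer` constant are assumptions made for this example, not measurements of any real model. The model also ignores attention score matrices, which grow quadratically with sequence length.

```python
def activation_values(batch_size, seq_len, hidden_size, num_layers,
                      tensors_per_layer=8, checkpointing=False):
    """Toy count of activation values stored for backpropagation.

    tensors_per_layer is an assumed number of (batch, seq, hidden)-sized
    intermediate tensors saved per layer; real models differ.
    """
    per_tensor = batch_size * seq_len * hidden_size
    if checkpointing:
        # Store only each layer's input, plus the intermediates of the one
        # layer currently being recomputed during the backward pass.
        return num_layers * per_tensor + tensors_per_layer * per_tensor
    # Without checkpointing, every layer keeps all of its intermediates.
    return num_layers * tensors_per_layer * per_tensor


full = activation_values(32, 512, 768, 12)
ckpt = activation_values(32, 512, 768, 12, checkpointing=True)
short = activation_values(32, 128, 768, 12)

print(f"checkpointing keeps {ckpt / full:.0%} of the activations")
print(f"4x shorter inputs keep {short / full:.0%} of the activations")
```

In this toy model the stored activations scale linearly with sequence length, so truncating inputs helps independently of checkpointing. Actual savings depend on the architecture and framework.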