Liang Wang
The v2 models are pre-trained on larger text pair datasets; the network architecture and training recipes are the same.
Yes, currently it only works for English. We'll release multilingual versions of the text embeddings in the coming months (no guarantee about the timeline, though), so please stay tuned! Thanks, Liang
* For BM25, we sample 200 negatives from the top-1000 retrieved passages.
* For Retriever 2, we use the top-200 samples from retriever 1 as hard negatives.
* For the Reranker, again,...
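A rough sketch of the sampling scheme described above (my reconstruction for illustration, not the released code; whether the retriever-2 stage samples from or takes all of the top-200 is my assumption):

```python
import random

def sample_negatives(ranked_ids, positive_ids, depth, num_negatives):
    """Sample negatives from the top-`depth` retrieved ids, excluding known positives."""
    pool = [pid for pid in ranked_ids[:depth] if pid not in positive_ids]
    return random.sample(pool, min(num_negatives, len(pool)))

# Toy ranked lists standing in for real retrieval output.
bm25_ranked = list(range(1000))
positives = {3, 17}

# BM25 stage: sample 200 negatives from the top-1000 retrieved passages.
bm25_negs = sample_negatives(bm25_ranked, positives, depth=1000, num_negatives=200)

# Retriever-2 stage: hard negatives come from retriever 1's top-200.
r1_ranked = list(range(1000))
r2_negs = sample_negatives(r1_ranked, positives, depth=200, num_negatives=200)
```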
If you want to use the released model without fine-tuning, you should add the "query: " and "passage: " prefixes; otherwise they are optional.
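A minimal usage sketch (my illustration, assuming the `intfloat/e5-base` checkpoint and mean pooling; adjust to the model you actually use):

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("intfloat/e5-base")
model = AutoModel.from_pretrained("intfloat/e5-base")

texts = [
    "query: how do neural text embeddings work",           # queries get "query: "
    "passage: Text embeddings map sentences to vectors.",  # documents get "passage: "
]
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    out = model(**batch)

# Mean-pool over non-padding tokens, then L2-normalize.
mask = batch["attention_mask"].unsqueeze(-1).float()
emb = (out.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
emb = F.normalize(emb, dim=-1)

score = emb[0] @ emb[1]  # cosine similarity between query and passage
```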
> one last question, is there any plan to release also the cross encoder? @intfloat The cross encoder for ms-marco is available at [https://github.com/microsoft/unilm/tree/master/simlm#available-models](https://github.com/microsoft/unilm/tree/master/simlm#available-models), we do not plan to release...
Can you try adding `--label_names labels` to the launch command in `simlm/scripts/train_biencoder_marco.sh`? Our code base is tested with `transformers==4.15`; newer versions seem to have breaking changes for us. UPDATE: this...
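For context, a minimal illustration of what that flag maps to on the Python side of Hugging Face `TrainingArguments` (my sketch, not the SimLM code):

```python
# Sketch: newer transformers versions let the Trainer infer which input keys
# are labels; setting label_names explicitly keeps a custom "labels" field
# from being mishandled during training/evaluation.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="./out",
    label_names=["labels"],  # the Python-side equivalent of --label_names labels
)
```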
You can refer to the answer here: https://github.com/intfloat/SimKGC/issues/10#issuecomment-1296792284 The InfoNCE loss is basically a cross-entropy loss, except that the labels are not pre-defined as in text classification.
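A minimal sketch of InfoNCE with in-batch negatives (my illustration, not the exact SimKGC code; the temperature value is a placeholder):

```python
import torch
import torch.nn.functional as F

def info_nce(query_emb, cand_emb, temperature=0.05):
    # Row i of cand_emb is the positive for row i of query_emb;
    # every other row in the batch acts as a negative.
    logits = query_emb @ cand_emb.t() / temperature  # (batch, batch) similarities
    labels = torch.arange(query_emb.size(0), device=query_emb.device)
    return F.cross_entropy(logits, labels)  # "label" = index of the diagonal positive
```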
Sure, just comment out the second line `loss += ...`; performance will drop a bit though.
> > Sure, just comment out the second line `loss += ...`; performance will drop a bit though.
>
> After commenting out the second `loss += ...` line during training, do I also need to comment out `backward_metrics = ...` during testing?

No need. The forward metrics correspond to predicting the tail entity given the head entity and relation; the backward metrics correspond to predicting the head entity given the tail entity and relation. For most datasets, the first prediction task is easier, so its metrics are better.
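To make the two directions concrete, a toy illustration (my own, not the SimKGC code; handling the backward case via an inverse relation is an assumption here):

```python
triples = [("Paris", "capital_of", "France")]

# forward: given (head, relation), rank candidate tails -> forward metrics
forward_queries = [((h, r), t) for h, r, t in triples]
# backward: given (tail, inverse relation), rank candidate heads -> backward metrics
backward_queries = [((t, "inverse of " + r), h) for h, r, t in triples]
```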
I am not entirely clear about your question, but I guess you are asking whether the ground-truth tail entity will be leaked in the input during the test stage? The link...