Shitao Xiao
@sevenandseven , you need to add a `get_input_embeddings` method to the `BiEncoderModel` class.
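A minimal sketch of what that method can look like. The real `BiEncoderModel` wraps a Hugging Face backbone in `self.model`; the `TinyBackbone` class here is a hypothetical stand-in so the sketch runs without downloading a model:

```python
import torch.nn as nn


class TinyBackbone(nn.Module):
    """Hypothetical stand-in for the Hugging Face model held by BiEncoderModel."""

    def __init__(self, vocab_size=10, dim=4):
        super().__init__()
        self.embeddings = nn.Embedding(vocab_size, dim)

    def get_input_embeddings(self):
        return self.embeddings


class BiEncoderModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = TinyBackbone()

    def get_input_embeddings(self):
        # Delegate to the wrapped backbone so utilities that expect the
        # standard transformers interface (e.g. token-embedding resizing)
        # keep working on the bi-encoder wrapper.
        return self.model.get_input_embeddings()
```

The important part is only the one-line delegation; the rest is scaffolding for the example.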
@sevenandseven , which reranker do you use?
You can reduce the `batch_size` and `max_length` to lower the memory usage.
@sevenandseven , you can pass `batch_size` and `max_length` to the `compute_score(batch_size=..., max_length=...)` function: https://github.com/FlagOpen/FlagEmbedding/blob/master/FlagEmbedding/flag_reranker.py#L194
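To see why these two knobs reduce memory, here is a rough sketch of the chunking that a reranker's scoring loop performs (a simplified illustration, not the actual FlagEmbedding implementation; `score_fn` is a hypothetical stand-in for one model forward pass, and truncation is done on characters here rather than tokens):

```python
def compute_score_batched(pairs, score_fn, batch_size=8, max_length=128):
    """Score (query, passage) pairs in small chunks.

    Smaller batch_size -> fewer pairs on the GPU per forward pass;
    smaller max_length -> shorter (truncated) inputs.
    Both directly shrink the peak activation memory of one pass.
    """
    scores = []
    for i in range(0, len(pairs), batch_size):
        # Truncate each text, then score only this chunk.
        batch = [(q[:max_length], p[:max_length])
                 for q, p in pairs[i:i + batch_size]]
        scores.extend(score_fn(batch))
    return scores
```

With `batch_size=4`, ten pairs are scored in chunks of 4, 4, and 2, so no more than four truncated pairs are in memory at once.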
Based on these results, sparse retrieval may not be suitable for your data; choose whichever retrieval mode performs best on it.
We have released the RetroMAE version of bge-m3: https://huggingface.co/BAAI/bge-m3-retromae, and you can use it for pre-training.
@Lahaina936 , the hard-negative mining script (https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune#hard-negatives) can be used for both the embedding model and the reranker model.
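The core idea behind that script can be sketched in a few lines: retrieve a ranking over the corpus for each query, then sample negatives from a mid-range of ranks (skipping the very top, which often contains unlabeled positives). This is a simplified illustration with hypothetical names, not the script itself:

```python
import random


def sample_hard_negatives(ranked_ids, positive_ids,
                          sampling_range=(2, 200), n_neg=15, seed=0):
    """Pick hard negatives from a retrieval ranking.

    ranked_ids: corpus ids sorted by similarity to the query (best first).
    Candidates are drawn from ranks sampling_range[0]..sampling_range[1],
    and documents labeled as positives are excluded.
    """
    lo, hi = sampling_range
    # Slice the candidate window by rank, then filter out known positives.
    pool = [d for d in ranked_ids[lo - 1:hi] if d not in positive_ids]
    rng = random.Random(seed)
    return rng.sample(pool, min(n_neg, len(pool)))
```

The sampling range is a trade-off: negatives from too high a rank are near-duplicates of the positive, while ones from too low a rank are too easy to teach the model much.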
@AugustLHHHHHH , we mined hard negatives from the entire MS MARCO corpus.
@chillizex , this error comes from FastChat, so please open an issue in the FastChat repo. We're sorry, but we cannot address it here.
The instruction has little influence. The most important factor is the data, so you should check its quality.