Shitao Xiao
@sevenandseven , you need to add a `get_input_embeddings` method to the `BiEncoderModel` class.
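A minimal sketch of what that method can look like. The real `BiEncoderModel` wraps a Hugging Face backbone in `self.model`; the `TinyBackbone` class here is a hypothetical stand-in so the sketch runs without downloading a model:

```python
import torch.nn as nn


class TinyBackbone(nn.Module):
    """Hypothetical stand-in for the Hugging Face model held by BiEncoderModel."""

    def __init__(self, vocab_size=10, dim=4):
        super().__init__()
        self.embeddings = nn.Embedding(vocab_size, dim)

    def get_input_embeddings(self):
        return self.embeddings


class BiEncoderModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = TinyBackbone()

    def get_input_embeddings(self):
        # Delegate to the wrapped backbone so utilities that expect the
        # standard transformers interface (e.g. token-embedding resizing)
        # keep working on the bi-encoder wrapper.
        return self.model.get_input_embeddings()
```

The important part is only the one-line delegation; the rest is scaffolding for the example.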
@sevenandseven , which reranker do you use?
You can reduce the `batch_size` and `max_length` to lower the memory usage.
@sevenandseven , you can pass `batch_size` and `max_length` to the `compute_score(batch_size=..., max_length=...)` function: https://github.com/FlagOpen/FlagEmbedding/blob/master/FlagEmbedding/flag_reranker.py#L194
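To see why these two knobs reduce memory, here is a rough sketch of the chunking that a reranker's scoring loop performs (a simplified illustration, not the actual FlagEmbedding implementation; `score_fn` is a hypothetical stand-in for one model forward pass, and truncation is done on characters here rather than tokens):

```python
def compute_score_batched(pairs, score_fn, batch_size=8, max_length=128):
    """Score (query, passage) pairs in small chunks.

    Smaller batch_size -> fewer pairs on the GPU per forward pass;
    smaller max_length -> shorter (truncated) inputs.
    Both directly shrink the peak activation memory of one pass.
    """
    scores = []
    for i in range(0, len(pairs), batch_size):
        # Truncate each text, then score only this chunk.
        batch = [(q[:max_length], p[:max_length])
                 for q, p in pairs[i:i + batch_size]]
        scores.extend(score_fn(batch))
    return scores
```

With `batch_size=4`, ten pairs are scored in chunks of 4, 4, and 2, so no more than four truncated pairs are in memory at once.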
Based on these results, sparse retrieval may not be suitable for your data; choose whichever retrieval mode performs best on it.
We have released the RetroMAE version of bge-m3: https://huggingface.co/BAAI/bge-m3-retromae, and you can use it for pre-training.
@Lahaina936 , the hard-negative mining script (https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune#hard-negatives) can be used for both the embedding model and the reranker model.
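The core idea behind that script can be sketched in a few lines: retrieve a ranking over the corpus for each query, then sample negatives from a mid-range of ranks (skipping the very top, which often contains unlabeled positives). This is a simplified illustration with hypothetical names, not the script itself:

```python
import random


def sample_hard_negatives(ranked_ids, positive_ids,
                          sampling_range=(2, 200), n_neg=15, seed=0):
    """Pick hard negatives from a retrieval ranking.

    ranked_ids: corpus ids sorted by similarity to the query (best first).
    Candidates are drawn from ranks sampling_range[0]..sampling_range[1],
    and documents labeled as positives are excluded.
    """
    lo, hi = sampling_range
    # Slice the candidate window by rank, then filter out known positives.
    pool = [d for d in ranked_ids[lo - 1:hi] if d not in positive_ids]
    rng = random.Random(seed)
    return rng.sample(pool, min(n_neg, len(pool)))
```

The sampling range is a trade-off: negatives from too high a rank are near-duplicates of the positive, while ones from too low a rank are too easy to teach the model much.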
@AugustLHHHHHH , we mined hard negatives from the entire MS MARCO corpus.
@chillizex , this error comes from FastChat, so please open an issue in the FastChat repo. We're sorry, but we cannot address it here.
The instruction has little influence. The most important factor is the data, so you should check its quality.