Training Time of Reranker
Thanks for the great work and the open-source models. I am quite interested in the following questions:
- The total time it took to train the LLM rerankers such as Gemma and MiniCPM, and on what hardware.
- The max query/passage length and the batch size used when training the LLM reranker.
Many thanks!
We trained for 4 days on 8 * 40G A100 GPUs. During training, the total length of query plus passage was 1024, and the batch size was 128.
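For reference, here is a minimal sketch of what that budget looks like with a Hugging Face tokenizer (this is not the authors' exact data pipeline, and the `BAAI/bge-reranker-v2-gemma` checkpoint is used purely as an illustration): the query and passage are tokenized as a pair, jointly truncated to 1024 tokens, and 128 such pairs form one batch.

```python
# Hedged sketch of the reported budget: query + passage truncated to a
# 1024-token total, with an effective batch size of 128.
from transformers import AutoTokenizer

MAX_TOTAL_LEN = 1024   # total length of query plus passage
BATCH_SIZE = 128       # effective batch size across the 8 * 40G A100 GPUs

# Illustrative checkpoint only; substitute the base model you are training.
tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-reranker-v2-gemma")

def encode_pairs(queries, passages):
    # Tokenizing query/passage as a pair lets the tokenizer truncate the
    # combined sequence to the 1024-token budget.
    return tokenizer(
        queries,
        passages,
        truncation=True,
        max_length=MAX_TOTAL_LEN,
        padding=True,
        return_tensors="pt",
    )

batch = encode_pairs(
    ["what is a panda?"] * BATCH_SIZE,
    ["The giant panda is a bear species endemic to China."] * BATCH_SIZE,
)
print(batch["input_ids"].shape)  # (128, <=1024)
```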
Thank you for your quick follow-up. Sorry, I have another question: how many epochs did you train on all the m3 + fever + quora data? Did you do any downsampling?
Training for 1-2 epochs is enough.
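For anyone reproducing this, a hedged sketch of how the epoch count might map onto standard Hugging Face `TrainingArguments` (the actual FlagEmbedding fine-tuning script may expose these options differently):

```python
# Assumed mapping onto TrainingArguments; values other than the epoch count
# and the 8-GPU batch arithmetic are illustrative, not from the thread.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="./reranker_output",   # hypothetical path
    num_train_epochs=2,               # 1-2 epochs over the mixed data is enough
    per_device_train_batch_size=16,   # 16 x 8 GPUs = 128 effective batch size
    bf16=True,
)
```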
Thanks for the reply. Sorry, I have a few more questions:
- For long-context examples (e.g. length > 1k), do we decrease the batch size during training? If so, is this done automatically?
- During training, is left padding or right padding used (i.e., what is the tokenizer's padding_side)?
- It will truncate long contexts, so there is no need to decrease the batch size.
- We follow the tokenizer's default padding side (see the sketch below).
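A minimal sketch of both points, assuming a Hugging Face tokenizer (the checkpoint name is again only illustrative): over-long inputs are simply truncated to the 1024-token budget instead of triggering a smaller batch, and padding uses whatever `padding_side` the tokenizer ships with.

```python
# Hedged illustration of truncation and padding behaviour during training.
from transformers import AutoTokenizer

# Illustrative checkpoint; substitute the reranker base model you are training.
tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-reranker-v2-gemma")
print(tokenizer.padding_side)  # keep the default, e.g. "left" or "right"

long_passage = "word " * 5000  # well over 1k tokens
batch = tokenizer(
    ["short query"],
    [long_passage],
    truncation=True,       # long contexts are simply cut to the budget
    max_length=1024,
    padding="max_length",  # padding follows tokenizer.padding_side
    return_tensors="pt",
)
print(batch["input_ids"].shape)  # (1, 1024)
```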