The E5 project: How to fine-tune the E5 model on the NLI task?
Model I am using: E5
I have several questions regarding fine-tuning the E5 model on the NLI task.
- Should I add `passage:` to the premise and `query:` to the hypothesis (as it's an asymmetric task), or the other way around? Or maybe just add `query:` as the second token (after `<s>`), regardless of the position of the premise/hypothesis?
  Currently I'm fine-tuning with the following format:
  `<s> passage: premise </s><s> query: hypothesis </s>`
  I'd be happy to know if this is the correct way to do it (see the tokenization sketch after this list).
- Are the training scripts of E5 publicly available? I couldn't find them.
- When fine-tuning on different tasks, did you just stack a proper head on top of the current pooler? The pooler I'm referring to is `(pooler): XLMRobertaPooler( (dense): Linear(in_features=768, out_features=768, bias=True) (activation): Tanh() )`.
  If so, where can I find the weights of the different heads for the different fine-tuned tasks? I guess they are not very important, but they might be helpful.
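For concreteness, here is a minimal sketch of how the cross-encoder-style format above could be built and tokenized. The checkpoint name, the manual placement of special tokens, and the example sentences are illustrative assumptions, not part of any official E5 recipe.

```python
from transformers import AutoTokenizer

# Illustrative checkpoint; any XLM-R-based E5 tokenizer behaves the same way.
tokenizer = AutoTokenizer.from_pretrained("intfloat/multilingual-e5-base")

premise = "A man is playing a guitar on stage."
hypothesis = "A person is performing music."

# Write the special tokens explicitly and disable automatic insertion so the
# encoded sequence is exactly: <s> passage: ... </s><s> query: ... </s>
text = f"<s> passage: {premise} </s><s> query: {hypothesis} </s>"
encoded = tokenizer(text, add_special_tokens=False, return_tensors="pt")

print(tokenizer.convert_ids_to_tokens(encoded["input_ids"][0]))
```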
Thanks in advance!
@intfloat Hi :) can you assist please?
Hi @MatanAvitan ,
Thanks for the questions.
- Should I add `passage:` to the premise and `query:` to the hypothesis?
Although NLI is technically an asymmetric task, we follow the SimCSE paper and treat it as a symmetric task. During training, 50% of the time we add `passage:` to the premise and `query:` to the hypothesis, and 50% of the time the other way around.
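To make the prefix scheme concrete, here is a minimal sketch of the 50/50 assignment described above, as it might be applied during data preprocessing. The function name, the RNG seed, and the example pairs are illustrative, not the authors' actual preprocessing code.

```python
import random

def add_nli_prefixes(premise: str, hypothesis: str, rng: random.Random):
    """Symmetric prefix assignment: 50% of the time the premise gets
    "passage: " and the hypothesis "query: "; otherwise the roles swap.
    (Function name and structure are illustrative.)"""
    if rng.random() < 0.5:
        return "passage: " + premise, "query: " + hypothesis
    return "query: " + premise, "passage: " + hypothesis

rng = random.Random(0)
pairs = [
    ("A man is playing a guitar.", "A person is making music."),
    ("A soccer game is in progress.", "People are playing a sport."),
]
for premise, hypothesis in pairs:
    print(add_nli_prefixes(premise, hypothesis, rng))
```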
- Are the training scripts of E5 publicly available?
Unfortunately, the training scripts are not publicly available. Our code is based on https://github.com/microsoft/unilm/tree/master/simlm with some changes to support custom prefixes. The released E5 checkpoints are supposed to be good embedding models without any further training. If you would like to fine-tune them, you can use existing libraries such as Tevatron by changing the initialization.
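As a starting point for such further fine-tuning, below is a minimal sketch of a single contrastive training step that initializes from a released E5 checkpoint with Hugging Face Transformers. Average pooling and the `query:`/`passage:` prefixes follow the public E5 model cards; the checkpoint name, temperature, learning rate, and toy batch are assumptions, not the authors' training code.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

# Initialize from a released E5 checkpoint (name chosen for illustration).
model_name = "intfloat/e5-base-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def encode(texts):
    """Mean-pool the last hidden states over non-padding tokens, then L2-normalize."""
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=128, return_tensors="pt")
    outputs = model(**batch)
    mask = batch["attention_mask"].unsqueeze(-1)
    pooled = (outputs.last_hidden_state * mask).sum(1) / mask.sum(1)
    return F.normalize(pooled, dim=-1)

# Toy batch: each query's positive passage sits at the same index.
queries = ["query: A man is playing a guitar on stage.",
           "query: A dog runs through a field."]
passages = ["passage: Someone is performing music.",
            "passage: An animal is moving outdoors."]

q_emb = encode(queries)
p_emb = encode(passages)

# In-batch-negative InfoNCE: off-diagonal passages serve as negatives.
scores = q_emb @ p_emb.T / 0.05   # 0.05 is an assumed temperature
labels = torch.arange(len(queries))
loss = F.cross_entropy(scores, labels)

loss.backward()
optimizer.step()
print(f"loss: {loss.item():.4f}")
```

Libraries such as Tevatron wrap the same loop (data loading, hard negatives, distributed training); the key change is simply pointing the model initialization at an E5 checkpoint and keeping the prefixes.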
- When fine-tuning on different tasks, did you just stack a proper head on top of the current pooler?
I am not sure I understand your question. We do not fine-tune the model on different tasks. Instead, we pre-train and fine-tune it on a mixture of data jointly, and then evaluate it on different tasks without any further fine-tuning.
Thank you for your questions and contributions @MatanAvitan.
I would like to fine-tune the e5-multilingual-base or e5-multilingual-large embedding model. Could you share the code you used with me, or do you have any recommendations?