llm2vec
Evaluating MNTP task only
Thanks for your work.
I am wondering if you plan to provide an example script for loading and evaluating the MNTP task only. In my experiments, the results are `_attn_implementation` dependent...
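To be concrete, I mean the attention backend selected when the backbone is loaded via `transformers`, roughly as in the sketch below (the model class and checkpoint are only placeholders, not necessarily what `run_mntp.py` uses):

```python
from transformers import AutoModelForCausalLM

# The attention backend ("eager", "sdpa", or "flash_attention_2") can change
# the numerics slightly, which is why I see implementation-dependent results.
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.2",  # placeholder checkpoint
    attn_implementation="sdpa",
)
```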
> In my experiments, the results are `_attn_implementation` dependent...
I am not sure what you mean by this. Can you elaborate?
We do not evaluate the MNTP task separately, as it is not used for either word-level or sentence-level tasks. In-training evaluation takes place when you train for MNTP.
However, it is easy to do MNTP evaluation with two changes:
- In the config file, change `do_train` from `true` to `false` (a minimal sketch of such a config follows the code snippet below).
- In `experiments/run_mntp.py`, change lines 700-705 to
```python
from peft import PeftModel

model.model = PeftModel.from_pretrained(
    model.model,
    "<HF MNTP MODEL ID>",
)
```
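For reference, an eval-only config could look roughly like the sketch below. Apart from `do_train`, the field names are assumptions in the style of the standard HF masked-LM training arguments, and `Mistral-eval.json` is a hypothetical file name, so check both against the actual `train_configs/mntp/Mistral.json`:

```python
import json

# Hypothetical eval-only MNTP config: only do_train comes from the instructions
# above; the other fields are placeholders in the style of HF run_mlm arguments.
eval_only_config = {
    "model_name_or_path": "mistralai/Mistral-7B-Instruct-v0.2",
    "do_train": False,   # the change described in the first bullet
    "do_eval": True,
    "per_device_eval_batch_size": 32,
    "output_dir": "output/mntp-eval",
}

with open("train_configs/mntp/Mistral-eval.json", "w") as f:
    json.dump(eval_only_config, f, indent=2)
```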
After these two changes, when you run

`python experiments/run_mntp.py train_configs/mntp/Mistral.json`

the script will give MNTP evaluation results. For Mistral, I got the following scores:
```
***** eval metrics *****
eval_accuracy = 0.2474
eval_loss = 4.6248
eval_runtime = 0:00:11.83
eval_samples = 568
eval_samples_per_second = 47.991
eval_steps_per_second = 1.521
perplexity = 101.9872
```
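As a quick sanity check on these numbers, the reported perplexity is consistent with simply being the exponential of the eval loss:

```python
import math

# exp(eval_loss) reproduces the reported perplexity: exp(4.6248) ≈ 101.98.
print(math.exp(4.6248))
```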
Feel free to re-open if you have any more questions about this issue.