[P1] For left_padding in compute_metrics.py
When training with llama-7b on math, I found that the sizes of left_padding and intervention_locations did not match. This is because tokenizer.bos_token_id = 0 for llama-7b, so input_ids contains 0 at multiple positions.
If we use the formula from the project, left_padding = (inputs["input_ids"] == tokenizer.bos_token_id).nonzero(as_tuple=True)[1], then left_padding has size (N,), where N is the total number of entries in inputs["input_ids"] equal to 0, rather than the desired size (batch_size,).
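To make the size issue concrete, here is a small toy reproduction (the input_ids values below are made up for illustration and are not an actual batch from the task):

```python
import torch

# Toy reproduction (hypothetical values, not an actual batch): with left
# padding, the pad positions and the bos token can all be id 0 for llama-7b,
# so the bos check matches more than one position per row.
bos_token_id = 0
input_ids = torch.tensor([
    [0, 0, 0, 31, 42, 53],    # 2 pad tokens, then bos, then 3 prompt tokens
    [0, 11, 22, 33, 44, 55],  # no padding: bos, then 5 prompt tokens
])

left_padding = (input_ids == bos_token_id).nonzero(as_tuple=True)[1]
print(left_padding)  # tensor([0, 1, 2, 0]) -> 4 entries, while batch_size is 2
```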
Therefore, I have changed it to the following code:
mask = (inputs["input_ids"] == tokenizer.bos_token_id)
indices = torch.topk(mask.int(), k=1, dim=-1).indices
left_padding = torch.where(mask.any(dim=-1), indices.reshape(mask.shape[:-1]), -1)
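On the same kind of toy input (again, made-up values), the revised version gives one entry per batch row:

```python
import torch

# Same toy input as above (hypothetical values), checking only the shape of
# the revised left_padding: one entry per batch row, -1 where there is no bos.
bos_token_id = 0
input_ids = torch.tensor([
    [0, 0, 0, 31, 42, 53],
    [0, 11, 22, 33, 44, 55],
])

mask = (input_ids == bos_token_id)
indices = torch.topk(mask.int(), k=1, dim=-1).indices        # (batch_size, 1)
left_padding = torch.where(mask.any(dim=-1), indices.reshape(mask.shape[:-1]), -1)
print(left_padding.shape)  # torch.Size([2]) -> matches batch_size
```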
I hope the author can verify whether my error is caused by some other issue or whether my understanding of the cause is right, and whether the revised code is correct.
The command I use is:
CUDA_VISIBLE_DEVICES=6 python examples/loreft/train.py -task gsm8k -model models/Llama/Llama/llama-7b-hf/ -seed 42 -l all -r 4 -p f7+l7 -e 12 -lr 9e-4 -type NodireftIntervention -gradient_accumulation_steps 4 -batch_size 8 -eval_batch_size 4 --dropout 0.05 --test_split validation --use_normalized_template --greedy_decoding --warmup_ratio 0.00 --weight_decay 0.06 --save_model
@mrsempress Thanks for your question. Could you elaborate on this?
> When training with llama-7b on math, I found that the sizes of left_padding and intervention_locations did not match.
intervention_locations is determined by -p f7+l7 (first 7 and last 7 prompt tokens), which does not need to match the size of left_padding IIUC.
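To make that concrete, here is a rough, hypothetical sketch of what an f7+l7 position spec means for a single prompt; it is not the repo's actual implementation, only an illustration that the number of positions is fixed by the spec (7 + 7) rather than by the amount of left padding:

```python
# Hypothetical sketch of an "f7+l7"-style position spec: the first 7 and last 7
# prompt token positions for a prompt of length prompt_len. For illustration
# only; not the repo's actual code.
def first_last_positions(prompt_len: int, first_n: int = 7, last_n: int = 7):
    first = list(range(min(first_n, prompt_len)))
    last = list(range(max(prompt_len - last_n, 0), prompt_len))
    return first + last

print(len(first_last_positions(20)))  # 14 positions, regardless of batch padding
```

left_padding, by contrast, is a per-row offset, so the two do not need to have the same size.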