LoRA
Some questions regarding the label shift in model training and the evaluation hyperparameters for WebNLG
Hi,
I really enjoy this work! While going through the paper and the code, I ran into a question about the implementation of GPT2LMModel's forward function (https://github.com/microsoft/LoRA/blob/aa68d8a021c7ba08973e35fdfdc76338fdbfad57/examples/NLG/src/model.py#L396). I noticed that the labels and logits are not shifted the way other GPT-2 implementations do:
`shift_logits = lm_logits[..., :-1, :].contiguous(); shift_labels = labels[..., 1:].contiguous()`
May I ask whether the shift is simply unnecessary in your code, or in which part of the code the shift is implemented?
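For reference, the standard next-token shift used for GPT-2 language modeling (e.g., in Hugging Face's GPT2LMHeadModel) computes the loss roughly like this; this is only an illustrative sketch for comparison, not code from this repo:

```python
import torch
import torch.nn.functional as F

def shifted_lm_loss(lm_logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    # Position t predicts token t+1: drop the last logit and the first label.
    shift_logits = lm_logits[..., :-1, :].contiguous()  # (batch, seq_len - 1, vocab)
    shift_labels = labels[..., 1:].contiguous()          # (batch, seq_len - 1)
    return F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
    )
```

If the dataloader already offsets the labels by one position relative to the inputs, then skipping the shift inside forward would of course give the same objective; I just could not find where that happens.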
Besides, I also fail to reproduce the expected result of LoRA on WebNLG (Table 14 in the paper, the LoRA 0.35M row) using the checkpoint provided in this repo. The script and the hyperparameters I used are:
```bash
python3 -m torch.distributed.launch --nproc_per_node=1 src/gpt2_beam.py \
    --data ./data/webnlg_challenge_2017/test.jsonl \
    --batch_size 1 \
    --seq_len 512 \
    --eval_len 64 \
    --model_card gpt2.md \
    --init_checkpoint ./trained_models/GPT2_M/webnlg/gpt2_md_lora_webnlg.pt \
    --platform local \
    --lora_dim 4 \
    --lora_alpha 32 \
    --beam 10 \
    --length_penalty 0.8 \
    --no_repeat_ngram_size 4 \
    --repetition_penalty 1.0 \
    --eos_token_id 628 \
    --work_dir ./trained_models/GPT2_M/webnlg \
    --output_file predict.lora.md.jsonl

python3 src/gpt2_decode.py \
    --vocab ./vocab \
    --sample_file ./trained_models/GPT2_M/webnlg/predict.lora.md.jsonl \
    --input_file ./data/webnlg_challenge_2017/test_formatted.jsonl \
    --ref_type webnlg \
    --ref_num 6 \
    --output_ref_file eval/GenerationEval/data/references_webnlg \
    --output_pred_file eval/GenerationEval/data/hypothesis_webnlg \
    --tokenize --lower
```
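Related to the "LoRA 0.35M" row, I also did a quick sanity check of the adapter size in the provided checkpoint. The snippet below is just my own sketch; the `model_state_dict` wrapper key is a guess at how the checkpoint is packaged:

```python
import torch

cp = torch.load("./trained_models/GPT2_M/webnlg/gpt2_md_lora_webnlg.pt",
                map_location="cpu")
# If the tensors are wrapped under a key like "model_state_dict"
# (an assumption on my part), unwrap them first.
state_dict = cp.get("model_state_dict", cp) if isinstance(cp, dict) else cp
# Count only the low-rank adapter tensors (names containing "lora_").
lora_params = sum(t.numel() for name, t in state_dict.items()
                  if "lora_" in name and torch.is_tensor(t))
print(f"LoRA parameters: {lora_params / 1e6:.2f}M")
```

If the checkpoint matches the Table 14 setting, I would expect this to come out near the reported 0.35M.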
Do these hyperparameters seem right? The final metrics I got are:
| Metric | Seen | Unseen | All |
|--------|------|--------|-----|
| BLEU   | 59.66 | 45.47 | 53.27 |
| METEOR | 0.43 | 0.38 | 0.41 |
| TER    | 0.40 | 0.52 | 0.45 |
I also modified gpt2_beam.py a little (see below) so that it first loads the parameters from `./pretrained_checkpoints/gpt2-medium-pytorch_model.bin` and then from `gpt2_md_lora_webnlg.pt`, the provided checkpoint. Is this modification sensible, or how would you recommend loading the model?

Original: https://github.com/microsoft/LoRA/blob/aa68d8a021c7ba08973e35fdfdc76338fdbfad57/examples/NLG/src/gpt2_beam.py#L381
New:

```python
lm_net = GPT2LMModel(config)

# First load the original GPT-2 medium weights ...
cp = torch.load("./pretrained_checkpoints/gpt2-medium-pytorch_model.bin",
                map_location=torch.device('cpu'))
lm_net.load_weight(cp)

# ... then load the provided LoRA checkpoint on top of them.
if args.init_checkpoint is not None:
    print('loading model pretrained weight.')
    cp = torch.load(args.init_checkpoint, map_location=torch.device('cpu'))
    lm_net.load_weight(cp)

lm_net = lm_net.cuda()
```
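For comparison, the loading pattern I would otherwise have expected (following the loralib README) is to load the pretrained weights first and then the LoRA-only checkpoint, both with `strict=False`. A rough sketch of that idea is below; it would likely need adapting here, since GPT2LMModel uses its own `load_weight` helper rather than plain `load_state_dict`, and `build_model()` is just a hypothetical constructor:

```python
import torch

# model: a network whose Linear/Embedding layers were replaced with loralib layers.
model = build_model()  # hypothetical constructor for illustration

# Load the pretrained (non-LoRA) weights first; the LoRA A/B tensors are
# absent from this checkpoint, hence strict=False.
model.load_state_dict(torch.load('ckpt_pretrained.pt'), strict=False)

# Then load the LoRA-only checkpoint; it contains only the lora_ tensors,
# so strict=False leaves the base weights untouched.
model.load_state_dict(torch.load('ckpt_lora.pt'), strict=False)
```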