
Some questions regarding the label shift in model training and the evaluation hyperparameters for WebNLG


Hi,

I really enjoy the work you propose! While studying the paper and the code, I have a question about the implementation of GPT2LMModel's forward function (https://github.com/microsoft/LoRA/blob/aa68d8a021c7ba08973e35fdfdc76338fdbfad57/examples/NLG/src/model.py#L396). I notice that the labels and logits are not shifted the way other GPT-2 implementations do it:

```python
shift_logits = lm_logits[..., :-1, :].contiguous()
shift_labels = labels[..., 1:].contiguous()
```

May I ask whether the shift is necessary in your code, and if so, where it is implemented?
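For reference, the shift I have in mind is the standard shifted next-token loss (as in, e.g., Hugging Face's GPT2LMHeadModel). Below is a minimal sketch with illustrative names, not code from this repo:

```python
import torch.nn.functional as F

def shifted_lm_loss(lm_logits, labels, ignore_index=-100):
    """Standard causal LM loss: the token at position t is predicted from
    positions < t, so logits drop the last position and labels drop the first."""
    shift_logits = lm_logits[..., :-1, :].contiguous()   # (B, T-1, V)
    shift_labels = labels[..., 1:].contiguous()          # (B, T-1)
    return F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
        ignore_index=ignore_index,
    )
```

Of course, if the data pipeline already offsets the labels by one position when the batches are built, no extra shift would be needed inside forward; I just could not find where that happens.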

In addition, I fail to reproduce the expected LoRA results on WebNLG (Table 14 in the paper, LoRA 0.35M) with the checkpoint provided in this repo. The script and hyperparameters I use are:

```sh
python3 -m torch.distributed.launch --nproc_per_node=1 src/gpt2_beam.py \
    --data ./data/webnlg_challenge_2017/test.jsonl \
    --batch_size 1 \
    --seq_len 512 \
    --eval_len 64 \
    --model_card gpt2.md \
    --init_checkpoint ./trained_models/GPT2_M/webnlg/gpt2_md_lora_webnlg.pt \
    --platform local \
    --lora_dim 4 \
    --lora_alpha 32 \
    --beam 10 \
    --length_penalty 0.8 \
    --no_repeat_ngram_size 4 \
    --repetition_penalty 1.0 \
    --eos_token_id 628 \
    --work_dir ./trained_models/GPT2_M/webnlg \
    --output_file predict.lora.md.jsonl

python3 src/gpt2_decode.py \
    --vocab ./vocab \
    --sample_file ./trained_models/GPT2_M/webnlg/predict.lora.md.jsonl \
    --input_file ./data/webnlg_challenge_2017/test_formatted.jsonl \
    --ref_type webnlg \
    --ref_num 6 \
    --output_ref_file eval/GenerationEval/data/references_webnlg \
    --output_pred_file eval/GenerationEval/data/hypothesis_webnlg \
    --tokenize --lower
```

Do the hyperparameters I use look right? The final metrics I got are:

| Metric | Seen | Unseen | All |
| ------ | ---- | ------ | --- |
| BLEU   | 59.66 | 45.47 | 53.27 |
| METEOR | 0.43  | 0.38  | 0.41  |
| TER    | 0.40  | 0.52  | 0.45  |

I also modified gpt2_beam.py slightly (see below) so that it first loads the parameters from "./pretrained_checkpoints/gpt2-medium-pytorch_model.bin" and then loads "gpt2_md_lora_webnlg.pt", the checkpoint provided. Is this modification sensible, or how would you recommend loading the model?

original: https://github.com/microsoft/LoRA/blob/aa68d8a021c7ba08973e35fdfdc76338fdbfad57/examples/NLG/src/gpt2_beam.py#L381

new:

```python
lm_net = GPT2LMModel(config)

# first load the full pretrained GPT-2 medium weights
cp = torch.load("./pretrained_checkpoints/gpt2-medium-pytorch_model.bin",
                map_location=torch.device('cpu'))
lm_net.load_weight(cp)

# then load the LoRA checkpoint provided in the repo on top
if args.init_checkpoint is not None:
    print('loading model pretrained weight.')
    cp = torch.load(args.init_checkpoint, map_location=torch.device('cpu'))
    lm_net.load_weight(cp)
lm_net = lm_net.cuda()
```

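For comparison, my understanding of the generic loralib loading pattern (as described in the library README) is sketched below; the NLG example instead uses its own load_weight helper, so I am not sure this applies directly here. The function name and paths are placeholders of mine:

```python
import torch

def load_lora_model(model, pretrained_path, lora_path):
    """Generic loralib-style loading: full pretrained weights first, then the
    (much smaller) LoRA-only checkpoint on top. strict=False lets each
    checkpoint cover only a subset of the model's parameters."""
    model.load_state_dict(torch.load(pretrained_path, map_location='cpu'), strict=False)
    model.load_state_dict(torch.load(lora_path, map_location='cpu'), strict=False)
    return model
```

If load_weight behaves like a non-strict load_state_dict, my two-step modification above should be equivalent, but I would appreciate confirmation.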

Freefighter · Nov 17, 2021