
Errors in finetuning

Open pqviet opened this issue 3 years ago • 7 comments

After completing pre-training, I fine-tuned on refcoco-unc and got the following error:

    File "SeqTR/seqtr/utils/checkpoint.py", line 57, in load_pretrained_checkpoint
        state, ema_state = ckpt['state_dict'], ckpt['ema_state_dict']
    KeyError: 'ema_state_dict'

Even after fixing this, I still ran into several other errors (e.g. lan_enc.embedding.weight, model.head) in load_pretrained_checkpoint(). Can you please check it?

pqviet avatar Jul 14 '22 07:07 pqviet

Hi, please upload the full traceback. Did it show that lan_enc.embedding.weight does not match in size? Pre-training uses a larger word vocabulary, while fine-tuning only needs a subset of it. Since we freeze the embedding weight for both pre-training and fine-tuning, this is okay, don't worry.
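
For reference, a minimal sketch of loading only the shape-compatible entries of a checkpoint (the helper name load_compatible is illustrative, not SeqTR's actual load_pretrained_checkpoint):

```python
# Sketch: keep only checkpoint entries whose shapes match the current model,
# so a larger pre-training vocabulary in lan_enc.embedding.weight is skipped.
import torch

def load_compatible(model, ckpt_path):
    ckpt = torch.load(ckpt_path, map_location="cpu")
    state = ckpt["state_dict"]
    model_state = model.state_dict()
    filtered = {k: v for k, v in state.items()
                if k in model_state and v.shape == model_state[k].shape}
    # strict=False tolerates the keys we dropped (the embedding stays frozen anyway)
    missing, unexpected = model.load_state_dict(filtered, strict=False)
    return missing, unexpected
```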

seanzhuh avatar Jul 15 '22 14:07 seanzhuh

After fixing the 'ema_state_dict' KeyError, I got the same kind of error for lan_enc.embedding.weight: KeyError: 'lan_enc.embedding.weight'. I think some keys in the fine-tuned model were not defined in the pre-trained model.

pqviet avatar Jul 19 '22 07:07 pqviet

Did you use DDP during fine-tuning? If that's the case, the keys in the pre-trained state_dict need to be prepended with "module.", since we remove it in lines 58-59. By default we fine-tune on a single GPU card.
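
For reference, a hedged sketch of reconciling the "module." prefix between a DDP-wrapped model and a pre-trained state_dict (the helper name adapt_ddp_prefix is illustrative, not part of the repo):

```python
# Sketch: add or strip DistributedDataParallel's "module." key prefix
from collections import OrderedDict

def adapt_ddp_prefix(state_dict, model_is_ddp):
    new_state = OrderedDict()
    for key, value in state_dict.items():
        if model_is_ddp and not key.startswith("module."):
            key = "module." + key          # DDP-wrapped models expect the prefix
        elif not model_is_ddp and key.startswith("module."):
            key = key[len("module."):]     # single-GPU models expect no prefix
        new_state[key] = value
    return new_state
```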

seanzhuh avatar Jul 19 '22 12:07 seanzhuh

No, I didn't use DDP during fine-tuning. The command was:

    python tools/train.py configs/seqtr/detection/seqtr_det_refcoco-unc.py \
        --finetune-from work_dir/seqtr_det_mixed/det_best.pth \
        --cfg-options scheduler_config.max_epoch=5 scheduler_config.decay_steps=[4] scheduler_config.warmup_epochs=0

pqviet avatar Jul 20 '22 01:07 pqviet

Dear author, I met the same error. The traceback is attached:

    Traceback (most recent call last):
      File "tools/train.py", line 183, in <module>
        main()
      File "tools/train.py", line 179, in main
        main_worker(cfg)
      File "tools/train.py", line 105, in main_worker
        load_pretrained_checkpoint(model, model_ema, cfg.finetune_from, amp=cfg.use_fp16)
      File "/home/chch3470/SeqTR/seqtr/utils/checkpoint.py", line 57, in load_pretrained_checkpoint
        state, ema_state = ckpt['state_dict'], ckpt['ema_state_dict']
    KeyError: 'ema_state_dict'

I am fine-tuning the segmentation model from the "pre-trained + fine-tuned SeqTR segmentation" checkpoint on a customized dataset.

(1) I can run inference/test on this pre-trained model. (2) I can also fine-tune the detection model.

Not sure if there is something missing from the segmentation fine-tuning... Could you kindly guide me? Thank you so much!

The script I run is:

    python tools/train.py configs/seqtr/segmentation/seqtr_segm_vizwiz.py \
        --finetune-from "/home/chch3470/SeqTR/work_dir/segm_best.pth" \
        --cfg-options scheduler_config.max_epoch=10 scheduler_config.decay_steps=[4] scheduler_config.warmup_epochs=0

CCYChongyanChen avatar Nov 03 '22 07:11 CCYChongyanChen

During pre-training, we disable EMA and LSJ, so there is no ema_state_dict in the checkpoint. Just comment out this line and loading the state_dict will be fine.
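
For example, a tolerant variant of that step might look like the sketch below (an assumption about how to patch it, not the repo's committed code; load_states is a made-up name):

```python
import torch

def load_states(ckpt_path):
    """Return (state, ema_state); tolerate checkpoints saved without EMA."""
    ckpt = torch.load(ckpt_path, map_location="cpu")
    state = ckpt["state_dict"]
    # ckpt.get avoids the KeyError raised by ckpt['ema_state_dict']
    ema_state = ckpt.get("ema_state_dict", state)  # assumption: reuse plain weights for the EMA model
    return state, ema_state
```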


seanzhuh avatar Nov 03 '22 10:11 seanzhuh

Thank you for the quick reply! I commented out the lines about EMA, and now it shows an error about lan_enc.embedding.weight:

    Traceback (most recent call last):
      File "tools/train.py", line 183, in <module>
        main()
      File "tools/train.py", line 179, in main
        main_worker(cfg)
      File "tools/train.py", line 105, in main_worker
        load_pretrained_checkpoint(model, model_ema, cfg.finetune_from, amp=cfg.use_fp16)
      File "/home/chch3470/SeqTR/seqtr/utils/checkpoint.py", line 61, in load_pretrained_checkpoint
        state.pop("lan_enc.embedding.weight")
    KeyError: 'lan_enc.embedding.weight'

The seq_embedding_dim key is also missing. I commented out many lines and it seems to be working, though I'm not sure whether I did it correctly or not.
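
For later readers, a minimal sketch of what that amounts to: pop the vocabulary-dependent keys only if they exist and load non-strictly (the function name and key list are illustrative, not the repo's exact checkpoint.py):

```python
import torch

def load_finetune_checkpoint(model, ckpt_path):
    ckpt = torch.load(ckpt_path, map_location="cpu")
    state = ckpt["state_dict"]
    # pop with a default so missing keys (e.g. lan_enc.embedding.weight) no longer raise KeyError;
    # extend the tuple with whatever other keys checkpoint.py pops for your checkpoint
    for key in ("lan_enc.embedding.weight",):
        state.pop(key, None)
    model.load_state_dict(state, strict=False)
```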

CCYChongyanChen avatar Nov 03 '22 18:11 CCYChongyanChen