CodeT5
Fine-tuning Salesforce/codet5-large on the Concode task fails with an AssertionError during the dev-set BLEU evaluation, while the same command trains fine with Salesforce/codet5-base. How can I fix this?

The command I ran and the full log:
CUDA_VISIBLE_DEVICES=0 python /home/aldo/CodeT5/run_gen.py \
  --do_train --do_eval --do_eval_bleu --do_test \
  --task concode --sub_task none --model_type codet5 \
  --data_num 100 --num_train_epochs 1 --warmup_steps 10 \
  --learning_rate 10e-5 --patience 3 \
  --tokenizer_name=Salesforce/codet5-large \
  --model_name_or_path=Salesforce/codet5-large \
  --data_dir /home/aldo/CodeT5/data \
  --cache_path saved_large/concode/codet5_large_100_lr10_bs8_src320_trg150_pat3_e1/cache_data \
  --output_dir saved_large/concode/codet5_large_100_lr10_bs8_src320_trg150_pat3_e1 \
  --summary_dir tensorboard --save_last_checkpoints --always_save_model \
  --res_dir saved_large/concode/codet5_large_100_lr10_bs8_src320_trg150_pat3_e1/prediction \
  --res_fn results/concode_codet5_base.txt \
  --train_batch_size 8 --eval_batch_size 8 \
  --max_source_length 320 --max_target_length 150 \
  2>&1 | tee saved_large/concode/codet5_large_100_lr10_bs8_src320_trg150_pat3_e1/train.log
02/16/2023 21:59:27 - INFO - __main__ - Namespace(adam_epsilon=1e-08, add_lang_ids=False, add_task_prefix=False, always_save_model=True, beam_size=10, cache_path='saved_large/concode/codet5_large_100_lr10_bs8_src320_trg150_pat3_e1/cache_data', config_name='', data_dir='/home/aldo/CodeT5/data', data_num=100, dev_filename=None, do_eval=True, do_eval_bleu=True, do_lower_case=False, do_test=True, do_train=True, eval_batch_size=8, eval_steps=-1, eval_task='', gradient_accumulation_steps=1, lang='java', learning_rate=0.0001, load_model_path=None, local_rank=-1, log_steps=-1, max_grad_norm=1.0, max_source_length=320, max_steps=-1, max_target_length=150, model_name_or_path='Salesforce/codet5-large', model_type='codet5', no_cuda=False, num_train_epochs=1, output_dir='saved_large/concode/codet5_large_100_lr10_bs8_src320_trg150_pat3_e1', patience=3, res_dir='saved_large/concode/codet5_large_100_lr10_bs8_src320_trg150_pat3_e1/prediction', res_fn='results/concode_codet5_base.txt', save_last_checkpoints=True, save_steps=-1, seed=1234, start_epoch=0, sub_task='none', summary_dir='tensorboard', task='concode', test_filename=None, tokenizer_name='Salesforce/codet5-large', train_batch_size=8, train_filename=None, train_steps=-1, warmup_steps=10, weight_decay=0.0)
02/16/2023 21:59:27 - WARNING - configs - Process rank: -1, device: cuda, n_gpu: 1, distributed training: False, cpu count: 96
02/16/2023 21:59:40 - INFO - models - Finish loading model [738M] from Salesforce/codet5-large
02/16/2023 21:59:45 - INFO - utils - Read 100 examples, avg src len: 68, avg trg len: 26, max src len: 249, max trg len: 82
02/16/2023 21:59:45 - INFO - utils - [TOKENIZE] avg src len: 202, avg trg len: 33, max src len: 741, max trg len: 104
02/16/2023 21:59:45 - INFO - utils - Create cache data into saved_large/concode/codet5_large_100_lr10_bs8_src320_trg150_pat3_e1/cache_data/train_100.pt
100%|##########| 100/100 [00:02<00:00, 47.08it/s]
02/16/2023 21:59:47 - INFO - __main__ - ***** Running training *****
02/16/2023 21:59:47 - INFO - __main__ - Num examples = 100
02/16/2023 21:59:47 - INFO - __main__ - Batch size = 8
02/16/2023 21:59:47 - INFO - __main__ - Batch num = 13
[0] Train loss 11.713: 100%|##########| 13/13 [00:12<00:00, 1.05it/s]
02/16/2023 21:59:59 - INFO - utils - Read 100 examples, avg src len: 69, avg trg len: 31, max src len: 193, max trg len: 100
02/16/2023 21:59:59 - INFO - utils - Create cache data into saved_large/concode/codet5_large_100_lr10_bs8_src320_trg150_pat3_e1/cache_data/dev_100.pt
100%|##########| 100/100 [00:02<00:00, 48.75it/s]
02/16/2023 22:00:01 - INFO - __main__ - ***** Running ppl evaluation *****
02/16/2023 22:00:01 - INFO - __main__ - Num examples = 100
02/16/2023 22:00:01 - INFO - __main__ - Batch size = 8
Eval ppl: 100%|##########| 13/13 [00:03<00:00, 3.57it/s]
02/16/2023 22:00:05 - INFO - __main__ - epoch = 0
02/16/2023 22:00:05 - INFO - __main__ - eval_ppl = 26.63935
02/16/2023 22:00:05 - INFO - __main__ - global_step = 13
02/16/2023 22:00:05 - INFO - __main__ - ********************
02/16/2023 22:00:13 - INFO - __main__ - Save the last model into saved_large/concode/codet5_large_100_lr10_bs8_src320_trg150_pat3_e1/checkpoint-last/pytorch_model.bin
02/16/2023 22:00:13 - INFO - __main__ - Best ppl:26.63935
02/16/2023 22:00:13 - INFO - __main__ - ********************
02/16/2023 22:00:20 - INFO - __main__ - Save the best ppl model into saved_large/concode/codet5_large_100_lr10_bs8_src320_trg150_pat3_e1/checkpoint-best-ppl/pytorch_model.bin
02/16/2023 22:00:20 - INFO - __main__ - ***** CUDA.empty_cache() *****
02/16/2023 22:00:20 - INFO - utils - Read 100 examples, avg src len: 69, avg trg len: 31, max src len: 193, max trg len: 100
02/16/2023 22:00:20 - INFO - utils - Sample 5k data for computing bleu from /home/aldo/CodeT5/data/concode/dev.json
100%|##########| 100/100 [00:02<00:00, 48.45it/s]
02/16/2023 22:00:22 - INFO - __main__ - ***** Running bleu evaluation on dev data*****
02/16/2023 22:00:22 - INFO - __main__ - Num examples = 100
02/16/2023 22:00:22 - INFO - __main__ - Batch size = 8
Eval bleu for dev set: 100%|##########| 13/13 [02:01<00:00, 9.36s/it]
Traceback (most recent call last):
  File "/home/aldo/CodeT5/run_gen.py", line 388, in <module>
    main()
  File "/home/aldo/CodeT5/run_gen.py", line 315, in main
    result = eval_bleu_epoch(args, eval_data, eval_examples, model, tokenizer, 'dev', 'e%d' % cur_epoch)
  File "/home/aldo/CodeT5/run_gen.py", line 153, in eval_bleu_epoch
    codebleu = calc_code_bleu.get_codebleu(gold_fn, output_fn, args.lang)
  File "/home/aldo/CodeT5/evaluator/CodeBLEU/calc_code_bleu.py", line 21, in get_codebleu
    assert len(hypothesis) == len(pre_references[i])
AssertionError
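From the traceback, the failing call is calc_code_bleu.get_codebleu(gold_fn, output_fn, args.lang), which reads the gold and hypothesis files that eval_bleu_epoch wrote into --res_dir and asserts they contain the same number of entries. Below is a minimal diagnostic sketch for confirming where the counts diverge, under the assumption that the mismatch comes from decoded predictions that contain embedded newlines (so one prediction spans several lines in the output file); the filenames dev_e0.gold and dev_e0.output are hypothetical, so substitute whatever eval_bleu_epoch actually wrote for the dev split:

# Diagnostic sketch: compare the line counts that get_codebleu asserts on.
# ASSUMPTION: the filenames below are hypothetical examples; point them at
# the actual files written into --res_dir for the dev split.
res_dir = "saved_large/concode/codet5_large_100_lr10_bs8_src320_trg150_pat3_e1/prediction"
gold_fn = f"{res_dir}/dev_e0.gold"      # hypothetical filename
output_fn = f"{res_dir}/dev_e0.output"  # hypothetical filename

with open(gold_fn, encoding="utf-8") as f:
    gold = f.readlines()
with open(output_fn, encoding="utf-8") as f:
    hyp = f.readlines()

# get_codebleu asserts len(hypothesis) == len(pre_references[i]); if hyp
# has more lines than gold, some predictions likely contain raw newlines.
print(f"gold lines: {len(gold)}, hypothesis lines: {len(hyp)}")

# One possible workaround (an assumption, not the repo's official fix) is to
# flatten each decoded prediction to a single line in run_gen.py before it
# is written to output_fn, e.g.:
#     pred = pred.replace("\n", " ").strip()

If the two counts printed above differ, that would support the newline hypothesis; codet5-base may simply never emit newlines on this data, which could explain why the identical command succeeds there.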