Oscar
training_args.bin not included in downloaded datasets base-vg-labels nor large-vg-labels
I am trying to run this project for COCO Captioning.
I downloaded the pretrained base and large vg-models as instructed in the DOWNLOAD.
These were the respective folders:
+-- base-vg-labels
| +-- ep_67_588997
| +-- ep_107_1192087
+-- large-vg-labels
| +-- ep_7_816000
| +-- ep_20_590000
| +-- ep_34_999600
| +-- ep_55_1617000
I tried to get the performance of those checkpoints, but after executing:
python oscar/run_captioning.py \
--do_test \
--do_eval \
--data_dir ../Data/coco_caption \
--test_yaml test.yaml \
--per_gpu_eval_batch_size 64 \
--max_gen_length 20 \
--num_beams 5 \
--eval_model_dir ../Models/base-vg-labels/ep_107_1192087
an error occurred pointing out that the file training_args.bin was not found inside the model's directory (base-vg-labels/ep_107_1192087).
I also downloaded the checkpoint available in the MODEL_ZOO, under Image Captioning on COCO. This checkpoint corresponds to checkpoint-29-66420, which includes a training_args.bin file.
These are the files included in each folder:
checkpoint-29-66420 | large-vg-labels/ep_55_1617000 |
---|---|
added_tokens.json | added_tokens.json |
config.json | config.json |
pytorch_model.bin | pytorch_model.bin |
special_tokens_map.json | special_tokens_map.json |
training_args.bin | ??? |
vocab.txt | vocab.txt |
It seems that the only file missing is training_args.bin. After fine-tuning the provided checkpoint, the generated checkpoints also include that file. Maybe you forgot to include it in the downloadable models?
Could you please provide those files?
Or am I missing something?
I also noticed that Checkpoint/checkpoint-29-66420 corresponds to training base-vg-labels with cross-entropy loss (deduced from the provided training logs). So I assume its training_args.bin file is probably valid across the entire base-vg-labels training. I am now copying the missing file into base-vg-labels/ep_107_1192087 to test its performance. Does that make sense?
Edit:
Testing base-vg-labels/ep_107_1192087 with the training_args.bin borrowed from checkpoint-29-66420 was a failure; the scores are close to zero:
{'SPICE': 0.00043991859734872146,
'Bleu_1': 4.759355107382894e-05,
'Bleu_2': 7.759555665291071e-13,
'Bleu_3': 2.0103762808759086e-15,
'Bleu_4': 1.0403605447565029e-16,
'ROUGE_L': 6.248437014771456e-05,
'CIDEr': 1.3248802263844757e-06}
Have you found a solution to this problem, or do you have any idea how to get around it?
I checked again; the provided models still do not contain the file training_args.bin.
These are the contents of the base-vg-labels and large-vg-labels zip files after decompressing:
!unzip new/base-vg-labels.zip -d new
Archive: new/base-vg-labels.zip
creating: new/base-vg-labels/
creating: new/base-vg-labels/ep_67_588997/
inflating: new/base-vg-labels/ep_67_588997/vocab.txt
inflating: new/base-vg-labels/ep_67_588997/special_tokens_map.json
inflating: new/base-vg-labels/ep_67_588997/config.json
extracting: new/base-vg-labels/ep_67_588997/added_tokens.json
inflating: new/base-vg-labels/ep_67_588997/pytorch_model.bin
creating: new/base-vg-labels/ep_107_1192087/
inflating: new/base-vg-labels/ep_107_1192087/vocab.txt
inflating: new/base-vg-labels/ep_107_1192087/special_tokens_map.json
inflating: new/base-vg-labels/ep_107_1192087/config.json
extracting: new/base-vg-labels/ep_107_1192087/added_tokens.json
inflating: new/base-vg-labels/ep_107_1192087/pytorch_model.bin
!unzip new/large-vg-labels.zip -d new
Archive: new/large-vg-labels.zip
creating: new/large-vg-labels/
creating: new/large-vg-labels/ep_20_590000/
extracting: new/large-vg-labels/ep_20_590000/added_tokens.json
inflating: new/large-vg-labels/ep_20_590000/config.json
inflating: new/large-vg-labels/ep_20_590000/pytorch_model.bin
inflating: new/large-vg-labels/ep_20_590000/special_tokens_map.json
inflating: new/large-vg-labels/ep_20_590000/vocab.txt
creating: new/large-vg-labels/ep_34_999600/
extracting: new/large-vg-labels/ep_34_999600/added_tokens.json
inflating: new/large-vg-labels/ep_34_999600/config.json
inflating: new/large-vg-labels/ep_34_999600/pytorch_model.bin
inflating: new/large-vg-labels/ep_34_999600/special_tokens_map.json
inflating: new/large-vg-labels/ep_34_999600/vocab.txt
creating: new/large-vg-labels/ep_55_1617000/
extracting: new/large-vg-labels/ep_55_1617000/added_tokens.json
inflating: new/large-vg-labels/ep_55_1617000/config.json
inflating: new/large-vg-labels/ep_55_1617000/pytorch_model.bin
inflating: new/large-vg-labels/ep_55_1617000/special_tokens_map.json
inflating: new/large-vg-labels/ep_55_1617000/vocab.txt
creating: new/large-vg-labels/ep_7_816000/
extracting: new/large-vg-labels/ep_7_816000/added_tokens.json
inflating: new/large-vg-labels/ep_7_816000/config.json
inflating: new/large-vg-labels/ep_7_816000/log.txt
inflating: new/large-vg-labels/ep_7_816000/pytorch_model.bin
inflating: new/large-vg-labels/ep_7_816000/special_tokens_map.json
inflating: new/large-vg-labels/ep_7_816000/vocab.txt
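The same check can be done without unpacking the archives. This is only a minimal sketch using Python's standard zipfile module; the function name checkpoints_missing is made up for illustration, and it treats every folder that holds at least one file as a checkpoint folder.

```python
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath


def checkpoints_missing(zip_path, required="training_args.bin"):
    """List folders inside a zip archive that do not contain `required`."""
    files_by_dir = defaultdict(set)
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip pure directory entries
            path = PurePosixPath(name)
            files_by_dir[str(path.parent)].add(path.name)
    # Keep only the folders where the required file is absent.
    return sorted(d for d, files in files_by_dir.items() if required not in files)
```

Calling checkpoints_missing('new/base-vg-labels.zip') on the archive listed above should report both ep_* folders, since neither contains training_args.bin.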
I am still working with checkpoint-29-66420.
I did the following to see the contents of the file:
import torch
training_args = torch.load('checkpoint-29-66420/training_args.bin')
type(training_args) # => argparse.Namespace
And this is what they look like:
{'adam_epsilon': 1e-08,
'add_od_labels': True,
'config_name': '',
'data_dir': 'datasets/coco_caption_release/',
'device': device(type='cuda'),
'do_eval': False,
'do_lower_case': True,
'do_test': False,
'do_train': True,
'drop_out': 0.1,
'eval_model_dir': '',
'evaluate_during_training': False,
'gradient_accumulation_steps': 1,
'img_feature_dim': 2054,
'img_feature_type': 'frcnn',
'learning_rate': 3e-05,
'length_penalty': 1,
'logging_steps': 20,
'loss_type': 'sfmx',
'mask_prob': 0.15,
'max_gen_length': 20,
'max_grad_norm': 1.0,
'max_img_seq_length': 50,
'max_masked_tokens': 3,
'max_seq_a_length': 40,
'max_seq_length': 70,
'max_steps': -1,
'min_constraints_to_satisfy': 2,
'model_name_or_path': 'models/captioning/base-vg-labels/ep_67_588997/',
'n_gpu': 4,
'no_cuda': False,
'num_beams': 5,
'num_keep_best': 1,
'num_labels': 2,
'num_return_sequences': 1,
'num_train_epochs': 30,
'num_workers': 4,
'output_dir': 'output/',
'output_hidden_states': False,
'output_mode': 'classification',
'per_gpu_eval_batch_size': 64,
'per_gpu_train_batch_size': 64,
'repetition_penalty': 1,
'save_steps': -1,
'scheduler': 'linear',
'scst': False,
'seed': 88,
'temperature': 1,
'test_yaml': 'test.yaml',
'tokenizer_name': '',
'top_k': 0,
'top_p': 1,
'train_batch_size': 256,
'train_yaml': 'train.yaml',
'use_cbs': False,
'val_yaml': 'val.yaml',
'warmup_steps': 0,
'weight_decay': 0.05}
The file can be saved using:
torch.save(training_args, 'new/path/to/training_args.bin')
So one way forward would be to adapt the contents of this file to our needs (or guess at them).
Please share a convenient configuration for this file if you find any.
Note: Judging from the arguments, checkpoint-29-66420 seems to be the result of training for 30 epochs (numbered from 0) on the coco_caption_release dataset, starting from the base-vg-labels/ep_67_588997 model.
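The load, edit, and save steps above could be wrapped up roughly like this. It is only a sketch: override_args and patch_training_args are hypothetical helper names, and which values run_captioning.py actually needs for these checkpoints has not been established in this thread, so any overrides are guesses. Note that copying the file unmodified already gave near-zero scores for ep_107_1192087.

```python
import argparse


def override_args(args, **overrides):
    """Return a copy of an argparse.Namespace with selected fields replaced."""
    unknown = sorted(set(overrides) - set(vars(args)))
    if unknown:
        # Refuse to add fields that were not in the saved arguments.
        raise KeyError(f"not present in the saved arguments: {unknown}")
    patched = argparse.Namespace(**vars(args))  # shallow copy, original untouched
    for key, value in overrides.items():
        setattr(patched, key, value)
    return patched


def patch_training_args(src_path, dst_path, **overrides):
    """Load training_args.bin, apply overrides, and save the result to dst_path."""
    import torch  # imported lazily so override_args works without PyTorch

    # Assumption: recent PyTorch releases may need
    # torch.load(src_path, weights_only=False) to unpickle a Namespace.
    args = torch.load(src_path)
    torch.save(override_args(args, **overrides), dst_path)
```

For example, patch_training_args('checkpoint-29-66420/training_args.bin', 'base-vg-labels/ep_107_1192087/training_args.bin', do_train=False, do_test=True) would write a modified copy next to the model; the two overrides shown are purely illustrative.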
Hi @EByrdS , I also find this very confusing, and I am still not sure about the correct configuration. However, I tried to train one of the downloadable models (base-vg-labels/ep_67_588997) with cross-entropy loss for 1 epoch:
python oscar/run_captioning.py \
--model_name_or_path pretrained_models/base-vg-labels/ep_67_588997 \
--do_train \
--do_lower_case \
--evaluate_during_training \
--add_od_labels \
--learning_rate 0.00001 \
--per_gpu_train_batch_size 32 \
--num_train_epochs 1 \
--save_steps 5000 \
--output_dir output/
This generated a new checkpoint (in my case named "checkpoint-0-17711") in the output_dir, which does indeed contain training_args.bin. After testing and evaluating with this new checkpoint:
python oscar/run_captioning.py \
--do_test \
--do_eval \
--test_yaml test.yaml \
--eval_model_dir output/checkpoint-0-17711
I obtained good scores:
SPICE: 0.2186335111075091
Bleu_1: 0.7795251310514796
Bleu_2: 0.6217080261420003
Bleu_3: 0.4746714027013575
Bleu_4: 0.3557547908042079
ROUGE_L: 0.5741837218547888
CIDEr: 1.200443530840875
Hope this answer was helpful!
Do you mind sharing your trained model? I was running it on google colab but there's not enough memory to finish the training. Would really appreciate it. Thanks!
Hi lisaliu1997,
I don't have that particular model on my hard drive anymore. I refer you to the VinVL_DOWNLOAD page, where you can download pretrained models with the training_args.bin.