
training_args.bin not included in the downloaded models base-vg-labels or large-vg-labels

Open EByrdS opened this issue 4 years ago • 6 comments

I am trying to run this project for COCO Captioning.

I downloaded the pretrained base and large vg models as instructed on the DOWNLOAD page.

These were the respective folders:

+-- base-vg-labels
| +-- ep_67_588997
| +-- ep_107_1192087
+-- large-vg-labels
| +-- ep_7_816000
| +-- ep_20_590000
| +-- ep_34_999600
| +-- ep_55_1617000

I tried to get the performance of those checkpoints, but after executing:

python oscar/run_captioning.py \
  --do_test \
  --do_eval \
  --data_dir ../Data/coco_caption \
  --test_yaml test.yaml \
  --per_gpu_eval_batch_size 64 \
  --max_gen_length 20 \
  --num_beams 5 \
  --eval_model_dir ../Models/base-vg-labels/ep_107_1192087

an error occurred pointing out that the file training_args.bin was not found inside the model's directory (base-vg-labels/ep_107_1192087).

I also downloaded the checkpoint available in the MODEL_ZOO, under Image Captioning on COCO. This checkpoint corresponds to checkpoint-29-66420 and includes the file training_args.bin.

These are the files included in each folder:

checkpoint-29-66420        large-vg-labels/ep_55_1617000
added_tokens.json          added_tokens.json
config.json                config.json
pytorch_model.bin          pytorch_model.bin
special_tokens_map.json    special_tokens_map.json
training_args.bin          ???
vocab.txt                  vocab.txt

It seems that the only file missing is training_args.bin. After fine-tuning the provided checkpoint, the newly generated checkpoints also include that file. Perhaps it was accidentally left out of the downloadable models?

Could you please provide those files?

Or am I missing something?

I also noted that Checkpoint/checkpoint-29-66420 corresponds to training base-vg-labels with cross-entropy loss (deduced from the provided training logs), so I assume its training_args.bin is probably valid across the entire base-vg-labels training. I am now copying the missing file into base-vg-labels/ep_107_1192087 to test its performance. Does that make sense?
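For reference, the copy step described above amounts to something like the following sketch (the checkpoint-29-66420 location is an assumption; the destination path follows the --eval_model_dir used earlier):

import shutil

# Hypothetical source location of the downloaded captioning checkpoint;
# adjust to wherever checkpoint-29-66420 was extracted.
src = '../Models/checkpoint-29-66420/training_args.bin'

# Destination: the pretrained model directory that is missing the file.
dst = '../Models/base-vg-labels/ep_107_1192087/training_args.bin'

shutil.copy(src, dst)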

Edit:

The performance of base-vg-labels/ep_107_1192087 with the training_args.bin borrowed from checkpoint-29-66420 was a failure:


 {'SPICE': 0.00043991859734872146, 
  'Bleu_1': 4.759355107382894e-05, 
  'Bleu_2': 7.759555665291071e-13, 
  'Bleu_3': 2.0103762808759086e-15, 
  'Bleu_4': 1.0403605447565029e-16, 
  'ROUGE_L': 6.248437014771456e-05, 
  'CIDEr': 1.3248802263844757e-06}

EByrdS avatar Nov 08 '20 00:11 EByrdS

Have you found a solution to this problem, or do you have an idea of how to get around it?

chodi150 avatar Dec 01 '20 22:12 chodi150

I checked again; the provided models still do not contain the file training_args.bin.

These are the contents of the base-vg-labels and large-vg-labels zip files after decompressing:

!unzip new/base-vg-labels.zip -d new

Archive:  new/base-vg-labels.zip
   creating: new/base-vg-labels/
   creating: new/base-vg-labels/ep_67_588997/
  inflating: new/base-vg-labels/ep_67_588997/vocab.txt  
  inflating: new/base-vg-labels/ep_67_588997/special_tokens_map.json  
  inflating: new/base-vg-labels/ep_67_588997/config.json  
 extracting: new/base-vg-labels/ep_67_588997/added_tokens.json  
  inflating: new/base-vg-labels/ep_67_588997/pytorch_model.bin  
   creating: new/base-vg-labels/ep_107_1192087/
  inflating: new/base-vg-labels/ep_107_1192087/vocab.txt  
  inflating: new/base-vg-labels/ep_107_1192087/special_tokens_map.json  
  inflating: new/base-vg-labels/ep_107_1192087/config.json  
 extracting: new/base-vg-labels/ep_107_1192087/added_tokens.json  
  inflating: new/base-vg-labels/ep_107_1192087/pytorch_model.bin  

!unzip new/large-vg-labels.zip -d new

Archive:  new/large-vg-labels.zip
   creating: new/large-vg-labels/
   creating: new/large-vg-labels/ep_20_590000/
 extracting: new/large-vg-labels/ep_20_590000/added_tokens.json  
  inflating: new/large-vg-labels/ep_20_590000/config.json  
  inflating: new/large-vg-labels/ep_20_590000/pytorch_model.bin  
  inflating: new/large-vg-labels/ep_20_590000/special_tokens_map.json  
  inflating: new/large-vg-labels/ep_20_590000/vocab.txt  
   creating: new/large-vg-labels/ep_34_999600/
 extracting: new/large-vg-labels/ep_34_999600/added_tokens.json  
  inflating: new/large-vg-labels/ep_34_999600/config.json  
  inflating: new/large-vg-labels/ep_34_999600/pytorch_model.bin  
  inflating: new/large-vg-labels/ep_34_999600/special_tokens_map.json  
  inflating: new/large-vg-labels/ep_34_999600/vocab.txt  
   creating: new/large-vg-labels/ep_55_1617000/
 extracting: new/large-vg-labels/ep_55_1617000/added_tokens.json  
  inflating: new/large-vg-labels/ep_55_1617000/config.json  
  inflating: new/large-vg-labels/ep_55_1617000/pytorch_model.bin  
  inflating: new/large-vg-labels/ep_55_1617000/special_tokens_map.json  
  inflating: new/large-vg-labels/ep_55_1617000/vocab.txt  
   creating: new/large-vg-labels/ep_7_816000/
 extracting: new/large-vg-labels/ep_7_816000/added_tokens.json  
  inflating: new/large-vg-labels/ep_7_816000/config.json  
  inflating: new/large-vg-labels/ep_7_816000/log.txt  
  inflating: new/large-vg-labels/ep_7_816000/pytorch_model.bin  
  inflating: new/large-vg-labels/ep_7_816000/special_tokens_map.json  
  inflating: new/large-vg-labels/ep_7_816000/vocab.txt

> Have you found a solution to this problem, or do you have an idea of how to get around it?

I am still working with checkpoint-29-66420.

EByrdS avatar Mar 21 '21 00:03 EByrdS

> Have you found a solution to this problem, or do you have an idea of how to get around it?

I did the following to see the contents of the file:

import torch

training_args = torch.load('checkpoint-29-66420/training_args.bin')

type(training_args) # => argparse.Namespace
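The Namespace can also be viewed as a plain dictionary, which is presumably how the listing below was produced:

vars(training_args)  # => dict of all saved arguments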

And this is what they look like:

{'adam_epsilon': 1e-08,
 'add_od_labels': True,
 'config_name': '',
 'data_dir': 'datasets/coco_caption_release/',
 'device': device(type='cuda'),
 'do_eval': False,
 'do_lower_case': True,
 'do_test': False,
 'do_train': True,
 'drop_out': 0.1,
 'eval_model_dir': '',
 'evaluate_during_training': False,
 'gradient_accumulation_steps': 1,
 'img_feature_dim': 2054,
 'img_feature_type': 'frcnn',
 'learning_rate': 3e-05,
 'length_penalty': 1,
 'logging_steps': 20,
 'loss_type': 'sfmx',
 'mask_prob': 0.15,
 'max_gen_length': 20,
 'max_grad_norm': 1.0,
 'max_img_seq_length': 50,
 'max_masked_tokens': 3,
 'max_seq_a_length': 40,
 'max_seq_length': 70,
 'max_steps': -1,
 'min_constraints_to_satisfy': 2,
 'model_name_or_path': 'models/captioning/base-vg-labels/ep_67_588997/',
 'n_gpu': 4,
 'no_cuda': False,
 'num_beams': 5,
 'num_keep_best': 1,
 'num_labels': 2,
 'num_return_sequences': 1,
 'num_train_epochs': 30,
 'num_workers': 4,
 'output_dir': 'output/',
 'output_hidden_states': False,
 'output_mode': 'classification',
 'per_gpu_eval_batch_size': 64,
 'per_gpu_train_batch_size': 64,
 'repetition_penalty': 1,
 'save_steps': -1,
 'scheduler': 'linear',
 'scst': False,
 'seed': 88,
 'temperature': 1,
 'test_yaml': 'test.yaml',
 'tokenizer_name': '',
 'top_k': 0,
 'top_p': 1,
 'train_batch_size': 256,
 'train_yaml': 'train.yaml',
 'use_cbs': False,
 'val_yaml': 'val.yaml',
 'warmup_steps': 0,
 'weight_decay': 0.05}

The file can be saved using:

torch.save(training_args, 'new/path/to/training_args.bin')

So one path forward would be to modify the contents of this file to match our needs (or our best guess).
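A minimal sketch of that approach, assuming the field names listed above and a hypothetical target directory (which values actually need to change is exactly the open question):

import torch

# Load the arguments saved with checkpoint-29-66420.
training_args = torch.load('checkpoint-29-66420/training_args.bin')

# Overwrite fields with our best guess of a suitable configuration.
# The directory below is hypothetical; point it at the model to be evaluated.
training_args.model_name_or_path = 'models/captioning/base-vg-labels/ep_107_1192087/'
training_args.eval_model_dir = 'models/captioning/base-vg-labels/ep_107_1192087/'

# Save the modified Namespace next to the downloaded model so run_captioning.py can find it.
torch.save(training_args, 'models/captioning/base-vg-labels/ep_107_1192087/training_args.bin')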

Please share a convenient configuration for this file if you find any.

Note: Inferring from the arguments, it seems that checkpoint-29-66420 is the result of training for 30 epochs (numbered from 0) on the coco_caption_release dataset, starting from the base-vg-labels/ep_67_588997 model.

EByrdS avatar Mar 21 '21 00:03 EByrdS

Hi @EByrdS, I also find this very confusing, and I am still not sure about the correct configuration. However, I tried to train one of the downloadable models (base-vg-labels/ep_67_588997) with cross-entropy loss for 1 epoch:


python oscar/run_captioning.py \
    --model_name_or_path pretrained_models/base-vg-labels/ep_67_588997 \
    --do_train \
    --do_lower_case \
    --evaluate_during_training \
    --add_od_labels \
    --learning_rate 0.00001 \
    --per_gpu_train_batch_size 32 \
    --num_train_epochs 1 \
    --save_steps 5000 \
    --output_dir output/

This generated a new checkpoint (in my case named "checkpoint-0-17711") in the output_dir, which indeed contains training_args.bin. After testing and evaluating with this new checkpoint:


python oscar/run_captioning.py \
    --do_test \
    --do_eval \
    --test_yaml test.yaml \
    --eval_model_dir output/checkpoint-0-17711

I obtained good scores:

SPICE:   0.2186335111075091
Bleu_1:  0.7795251310514796
Bleu_2:  0.6217080261420003
Bleu_3:  0.4746714027013575
Bleu_4:  0.3557547908042079
ROUGE_L: 0.5741837218547888
CIDEr:   1.200443530840875

Hope this answer was helpful!

jontooy avatar Oct 10 '21 12:10 jontooy

> Hi @EByrdS, I also find this very confusing, and I am still not sure about the correct configuration. However, I tried to train one of the downloadable models (base-vg-labels/ep_67_588997) with cross-entropy loss for 1 epoch: [...]
>
> Hope this answer was helpful!

Do you mind sharing your trained model? I was running it on Google Colab but there isn't enough memory to finish the training. Would really appreciate it. Thanks!

lisaliu1997 avatar Nov 28 '21 03:11 lisaliu1997

> Do you mind sharing your trained model? I was running it on Google Colab but there isn't enough memory to finish the training. Would really appreciate it. Thanks!

Hi lisaliu1997,

I don't have that particular model on my hard drive anymore. I refer you to the VinVL_DOWNLOAD page, where you can download pretrained models that include training_args.bin.

jontooy avatar Nov 30 '21 07:11 jontooy