mm-cot

[17:28:39] [Model]: Loading declare-lab/flan-alpaca-large...

Open • Sosycs opened this issue 1 year ago • 3 comments

Thanks for the great work. I am trying to reproduce this paper on Google Colab with a 166 GB disk and a T4 GPU, but at the training stage, for both rationale generation and answer inference, I get the following output:

2023-09-29 17:27:49.955571: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
args Namespace(data_root='/content/mm-cot/data', output_dir='/content/mm-cot/experiments', model='declare-lab/flan-alpaca-large', options=['A', 'B', 'C', 'D', 'E'], epoch=50, lr=5e-05, bs=2, input_len=512, output_len=512, eval_bs=4, eval_acc=None, train_split='train', val_split='val', test_split='test', use_generate=True, final_eval=False, user_msg='rationale', img_type='vit', eval_le=None, test_le=None, evaluate_dir=None, caption_file='data/instruct_captions.json', use_caption=True, prompt_format='QCM-E', seed=42)
====Input Arguments====
{
  "data_root": "/content/mm-cot/data",
  "output_dir": "/content/mm-cot/experiments",
  "model": "declare-lab/flan-alpaca-large",
  "options": [
    "A",
    "B",
    "C",
    "D",
    "E"
  ],
  "epoch": 50,
  "lr": 5e-05,
  "bs": 2,
  "input_len": 512,
  "output_len": 512,
  "eval_bs": 4,
  "eval_acc": null,
  "train_split": "train",
  "val_split": "val",
  "test_split": "test",
  "use_generate": true,
  "final_eval": false,
  "user_msg": "rationale",
  "img_type": "vit",
  "eval_le": null,
  "test_le": null,
  "evaluate_dir": null,
  "caption_file": "data/instruct_captions.json",
  "use_caption": true,
  "prompt_format": "QCM-E",
  "seed": 42
}
img_features size:  torch.Size([11208, 145, 1024])
number of train problems: 12726

number of val problems: 4241

number of test problems: 4241

[17:28:39] [Model]: Loading declare-lab/flan-alpaca-large... 

and then the cell stops, and the experiments folder is empty. Can anyone explain what the problem is? (I am still new to the field.)
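When a Colab cell stops without a Python traceback, one common cause is the Linux out-of-memory killer sending SIGKILL to the process. A minimal sketch for checking this, assuming the training script is launched as a subprocess; `run_and_diagnose` is a hypothetical helper and the command in the example is a placeholder, not the repository's exact invocation. On POSIX systems, `subprocess` reports a process killed by a signal as a negative return code, so `-9` means SIGKILL.

```python
import signal
import subprocess
import sys


def run_and_diagnose(cmd: list[str]) -> str:
    """Run a command and describe how it exited (hypothetical helper)."""
    result = subprocess.run(cmd)
    code = result.returncode
    if code == 0:
        return "exited normally"
    if code < 0:
        # On POSIX, a negative return code means the child was killed by signal -code.
        sig = signal.Signals(-code)
        if sig == signal.SIGKILL:
            return "killed by SIGKILL (often the out-of-memory killer)"
        return f"killed by {sig.name}"
    return f"exited with error code {code}"


if __name__ == "__main__":
    # Placeholder command: substitute the training command you actually run.
    print(run_and_diagnose([sys.executable, "-c", "print('hello')"]))
```

A SIGKILL result right after the `Loading ...` line would support the out-of-memory explanation; a normal exit or an ordinary error code would point elsewhere.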

Sosycs • Sep 29 '23 17:09

Hi, have you tried a quick unit test to check whether the pre-trained model can be loaded with Hugging Face Transformers?

My guess is that there is not enough memory to load the model.

from transformers import T5ForConditionalGeneration

# You may also try changing "declare-lab/flan-alpaca-large" to "declare-lab/flan-alpaca-base" to see if it works.
model = T5ForConditionalGeneration.from_pretrained("declare-lab/flan-alpaca-large")
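To put rough numbers on the memory guess, here is a back-of-the-envelope sketch using figures that appear in this thread: the `img_features size` line reports a tensor of shape `[11208, 145, 1024]`, and the unit-test output later in the thread shows a 3.13G `model.safetensors` download. Assumptions, not confirmed by the repository: the image features are held as float32 (4 bytes per value), the checkpoint is stored in float32 so that its size divided by 4 approximates the parameter count, and full-precision Adam training needs roughly 16 bytes per parameter (weights, gradients, and two optimizer moments) before activations are counted.

```python
GIB = 1024 ** 3


def tensor_bytes(shape: tuple[int, ...], bytes_per_value: int = 4) -> int:
    """Bytes for a dense tensor of the given shape (default assumes float32)."""
    total = bytes_per_value
    for dim in shape:
        total *= dim
    return total


def adam_training_bytes(num_params: int, bytes_per_param: int = 16) -> int:
    """Rule of thumb for fp32 Adam: 4 (weights) + 4 (grads) + 8 (two moments), no activations."""
    return num_params * bytes_per_param


# Shape printed as "img_features size" in the log (assumed float32).
img_bytes = tensor_bytes((11208, 145, 1024))

# 3.13G checkpoint from the download log, treated as 3.13e9 bytes of fp32 weights.
approx_params = int(3.13e9 / 4)
train_bytes = adam_training_bytes(approx_params)

print(f"image features: {img_bytes / GIB:.2f} GiB")
print(f"approx. parameters: {approx_params / 1e6:.1f}M")
print(f"fp32 Adam training state: {train_bytes / GIB:.2f} GiB (before activations)")
```

Under these assumptions the image features alone take about 6.2 GiB, and the large model's training state is about 11.7 GiB before activations, against the 16 GB of GPU memory on a T4. The `img_features size` line is identical in both runs in this thread, so switching to `flan-alpaca-base` would not shrink that part of the footprint if the features are loaded in full.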

cooelf • Oct 15 '23 09:10

I ran a unit test that loads the model from Hugging Face:

(…)an-alpaca-large/resolve/main/config.json: 100%
787/787 [00:00<00:00, 56.3kB/s]
model.safetensors: 100%
3.13G/3.13G [00:16<00:00, 261MB/s]
(…)arge/resolve/main/generation_config.json: 100%
142/142 [00:00<00:00, 13.6kB/s]

I then switched the model from large to base, but I run into the same problem:

2023-10-17 16:48:58.529434: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
args Namespace(data_root='/content/mm-cot/data', output_dir='/content/mm-cot/experiments', model='declare-lab/flan-alpaca-base', options=['A', 'B', 'C', 'D', 'E'], epoch=50, lr=5e-05, bs=2, input_len=512, output_len=512, eval_bs=4, eval_acc=None, train_split='train', val_split='val', test_split='test', use_generate=True, final_eval=False, user_msg='rationale', img_type='vit', eval_le=None, test_le=None, evaluate_dir=None, caption_file='data/instruct_captions.json', use_caption=True, prompt_format='QCM-E', seed=42)
====Input Arguments====
{
  "data_root": "/content/mm-cot/data",
  "output_dir": "/content/mm-cot/experiments",
  "model": "declare-lab/flan-alpaca-base",
  "options": [
    "A",
    "B",
    "C",
    "D",
    "E"
  ],
  "epoch": 50,
  "lr": 5e-05,
  "bs": 2,
  "input_len": 512,
  "output_len": 512,
  "eval_bs": 4,
  "eval_acc": null,
  "train_split": "train",
  "val_split": "val",
  "test_split": "test",
  "use_generate": true,
  "final_eval": false,
  "user_msg": "rationale",
  "img_type": "vit",
  "eval_le": null,
  "test_le": null,
  "evaluate_dir": null,
  "caption_file": "data/instruct_captions.json",
  "use_caption": true,
  "prompt_format": "QCM-E",
  "seed": 42
}
img_features size:  torch.Size([11208, 145, 1024])
number of train problems: 12726

number of val problems: 4241

number of test problems: 4241

[16:49:05] [Model]: Loading declare-lab/flan-alpaca-base...    

I am using Google Colab with a T4 GPU and the High-RAM runtime.
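Since the High-RAM runtime was selected, it may help to check how much system memory is actually free just before the `Loading ...` step and compare it with the size of the image features printed in the log. A minimal, Linux-only sketch (Colab VMs run Linux) that reads `MemAvailable` from `/proc/meminfo`; `parse_meminfo` and `available_ram_gib` are hypothetical helper names.

```python
from pathlib import Path


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field: first numeric value} (kB for most fields)."""
    info: dict[str, int] = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        parts = rest.split()
        if parts:
            info[key.strip()] = int(parts[0])
    return info


def available_ram_gib(path: str = "/proc/meminfo") -> float:
    """Return MemAvailable in GiB (the kernel reports it in kB, i.e. units of 1024 bytes)."""
    info = parse_meminfo(Path(path).read_text())
    return info["MemAvailable"] / (1024 ** 2)  # kB -> GiB


if __name__ == "__main__":
    if Path("/proc/meminfo").exists():
        print(f"available RAM: {available_ram_gib():.1f} GiB")
```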

Sosycs • Oct 17 '23 16:10

The apparent hang may also be expected: after the model has loaded, the main process could still be processing the data, and nothing is printed to signal that model loading has finished.
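One way to tell a real hang from slow data processing is to log each stage with a timestamp and flush the output immediately so it shows up in the notebook. A minimal sketch using only the standard library; `log_step` is a hypothetical context manager, and the stage names and `time.sleep` calls are placeholders for the script's actual model-loading and preprocessing code.

```python
import time
from contextlib import contextmanager
from typing import Callable, Iterator


def _print_flush(message: str) -> None:
    # flush=True so the line appears in the notebook immediately instead of sitting in a buffer.
    print(message, flush=True)


@contextmanager
def log_step(name: str, emit: Callable[[str], None] = _print_flush) -> Iterator[None]:
    """Emit a start line and an end line with elapsed seconds (end is emitted even on error)."""
    start = time.perf_counter()
    emit(f"[{time.strftime('%H:%M:%S')}] start: {name}")
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        emit(f"[{time.strftime('%H:%M:%S')}] end: {name} ({elapsed:.1f}s)")


if __name__ == "__main__":
    # Placeholders: wrap the real model-loading and dataset-building calls instead.
    with log_step("load model"):
        time.sleep(0.1)
    with log_step("build datasets"):
        time.sleep(0.1)
```

Wrapping model loading and dataset construction separately shows which stage the notebook is actually spending its time in.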

cooelf • May 19 '24 06:05