mm-cot
[17:28:39] [Model]: Loading declare-lab/flan-alpaca-large...
Thanks for the great work. I am trying to reproduce this paper on Google Colab with a 166 GB disk and a T4 GPU, but at the training stage, for both rationale generation and answer inference, I get the following output:
2023-09-29 17:27:49.955571: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
args Namespace(data_root='/content/mm-cot/data', output_dir='/content/mm-cot/experiments', model='declare-lab/flan-alpaca-large', options=['A', 'B', 'C', 'D', 'E'], epoch=50, lr=5e-05, bs=2, input_len=512, output_len=512, eval_bs=4, eval_acc=None, train_split='train', val_split='val', test_split='test', use_generate=True, final_eval=False, user_msg='rationale', img_type='vit', eval_le=None, test_le=None, evaluate_dir=None, caption_file='data/instruct_captions.json', use_caption=True, prompt_format='QCM-E', seed=42)
====Input Arguments====
{
"data_root": "/content/mm-cot/data",
"output_dir": "/content/mm-cot/experiments",
"model": "declare-lab/flan-alpaca-large",
"options": [
"A",
"B",
"C",
"D",
"E"
],
"epoch": 50,
"lr": 5e-05,
"bs": 2,
"input_len": 512,
"output_len": 512,
"eval_bs": 4,
"eval_acc": null,
"train_split": "train",
"val_split": "val",
"test_split": "test",
"use_generate": true,
"final_eval": false,
"user_msg": "rationale",
"img_type": "vit",
"eval_le": null,
"test_le": null,
"evaluate_dir": null,
"caption_file": "data/instruct_captions.json",
"use_caption": true,
"prompt_format": "QCM-E",
"seed": 42
}
img_features size: torch.Size([11208, 145, 1024])
number of train problems: 12726
number of val problems: 4241
number of test problems: 4241
[17:28:39] [Model]: Loading declare-lab/flan-alpaca-large...
Then the cell stops and the experiments folder is empty. Can anyone explain what the problem is? (I am still new to the field.)
Hi, did you try running a unit test to check whether the pre-trained model can be loaded with Hugging Face?
My guess is that there is not enough memory to load the model.
from transformers import T5ForConditionalGeneration
# You may also try changing "declare-lab/flan-alpaca-large" to "declare-lab/flan-alpaca-base" to see if it goes well.
model = T5ForConditionalGeneration.from_pretrained("declare-lab/flan-alpaca-large")
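To put a rough number on the memory guess, here is a minimal back-of-the-envelope sketch. It assumes the checkpoint is stored in fp32, that the download size reported by the progress bar (3.13G, read here as 3.13e9 bytes) is essentially all weights, and that training uses an Adam-style optimizer with two state buffers per parameter. `estimate_training_gib` is a hypothetical helper, not part of mm-cot or transformers, and it ignores activations and framework overhead entirely.

```python
def estimate_training_gib(weight_bytes, optimizer_states=2):
    """Rough lower bound on training memory, in GiB.

    Counts one copy of the weights, one copy of the gradients, and
    `optimizer_states` extra buffers per parameter (2 for Adam-style
    optimizers). Activations are NOT included, so real usage is higher.
    """
    copies = 1 + 1 + optimizer_states  # weights + gradients + optimizer buffers
    return weight_bytes * copies / 2**30


# Assumption: 3.13e9 bytes is the size of the fp32 weights for the large model.
print(f"{estimate_training_gib(3.13e9):.1f} GiB before activations")
```

Under these assumptions the large checkpoint needs about 11.7 GiB before any activations are counted, which leaves little headroom on a 16 GB T4. The same arithmetic gives a much smaller figure for the base checkpoint, so a hang with the base model may point to something other than memory.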
After running a unit test that loads the model from Hugging Face, I got:
(…)an-alpaca-large/resolve/main/config.json: 100%
787/787 [00:00<00:00, 56.3kB/s]
model.safetensors: 100%
3.13G/3.13G [00:16<00:00, 261MB/s]
(…)arge/resolve/main/generation_config.json: 100%
142/142 [00:00<00:00, 13.6kB/s]
I changed the model from large to base, but I ran into the same behavior:
2023-10-17 16:48:58.529434: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
args Namespace(data_root='/content/mm-cot/data', output_dir='/content/mm-cot/experiments', model='declare-lab/flan-alpaca-base', options=['A', 'B', 'C', 'D', 'E'], epoch=50, lr=5e-05, bs=2, input_len=512, output_len=512, eval_bs=4, eval_acc=None, train_split='train', val_split='val', test_split='test', use_generate=True, final_eval=False, user_msg='rationale', img_type='vit', eval_le=None, test_le=None, evaluate_dir=None, caption_file='data/instruct_captions.json', use_caption=True, prompt_format='QCM-E', seed=42)
====Input Arguments====
{
"data_root": "/content/mm-cot/data",
"output_dir": "/content/mm-cot/experiments",
"model": "declare-lab/flan-alpaca-base",
"options": [
"A",
"B",
"C",
"D",
"E"
],
"epoch": 50,
"lr": 5e-05,
"bs": 2,
"input_len": 512,
"output_len": 512,
"eval_bs": 4,
"eval_acc": null,
"train_split": "train",
"val_split": "val",
"test_split": "test",
"use_generate": true,
"final_eval": false,
"user_msg": "rationale",
"img_type": "vit",
"eval_le": null,
"test_le": null,
"evaluate_dir": null,
"caption_file": "data/instruct_captions.json",
"use_caption": true,
"prompt_format": "QCM-E",
"seed": 42
}
img_features size: torch.Size([11208, 145, 1024])
number of train problems: 12726
number of val problems: 4241
number of test problems: 4241
[16:49:05] [Model]: Loading declare-lab/flan-alpaca-base...
I am using a Google Colab T4 with high RAM.
The hang may also be expected: the main process could still be processing the data after loading the model (nothing is printed to signal that model loading has completed).
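Since nothing is printed once loading finishes, one way to tell a slow step from a dead process is to wrap each stage in a small timer that announces when it starts and when it completes. The following is a minimal sketch using only the standard library; `stage` is a hypothetical helper and is not part of mm-cot, and where to place it in the training script is an assumption.

```python
import time
from contextlib import contextmanager
from functools import partial


@contextmanager
def stage(name, log=None):
    """Log when a named stage starts and finishes, with elapsed seconds."""
    # flush=True so messages appear immediately, even if stdout is buffered.
    log = log or partial(print, flush=True)
    start = time.perf_counter()
    log(f"[start] {name}")
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        log(f"[done] {name} ({elapsed:.1f}s)")


# Demo with a cheap stand-in workload. In the training script one could
# instead wrap the real steps (hypothetical placement), for example:
#   with stage("load model"):
#       model = T5ForConditionalGeneration.from_pretrained("declare-lab/flan-alpaca-base")
with stage("demo workload"):
    total = sum(range(1_000_000))
```

If the "[done]" line for model loading appears and the next stage never finishes, the time is being spent after loading. If the cell stops without any "[done]" line, the process was probably terminated during that stage.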