Notes on FP16 inference
For no reason at all, I decided to try running the model on my RTX 3090. This turned out to be surprisingly difficult, so I am documenting my process here for others to find.
Although the README provides some instructions on how to load the model,

- it loads the model onto CPU by default,
- it loads the model as fp32 by default,
- `requires_grad` is enabled for the new layers by default,
- the `precision=...` parameter from `open_clip` is partially broken due to a forced fp32 conversion in `LayerNormFp32`,
- attempting to load LLaMA in 8-bit breaks the model, because `accelerate` discards the injected `FlamingoLMMixin` when `device_map` is used.
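
To work around those defaults, here is a minimal fp16 loading sketch, assuming the `create_model_and_transforms` API from the README; the `LLAMA_PATH`, `LLAMA_TOKENIZER_PATH`, and `CKPT_PATH` placeholders are mine, and the manual cast is a workaround rather than a supported option:

```python
import torch
from open_flamingo import create_model_and_transforms

# Load as in the README; the weights land on CPU in fp32.
model, image_processor, tokenizer = create_model_and_transforms(
    clip_vision_encoder_path="ViT-L-14",
    clip_vision_encoder_pretrained="openai",
    lang_encoder_path=LLAMA_PATH,         # placeholder: LLaMA weights (HF format)
    tokenizer_path=LLAMA_TOKENIZER_PATH,  # placeholder: LLaMA tokenizer
    cross_attn_every_n_layers=4,
)
model.load_state_dict(torch.load(CKPT_PATH), strict=False)  # placeholder: OpenFlamingo checkpoint

# Inference only: turn off gradients for the injected layers.
model.requires_grad_(False)
model.eval()

# Cast to fp16 and move to the GPU by hand, since there is no flag for it.
model = model.half().to("cuda")
```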
I have created a simple fork of this repo here that is slightly easier to use for people with consumer GPUs. Unfortunately, it still consumes a large amount of VRAM (because 8-bit loading is unavailable), and it takes a long time to load the LLaMA weights (because `accelerate` is broken).
I followed the fork (https://github.com/mlfoundations/open_flamingo/commit/25b17319723b41f900cd52a389466b97c053695d) but am unable to load the LLaMA weights on the 3090; it crashes when I try to load them in fp16.
NVIDIA-SMI 515.86.01 Driver Version: 515.86.01 CUDA Version: 11.7
Never mind, I was able to get things working. I had to decrease the batch size to 2 to fit within memory.
Note also that the inputs have to be cast to half precision, e.g. `batch_images = batch_images.half()`.
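
For context, here is roughly where that cast goes in a generation call, following the `generate` example from the README (the prompt and decoding settings are illustrative):

```python
import torch

# vision_x is shaped (batch, num_images, frames, channels, height, width);
# cast it to fp16 to match the model weights. Token ids stay integer.
vision_x = batch_images.half().to("cuda")
lang_x = tokenizer(["<image>An image of"], return_tensors="pt")

with torch.no_grad():
    generated = model.generate(
        vision_x=vision_x,
        lang_x=lang_x["input_ids"].to("cuda"),
        attention_mask=lang_x["attention_mask"].to("cuda"),
        max_new_tokens=20,
    )
print(tokenizer.decode(generated[0]))
```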
Performance on the OK-VQA benchmark with the following settings:
```bash
python open_flamingo/eval/evaluate.py \
    --lm_path $LM_PATH \
    --lm_tokenizer_path $LM_TOKENIZER_PATH \
    --checkpoint_path $CKPT_PATH \
    --device $DEVICE \
    --cross_attn_every_n_layers 4 \
    --eval_ok_vqa \
    --ok_vqa_image_dir_path $VQAV2_IMG_PATH \
    --ok_vqa_annotations_json_path $VQAV2_ANNO_PATH \
    --ok_vqa_questions_json_path $VQAV2_QUESTION_PATH \
    --results_file $RESULTS_FILE \
    --num_samples 5000 --shots 2 --num_trials 1 \
    --batch_size 2
```
```
Shots 2 Trial 0 OK-VQA score: 35.15
Shots 2 Mean OK-VQA score: 35.15
```