
A question about training cost

Richar-Du opened this issue · 4 comments

Thanks for your awesome work! Would you mind sharing the training cost of LLaVA (both the GPU hours and the API cost)? Thanks in advance :)

Richar-Du · Apr 19, 2023

Given that they ran GPT-4 over the full 600k CC3M subset, the vast majority of the cost should come from the GPT-4 API calls themselves (somewhere between $10k and $100k). The cost of GPU rental would be tiny in comparison.
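As a rough back-of-envelope sketch of where a range like that comes from (all figures below are assumptions for illustration, not numbers from the LLaVA authors):

```python
# Back-of-envelope GPT-4 API cost estimate (illustrative only; every figure
# below is an assumption, not an official number).
num_samples = 600_000          # assumed: full CC3M subset
tokens_per_sample = 500        # assumed: prompt + completion tokens per image
price_per_1k_tokens = 0.05     # assumed: blended USD price per 1K tokens

total_tokens = num_samples * tokens_per_sample
estimated_cost = total_tokens / 1_000 * price_per_1k_tokens
print(f"~${estimated_cost:,.0f}")  # ~$15,000 under these assumptions
```

Pushing the per-sample token count or the price up or down by a few times is what stretches the estimate across the $10k–$100k range.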

152334H · Apr 19, 2023

@Richar-Du Thanks for your question and for your interest in our work.

We pretrain our model on 595K image-text pairs on 8x A100s for around 5 hours. The finetuning of the initial release takes ~10 hours on the same machine. We also find that using a smaller subset can achieve similar performance. We'll update the details of these experiments later.
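For reference, a quick back-of-envelope sketch of the compute above (the per-GPU-hour rental price is an assumed figure for illustration, not something we benchmarked):

```python
# Rough GPU-hour arithmetic for the numbers above (the hourly A100 rental
# price is an assumption, not an official figure).
num_gpus = 8
pretrain_hours = 5        # ~5 hours for pretraining on 595K pairs
finetune_hours = 10       # ~10 hours for the initial finetuning release
price_per_gpu_hour = 2.0  # assumed USD rental price per A100-hour

gpu_hours = num_gpus * (pretrain_hours + finetune_hours)  # 120 GPU-hours
print(gpu_hours, gpu_hours * price_per_gpu_hour)          # 120 GPU-hours, ~$240
```

So the GPU side of training is on the order of a hundred GPU-hours, which is indeed small next to any large-scale API usage.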

Thanks @152334H for answering, but we would like to clarify that we do not run GPT-4 on CC3M for the pretraining stage. We use the official CC3M captions directly, without the BLIP synthetic captions, in the first release. Only the instruction-tuning data is generated by GPT-4. You can refer to our released LLaVA-CC3M-Pretrain-595K and LLaVA-Instruct-150K for more details.
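If you want to inspect the data yourself, here is a minimal sketch for pulling both sets from the Hugging Face Hub; the repo ids below are assumptions, so check the project README for the exact names:

```python
# Minimal sketch: download the released data from the Hugging Face Hub.
# The repo ids are assumptions; see the LLaVA README for the exact names.
from huggingface_hub import snapshot_download

pretrain_dir = snapshot_download(
    repo_id="liuhaotian/LLaVA-CC3M-Pretrain-595K", repo_type="dataset"
)
instruct_dir = snapshot_download(
    repo_id="liuhaotian/LLaVA-Instruct-150K", repo_type="dataset"
)
print(pretrain_dir, instruct_dir)  # local cache paths containing the JSON files
```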

We'll update the information in our paper as well, thanks.

haotian-liu · Apr 20, 2023

Oh okay, my apologies for misunderstanding!

152334H · Apr 20, 2023

@152334H No worries at all! Thank you for contributing to the discussion, and we look forward to hearing more feedback from you!

haotian-liu · Apr 20, 2023