LLaVA

[Question] Recipe for full finetune llava-v1.5-7b without LoRA in 10 hours on 8xA100(40G)

Open tzjtatata opened this issue 11 months ago • 4 comments

Question

Thanks for your great work. Recently I have been trying to train my own llava-v1.5-7b on 8xA100 (40G), but I keep getting the warning "4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance." I only use 257K samples for finetuning, yet training takes about 40 hours with zero3, per_device_train_bs=8, and gradient_accumulation_steps=2. I tried lowering per_device_train_bs to 4, but it did not help.

Can you provide the recipe for fully finetuning llava-v1.5-7b without LoRA that can be done in 10 hours, as noted in the README.md? Thank you very much.
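
For context, here is a rough sketch of the effective batch size implied by the settings above (the arithmetic is illustrative only, using the numbers reported in this issue):

```python
# Back-of-the-envelope arithmetic using the settings reported above.
num_gpus = 8                       # 8x A100 (40G)
per_device_train_batch_size = 8
gradient_accumulation_steps = 2

# Effective (global) batch size per optimizer step.
global_batch_size = num_gpus * per_device_train_batch_size * gradient_accumulation_steps
print(global_batch_size)  # 128
```

Note that lowering per_device_train_bs to 4 without doubling gradient_accumulation_steps also halves the effective batch size, which changes the recipe rather than just the memory footprint.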

tzjtatata avatar Mar 18 '24 08:03 tzjtatata

Any recipe? I got no improvement from my LoRA finetuning.

fisher75 avatar Apr 25 '24 13:04 fisher75

4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance.

To eliminate this warning, I installed deepspeed from source and added torch.cuda.empty_cache() before optimizer.step(). The warning goes away.
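
A minimal sketch of where the call goes, shown with an ordinary training loop rather than DeepSpeed's internals (the loop below is illustrative only):

```python
import torch

def training_step(model, batch, optimizer):
    # Illustrative loop; the actual change goes inside DeepSpeed's optimizer code.
    loss = model(**batch).loss
    loss.backward()
    # Release cached allocator blocks so the optimizer step does not run under
    # high memory pressure and trigger "allocator cache flushes" warnings.
    torch.cuda.empty_cache()
    optimizer.step()
    optimizer.zero_grad()
    return loss.detach()
```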

xing0047 avatar May 02 '24 08:05 xing0047

Thank you! Could you tell me where the optimizer.step() call is?

ZitianTang avatar May 09 '24 04:05 ZitianTang

For deepspeed-0.12.6, it is in deepspeed/runtime/zero/stage3.py, in the _optimizer_step function of the DeepSpeedZeroOptimizer_Stage3 class. Hope this helps.
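
If rebuilding DeepSpeed from source is inconvenient, one alternative is to wrap that method at runtime from the training script. This is only a sketch and assumes deepspeed-0.12.6, where _optimizer_step has the location and name described above:

```python
import torch
from deepspeed.runtime.zero.stage3 import DeepSpeedZeroOptimizer_Stage3

# Keep a reference to the original method and inject torch.cuda.empty_cache()
# just before it runs (the original calls optimizer.step() internally).
_original_optimizer_step = DeepSpeedZeroOptimizer_Stage3._optimizer_step

def _optimizer_step_with_cache_flush(self, *args, **kwargs):
    torch.cuda.empty_cache()
    return _original_optimizer_step(self, *args, **kwargs)

DeepSpeedZeroOptimizer_Stage3._optimizer_step = _optimizer_step_with_cache_flush
```

Editing the installed source as described above achieves the same thing; the wrapper is just a way to avoid maintaining a patched build.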

xing0047 avatar May 09 '24 05:05 xing0047