LLaVA
[Question] Recipe for full finetune llava-v1.5-7b without LoRA in 10 hours on 8xA100(40G)
Question
Thanks for your great work. I am trying to fine-tune my own llava-v1.5-7b on 8xA100 (40G), but I keep getting the warning "4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance." I am only using 257K samples for finetuning, yet training takes about 40 hours with ZeRO-3, per_device_train_batch_size=8, and gradient_accumulation_steps=2. Lowering per_device_train_batch_size to 4 did not help.
Can you provide a recipe for fully finetuning llava-v1.5-7b without LoRA that finishes in 10 hours, as noted in the README.md? Thank you very much.
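For context, here is roughly how the settings above map onto HuggingFace TrainingArguments, which LLaVA's trainer builds on (a sketch only, not the actual LLaVA launch script; the output path and the bf16 flag are my assumptions):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./checkpoints/llava-v1.5-7b-finetune",  # hypothetical path
    per_device_train_batch_size=8,     # value reported above
    gradient_accumulation_steps=2,     # effective batch = 8 GPUs * 8 * 2 = 128
    deepspeed="./scripts/zero3.json",  # ZeRO-3 config, as reported above
    bf16=True,                         # assumption: bf16 on A100
    num_train_epochs=1,
)
```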
Any recipe? I got no improvement from my LoRA finetune either.
4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance.
To eliminate this warning, I installed deepspeed from source and added `torch.cuda.empty_cache()` before `optimizer.step()`. The warning then goes away.
Thank you! Could you tell me where the `optimizer.step()` call is?
For deepspeed-0.12.6, it can be found in deepspeed/runtime/zero/stage3.py, in the function `_optimizer_step` of class `DeepSpeedZeroOptimizer_Stage3`. Hope this helps you.
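For reference, a minimal sketch of the patch described above (the method body is abbreviated and only the added `torch.cuda.empty_cache()` line is the change; the `sub_group_id` signature is my reading of deepspeed-0.12.6, so check your local copy):

```python
# deepspeed/runtime/zero/stage3.py (deepspeed-0.12.6),
# inside class DeepSpeedZeroOptimizer_Stage3 -- sketch only.
import torch

def _optimizer_step(self, sub_group_id):
    # Added line: release cached allocator blocks before the optimizer step,
    # so the step runs under less memory pressure and the
    # "pytorch allocator cache flushes" warning no longer fires.
    torch.cuda.empty_cache()

    # ... original DeepSpeed code that performs the optimizer step for this
    # sub-group follows here, unchanged ...
```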