LLaVA-NeXT
```
3 pytorch allocator cache flushes since last step
consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
Cache cleared
{'loss': 1.0039, 'grad_norm': 4.742300987243652, 'learning_rate': 2.7891156462585034e-06, 'epoch': 0.01}
  1%|▊ | 41/4870 [04:27<8:45:34,  6.53s/it]
Cache cleared
{'loss': 0.966, 'grad_norm': 4.477814197540283, 'learning_rate': 2.8571428571428573e-06, 'epoch': 0.01}
  1%|▊ | 42/4870 [08:04<18:14:03, 13.60s/it]
```
In my train.py I added the callback below:
```python
import torch
from transformers import TrainerCallback


class CacheClearingCallback(TrainerCallback):
    def on_step_end(self, args, state, control, **kwargs):
        # The DeepSpeed warning asks all ranks to flush their caches at the
        # same time, so call empty_cache() on every rank, not just rank 0.
        torch.cuda.empty_cache()
        if state.is_world_process_zero:
            print("Cache cleared")


data_module = make_supervised_data_module(tokenizer=tokenizer, data_args=data_args)
trainer = LLaVATrainer(
    model=model,
    tokenizer=tokenizer,
    args=training_args,
    **data_module,
    callbacks=[CacheClearingCallback()],
)
```
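Note that in the logs above the per-step time roughly doubles after the callback is added (6.53 s/it at step 41 vs. 13.60 s/it at step 42), since emptying the allocator cache forces PyTorch to re-allocate memory on the next step. One mitigation is to flush only every N steps instead of every step. The sketch below isolates just that decision logic; the class name `PeriodicFlushPolicy` and the default of 50 steps are assumptions for illustration, and in practice the check would sit inside `on_step_end` guarding the `empty_cache()` call.

```python
class PeriodicFlushPolicy:
    """Hypothetical helper: decide when to flush the CUDA allocator cache.

    Inside the real TrainerCallback, torch.cuda.empty_cache() (or DeepSpeed's
    get_accelerator().empty_cache()) would run only when should_flush() is True.
    """

    def __init__(self, every_n_steps=50):
        self.every_n_steps = every_n_steps

    def should_flush(self, global_step):
        # Flush every N optimizer steps instead of every step,
        # bounding the re-allocation overhead.
        return global_step > 0 and global_step % self.every_n_steps == 0


policy = PeriodicFlushPolicy(every_n_steps=50)
flush_steps = [s for s in range(1, 201) if policy.should_flush(s)]
print(flush_steps)  # -> [50, 100, 150, 200]
```

Tuning `every_n_steps` trades allocator-fragmentation warnings against iteration speed; the right value depends on how quickly your workload fragments the cache.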