LLaVA-NeXT
```
3 pytorch allocator cache flushes since last step
consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
Cache cleared
{'loss': 1.0039, 'grad_norm': 4.742300987243652, 'learning_rate': 2.7891156462585034e-06, 'epoch': 0.01}
  1%|▊ | 41/4870 [04:27<8:45:34,  6.53s/it]
Cache cleared
{'loss': 0.966, 'grad_norm': 4.477814197540283, 'learning_rate': 2.8571428571428573e-06, 'epoch': 0.01}
  1%|▊ | 42/4870 [08:04<18:14:03, 13.60s/it]
```
In my train.py I added the callback below:
```python
import torch
from transformers import TrainerCallback


class CacheClearingCallback(TrainerCallback):
    def on_step_end(self, args, state, control, **kwargs):
        # The DeepSpeed warning asks all ranks to flush their caches at the
        # same time, so call empty_cache() on every rank, not just rank 0.
        torch.cuda.empty_cache()
        if state.is_world_process_zero:
            print("Cache cleared")


data_module = make_supervised_data_module(tokenizer=tokenizer, data_args=data_args)
trainer = LLaVATrainer(
    model=model,
    tokenizer=tokenizer,
    args=training_args,
    **data_module,
    callbacks=[CacheClearingCallback()],
)
```
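Note that in the logs above the per-step time roughly doubles after the callback is added (6.53 s/it at step 41 vs. 13.60 s/it at step 42), since emptying the allocator cache forces PyTorch to re-allocate memory on the next step. One mitigation is to flush only every N steps instead of every step. The sketch below isolates just that decision logic; the class name `PeriodicFlushPolicy` and the default of 50 steps are assumptions for illustration, and in practice the check would sit inside `on_step_end` guarding the `empty_cache()` call.

```python
class PeriodicFlushPolicy:
    """Hypothetical helper: decide when to flush the CUDA allocator cache.

    Inside the real TrainerCallback, torch.cuda.empty_cache() (or DeepSpeed's
    get_accelerator().empty_cache()) would run only when should_flush() is True.
    """

    def __init__(self, every_n_steps=50):
        self.every_n_steps = every_n_steps

    def should_flush(self, global_step):
        # Flush every N optimizer steps instead of every step,
        # bounding the re-allocation overhead.
        return global_step > 0 and global_step % self.every_n_steps == 0


policy = PeriodicFlushPolicy(every_n_steps=50)
flush_steps = [s for s in range(1, 201) if policy.should_flush(s)]
print(flush_steps)  # -> [50, 100, 150, 200]
```

Tuning `every_n_steps` trades allocator-fragmentation warnings against iteration speed; the right value depends on how quickly your workload fragments the cache.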