dumpmemory

Results: 51 comments by dumpmemory

@markusdr you might use the following:

```python
optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
```
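For context, a minimal runnable sketch of how that optimizer fits into a training step (the toy model, data, and learning rate are placeholders, not from the thread):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)  # stand-in model; substitute your own
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

x, y = torch.randn(16, 10), torch.randn(16, 1)
loss = nn.functional.mse_loss(model(x), y)
loss.backward()        # accumulate gradients
optimizer.step()       # apply the AdamW update
optimizer.zero_grad()  # clear gradients for the next step
```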

> Did you face a GPU memory increase with the ZeRO-3 setting?

I found the reason: we can disable the zero-init option to fix the GPU memory increase with...
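For readers hitting the same issue, a minimal sketch of what disabling zero-init might look like, assuming HF Accelerate's `DeepSpeedPlugin` API (the `zero3_init_flag` name comes from Accelerate, not from this thread):

```python
from accelerate import Accelerator
from accelerate.utils import DeepSpeedPlugin

# zero3_init_flag=False skips deepspeed.zero.Init() during model loading,
# i.e. the "zero init option" being disabled here.
ds_plugin = DeepSpeedPlugin(zero_stage=3, zero3_init_flag=False)
accelerator = Accelerator(deepspeed_plugin=ds_plugin)
```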

I have set `scaled_dot_product_attention` as the default when torch 2.0 is installed. It should be as efficient as the original.
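To illustrate what "default when torch 2.0 is installed" means in practice, a hedged sketch of version-gated dispatch (the `attention` wrapper and naive fallback are illustrative, not the PR's code):

```python
import torch
import torch.nn.functional as F

def attention(q, k, v, dropout_p=0.0, causal=True):
    # torch >= 2.0 exposes the fused kernel; older versions fall through.
    if hasattr(F, "scaled_dot_product_attention"):
        return F.scaled_dot_product_attention(
            q, k, v, dropout_p=dropout_p, is_causal=causal
        )
    # Naive attention fallback for torch < 2.0.
    scores = (q @ k.transpose(-2, -1)) / q.size(-1) ** 0.5
    if causal:
        mask = torch.ones(
            q.size(-2), k.size(-2), dtype=torch.bool, device=q.device
        ).tril()
        scores = scores.masked_fill(~mask, float("-inf"))
    return F.dropout(scores.softmax(dim=-1), p=dropout_p) @ v
```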

> I tested this PR with Torch 2.0 on my 4x40GB A100, but found that it is 2x slower than the original flash attention implementation. I haven't dug into the...
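Not from the thread, but one way to diagnose such a slowdown is to pin SDPA to the flash backend; in torch 2.0 this is done with the `torch.backends.cuda.sdp_kernel` context manager (a sketch; it needs a CUDA device and half-precision inputs):

```python
import torch
import torch.nn.functional as F
from torch.backends.cuda import sdp_kernel

q = k = v = torch.randn(4, 16, 1024, 64, device="cuda", dtype=torch.bfloat16)

# Disallow the math and mem-efficient fallbacks; if this raises, SDPA was
# silently choosing a slower backend for these shapes/dtypes.
with sdp_kernel(enable_flash=True, enable_math=False, enable_mem_efficient=False):
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```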

> @haotian-liu Hi, my deepspeed config is just the accelerate config with the ZeRO-2 setting, CPU offload, and bf16 enabled. I will upload it later.
>
> deepspeed.json
>
> ```json
> {
>   "train_batch_size": "auto",
>   "train_micro_batch_size_per_gpu": ...
> ```
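Since the file above is truncated, here is a hedged sketch of what a typical ZeRO-2 + CPU-offload + bf16 DeepSpeed config contains (illustrative values, not the author's actual file), written as the Python dict DeepSpeed also accepts:

```python
ds_config = {
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "bf16": {"enabled": True},  # bf16 mixed precision
    "zero_optimization": {
        "stage": 2,  # ZeRO stage 2
        "offload_optimizer": {"device": "cpu", "pin_memory": True},  # CPU offload
    },
}
```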

> Hello @dumpmemory, great work getting this issue solved from DeepSpeed and raising the fix here. Could you apply the fix to all places in lora and adalora wherein `F.linear`...

@pacman100 please help me check it. I have replaced all the `F.linear` calls.
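To make the intent of such a change concrete, a small self-contained sketch (not the actual PR diff): frameworks like DeepSpeed attach pre-forward hooks to modules, and a bare functional call can bypass them, while routing through the module runs the same math with the hooks in the path.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

layer = nn.Linear(8, 4)
x = torch.randn(2, 8)

# F.linear(x, W, b) computes x @ W.T + b directly ...
functional = F.linear(x, layer.weight, layer.bias)
# ... while calling the module produces the same result but also fires any
# hooks (e.g. DeepSpeed's parameter-gathering hooks) registered on it.
modular = layer(x)

assert torch.allclose(functional, modular)
```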

> Thank you @dumpmemory for iterating, LGTM! 🤗
>
> Could you run `make style` and `make quality` to fix the quality issues?

Yes, I will. I will also test...