dumpmemory

Results: 51 comments of dumpmemory

It seems that each forward pass increases memory usage.

My env is: PyTorch 1.12.1, DeepSpeed 0.8.2.
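
To sanity-check that claim, a minimal sketch like the one below can watch allocated CUDA memory across steps (the toy model and sizes are placeholders, not the actual ZeRO-3 setup from the issue):

```python
import torch
import torch.nn as nn

# Toy stand-in for the real model; run on a single CUDA device.
model = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024)).cuda()
x = torch.randn(8, 1024, device="cuda")

for step in range(5):
    loss = model(x).pow(2).mean()
    loss.backward()
    model.zero_grad(set_to_none=True)
    torch.cuda.synchronize()
    # Allocated memory should stay roughly flat across steps; a steady climb
    # means something is holding on to tensors between forward passes.
    print(f"step {step}: {torch.cuda.memory_allocated() / 2**20:.1f} MiB allocated")
```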

The ZeRO-2 setting is OK:

```yaml
compute_environment: LOCAL_MACHINE
deepspeed_config:
  gradient_accumulation_steps: 1
  offload_optimizer_device: cpu
  offload_param_device: cpu
  zero3_init_flag: false
  zero_stage: 2
distributed_type: DEEPSPEED
downcast_bf16: 'no'
dynamo_backend: 'NO'
fsdp_config: {}
machine_rank: 0
...
```

This might be related to https://github.com/microsoft/DeepSpeed/issues/2637.

Disabling zero_init seems to work on gpt2 and gpt2-xl. Now I am facing an OVERFLOW issue for the 1.3B GPT-2 with fp16.
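
For reference, a sketch of how zero_init can be disabled through Hugging Face Accelerate's `DeepSpeedPlugin` while keeping ZeRO-3 (the values mirror the config above; this is not the exact script from the issue):

```python
from accelerate import Accelerator
from accelerate.utils import DeepSpeedPlugin

# zero3_init_flag=False skips deepspeed.zero.Init() during model construction,
# which is what "disable zero_init" refers to here.
ds_plugin = DeepSpeedPlugin(
    zero_stage=3,
    offload_optimizer_device="cpu",
    offload_param_device="cpu",
    gradient_accumulation_steps=1,
    zero3_init_flag=False,
)
accelerator = Accelerator(deepspeed_plugin=ds_plugin)
```

The same effect can be achieved with `zero3_init_flag: false` in the accelerate YAML config.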

Using bf16 there is no OVERFLOW now! Finally I can use LoRA with DeepSpeed ZeRO-3. Thanks!
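
A short sketch of switching mixed precision to bf16 in the same Accelerate setup (bf16 keeps fp32's exponent range, so the fp16 loss-scale OVERFLOW does not occur; the plugin values are placeholders):

```python
from accelerate import Accelerator
from accelerate.utils import DeepSpeedPlugin

# mixed_precision="bf16" replaces fp16 dynamic loss scaling entirely.
accelerator = Accelerator(
    mixed_precision="bf16",
    deepspeed_plugin=DeepSpeedPlugin(zero_stage=3, zero3_init_flag=False),
)
```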

@pacman100 it seems the issue is in the PEFT code. Please look at https://github.com/microsoft/DeepSpeed/issues/3002; I have made a PR to fix this issue.

What about FLOPs for Mamba-2? Does anyone know how to calculate it manually?
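
One way to get at this without a hand derivation is to measure empirically, e.g. with PyTorch's `FlopCounterMode` (available in recent PyTorch); note it only counts ops with registered formulas such as matmuls and convolutions, so Mamba-2's fused scan kernels would be missed and their FLOPs would need to be added on top. A sketch with a stand-in module:

```python
import torch
import torch.nn as nn
from torch.utils.flop_counter import FlopCounterMode

# Stand-in module; a real Mamba-2 block's custom SSD/scan kernel is not a
# registered op, so its FLOPs are not included in this count.
model = nn.Linear(2048, 2048)
x = torch.randn(4, 2048)

with FlopCounterMode(display=False) as flop_counter:
    model(x)

print(f"counted FLOPs: {flop_counter.get_total_flops():,}")
```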

I have faced hang issues after 1:30 hours of training time with ft and ZeRO-3.