dumpmemory

Results: 51 comments of dumpmemory

It seems that each forward pass increases memory usage.

My env is: PyTorch 1.12.1, DeepSpeed 0.8.2.
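
To sanity-check that claim, a minimal sketch like the one below can watch allocated CUDA memory across steps (the toy model and sizes are placeholders, not the actual ZeRO-3 setup from the issue):

```python
import torch
import torch.nn as nn

# Toy stand-in for the real model; run on a single CUDA device.
model = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024)).cuda()
x = torch.randn(8, 1024, device="cuda")

for step in range(5):
    loss = model(x).pow(2).mean()
    loss.backward()
    model.zero_grad(set_to_none=True)
    torch.cuda.synchronize()
    # Allocated memory should stay roughly flat across steps; a steady climb
    # means something is holding on to tensors between forward passes.
    print(f"step {step}: {torch.cuda.memory_allocated() / 2**20:.1f} MiB allocated")
```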

The ZeRO-2 setting is OK:

```yaml
compute_environment: LOCAL_MACHINE
deepspeed_config:
  gradient_accumulation_steps: 1
  offload_optimizer_device: cpu
  offload_param_device: cpu
  zero3_init_flag: false
  zero_stage: 2
distributed_type: DEEPSPEED
downcast_bf16: 'no'
dynamo_backend: 'NO'
fsdp_config: {}
machine_rank: 0
...
```

This might be related to https://github.com/microsoft/DeepSpeed/issues/2637.

Disabling zero_init seems to work on gpt2 and gpt2-xl. Now I am facing an OVERFLOW issue for the 1.3B GPT-2 with fp16.
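
For reference, a sketch of how zero_init can be disabled through Hugging Face Accelerate's `DeepSpeedPlugin` while keeping ZeRO-3 (the values mirror the config above; this is not the exact script from the issue):

```python
from accelerate import Accelerator
from accelerate.utils import DeepSpeedPlugin

# zero3_init_flag=False skips deepspeed.zero.Init() during model construction,
# which is what "disable zero_init" refers to here.
ds_plugin = DeepSpeedPlugin(
    zero_stage=3,
    offload_optimizer_device="cpu",
    offload_param_device="cpu",
    gradient_accumulation_steps=1,
    zero3_init_flag=False,
)
accelerator = Accelerator(deepspeed_plugin=ds_plugin)
```

The same effect can be achieved with `zero3_init_flag: false` in the accelerate YAML config.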

Using bf16 there is no OVERFLOW now! Finally I can use LoRA with DeepSpeed ZeRO-3. Thanks!
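
A short sketch of switching mixed precision to bf16 in the same Accelerate setup (bf16 keeps fp32's exponent range, so the fp16 loss-scale OVERFLOW does not occur; the plugin values are placeholders):

```python
from accelerate import Accelerator
from accelerate.utils import DeepSpeedPlugin

# mixed_precision="bf16" replaces fp16 dynamic loss scaling entirely.
accelerator = Accelerator(
    mixed_precision="bf16",
    deepspeed_plugin=DeepSpeedPlugin(zero_stage=3, zero3_init_flag=False),
)
```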

@pacman100 it seems the issue is in the PEFT code. Please look at https://github.com/microsoft/DeepSpeed/issues/3002; I have made a PR to fix this issue.

What about FLOPs for Mamba-2? Does anyone know how to calculate it manually?
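
One way to get at this without a hand derivation is to measure empirically, e.g. with PyTorch's `FlopCounterMode` (available in recent PyTorch); note it only counts ops with registered formulas such as matmuls and convolutions, so Mamba-2's fused scan kernels would be missed and their FLOPs would need to be added on top. A sketch with a stand-in module:

```python
import torch
import torch.nn as nn
from torch.utils.flop_counter import FlopCounterMode

# Stand-in module; a real Mamba-2 block's custom SSD/scan kernel is not a
# registered op, so its FLOPs are not included in this count.
model = nn.Linear(2048, 2048)
x = torch.randn(4, 2048)

with FlopCounterMode(display=False) as flop_counter:
    model(x)

print(f"counted FLOPs: {flop_counter.get_total_flops():,}")
```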

I have faced hang issues after 1:30 hours of training time with ft and ZeRO-3.