Is it correct to set up fsdp for a machine (V100) that does not support bf16?

Open xmc-andy opened this issue 1 year ago • 6 comments

compute_environment: LOCAL_MACHINE
distributed_type: no
downcast_bf16: false
machine_rank: 0
main_training_function: main
mixed_precision: fp16
num_machines: 1
num_processes: 1
rdzv_backend: static
same_network: true
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
main_process_port: 20687

Sep 14 '23 03:09 xmc-andy

Yes, it seems correct!
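As a rough illustration of why `mixed_precision: fp16` is the setting to use here: bf16 needs hardware support that arrived with Ampere (compute capability 8.0), while the V100 is a Volta card at compute capability 7.0. The sketch below picks a precision string from the compute capability. The helper names `pick_mixed_precision` and `detect_mixed_precision` are made up for this example and are not part of accelerate or Otter.

```python
def pick_mixed_precision(compute_capability):
    """Pick a mixed-precision mode from a (major, minor) compute capability.

    Assumption for this sketch: bf16 is used only on Ampere or newer
    (major >= 8); older GPUs such as the V100 (7.0) fall back to fp16.
    """
    major, _minor = compute_capability
    return "bf16" if major >= 8 else "fp16"


def detect_mixed_precision():
    """Hypothetical helper: query the first visible GPU through PyTorch."""
    try:
        import torch
    except ImportError:
        return "no"  # no PyTorch available, so no mixed precision
    if not torch.cuda.is_available():
        return "no"  # CPU-only machine
    # get_device_capability returns a (major, minor) tuple, e.g. (7, 0) on a V100
    return pick_mixed_precision(torch.cuda.get_device_capability(0))


if __name__ == "__main__":
    print(pick_mixed_precision((7, 0)))  # V100
    print(pick_mixed_precision((8, 0)))  # A100
```

The returned string is meant to match the values used for `mixed_precision` in the config above.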

Sep 14 '23 04:09 Luodian

OK, thank you. I also want to ask about a situation where the main thread's memory usage is higher than the other threads' and it overflows. How should I solve it? Do you have any suggestions?

Sep 14 '23 04:09 xmc-andy

I think you can look at this link to see whether it helps.

https://github.com/huggingface/accelerate/blob/6b3e559926afc4b9a127eb7762fc523ea0ea656a/src/accelerate/big_modeling.py#L514

As far as I know, you may be able to set device_map=balanced_low_0 to decrease GPU memory usage on rank 0 (rank 0 does the gather operations, and sometimes other params are shifted to rank 0, which leads to OOM).
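A minimal sketch of one way to leave headroom on GPU 0, assuming the model is loaded through a `from_pretrained` call that accepts `device_map` and `max_memory`. The helper `build_max_memory` and the GiB figures are made up for illustration; the dictionary format (device index mapped to a string such as "10GiB") is the one accelerate uses for `max_memory`.

```python
def build_max_memory(num_gpus, per_gpu_gib, rank0_gib):
    """Build a max_memory dict that caps GPU 0 lower than the other GPUs.

    Hypothetical helper for this example: keys are GPU indices, values are
    strings such as "10GiB".
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    max_memory = {i: f"{per_gpu_gib}GiB" for i in range(num_gpus)}
    max_memory[0] = f"{rank0_gib}GiB"  # keep spare room on rank 0 for gathers
    return max_memory


if __name__ == "__main__":
    # Illustrative numbers only; adjust to the memory of the actual cards.
    max_memory = build_max_memory(num_gpus=3, per_gpu_gib=14, rank0_gib=8)
    print(max_memory)

    # Assumed usage (not run here; the model class and name are placeholders):
    # model = SomeModelClass.from_pretrained(
    #     "path/to/checkpoint",
    #     device_map="balanced_low_0",
    #     max_memory=max_memory,
    # )
```

Whether this avoids the OOM depends on the model size and on how much of the memory pressure comes from weights rather than activations, so treat the caps as a starting point to tune.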

Sep 14 '23 06:09 Luodian

I have seen some code do this before, but I haven't used it myself. You may want to read up on the device_map mechanism and how to set it. You are also welcome to share your experience with us, to help more users tackle this problem on V100 GPUs.

Sep 14 '23 06:09 Luodian

Thank you for the suggestions, I will try them.

Sep 14 '23 06:09 xmc-andy

I tried setting device_map to 'auto', 'balanced', 'balanced_low_0' and 'sequential' in turn. Unfortunately, it still runs out of memory on 3 V100s (with the ViT unfrozen). In comparison, I think balanced_low_0 might work if I had enough cards; I will try it further if I get 4 V100s.

Sep 14 '23 12:09 xmc-andy
