
Can the M1 Max use all of its memory in training?

Open wildbrother opened this issue 2 years ago • 2 comments

The M1 Max has 64 GB of RAM.

Q1. For anyone who wants to train models on an M1 Max, the biggest reason to choose it is the huge unified memory, not the throughput. I would like to know how much memory can actually be used when training a model on the M1 Max (60 GB? 55 GB?).

Q2. Have you run into any problems with tensorflow-metal? Do you have any plans to extend this post with your experience of problems and solutions when training on the M1 with TF (some code, or compatibility notes for major ML/deep learning packages)? It is just a suggestion, but I think many people would appreciate this kind of experience sharing.

Thank you!

wildbrother avatar Nov 07 '21 23:11 wildbrother

Q1:

I only have the 32 GB version, so I cannot answer with absolute certainty. I observe that about 3 GB is required for the OS etc., so in theory the 64 GB model would have roughly 60 GB available for training.

That said, I do not believe it makes sense to train large models on this hardware: they will be even slower than smaller models, which are already 8x-10x slower than on the usual GPUs we would use for training (V100, 3090, A100, etc.). VRAM limitations can instead be mitigated through a variety of strategies: mixed precision, activation checkpointing, gradient accumulation, DeepSpeed, or even optimizer choice (Adafactor vs. Adam).
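Of those strategies, gradient accumulation is the simplest to illustrate: it trades compute for memory by splitting one large batch into micro-batches and averaging their gradients before a single optimizer step, so only one micro-batch's activations need to be live at a time. A minimal framework-agnostic sketch in plain Python (the model, `lr`, and helper names here are illustrative, not part of any TF API):

```python
# Gradient accumulation sketch: emulate a large batch by averaging
# gradients over several micro-batches before one parameter update.
# Toy model: a single weight w fit to y = 2x with squared error.

def grad(w, x, y):
    # d/dw of 0.5 * (w*x - y)^2
    return (w * x - y) * x

def accumulate_step(w, micro_batches, lr=0.1):
    """One SGD step using gradients averaged across all micro-batches."""
    total, count = 0.0, 0
    for batch in micro_batches:
        for x, y in batch:
            total += grad(w, x, y)  # accumulate instead of updating
            count += 1
    return w - lr * (total / count)  # single update for the whole batch

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
w_full = accumulate_step(0.0, [data])                 # one big batch
w_accum = accumulate_step(0.0, [data[:2], data[2:]])  # two micro-batches
# w_full and w_accum are identical: the split changes peak memory,
# not the mathematics of the update.
```

In a real framework the same idea means calling the backward pass per micro-batch, summing into the gradient buffers, and invoking the optimizer only once per accumulation cycle.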

Q2:

I have not encountered any problems so far; it has been surprisingly painless.

tlkh avatar Nov 08 '21 03:11 tlkh

Q2:

During training, TF/Python used 57.19 GB of memory and 95.7% of the GPU.

st8tikratio avatar Jan 17 '22 17:01 st8tikratio