tf-metal-experiments
Can the M1 Max use all of its memory during training?
The M1 Max has 64 GB of RAM.
Q1. For anyone training models on an M1 Max, the biggest reason to choose it is the huge VRAM, not the throughput. I would like to know how much of that memory can actually be used when training a model on the M1 Max. (60 GB? 55 GB?)
Q2. Have you run into any problems with tensorflow-metal? Do you have any plans to extend this post with your experience of problems and solutions when training on the M1 with TF (some code, or compatibility notes for the major ML/deep learning packages)? It is just a suggestion, but I think many people would appreciate this kind of experience sharing.
Thank you!
Q1:
I only have the 32GB version, so I cannot answer with absolute certainty. I observe that about ~3 GB is required for the OS etc., so in theory you would have about ~60 GB available for training.
That said, I do not believe it makes sense to train large models on it: they will be even slower than smaller models, which are already 8x-10x slower than the GPUs we would usually use for training (V100, 3090, A100, etc.). VRAM pressure can be mitigated through a variety of strategies: mixed precision, activation checkpointing, gradient accumulation, DeepSpeed, or even optimizer choice (Adafactor vs. Adam).
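Of the strategies listed above, gradient accumulation is the easiest to reason about: you split a large batch into micro-batches that fit in memory, average their gradients, and take one optimizer step. The optimizer sees the same gradient it would have seen from the full batch, but peak activation memory scales with the micro-batch size. A framework-agnostic sketch in NumPy (using a toy linear-regression loss, not any model from this repo) demonstrates the equivalence:

```python
import numpy as np

# Toy mean-squared-error loss: L(w) = mean((X @ w - y)**2)
# Its gradient is dL/dw = 2/N * X.T @ (X @ w - y).
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 4))
y = rng.normal(size=64)
w = rng.normal(size=4)

def grad(Xb, yb, w):
    # Gradient of the MSE loss over a (micro-)batch.
    return 2.0 / len(yb) * Xb.T @ (Xb @ w - yb)

# Full-batch gradient: all 64 examples' activations live in memory at once.
g_full = grad(X, y, w)

# Accumulated gradient: 4 micro-batches of 16, held in memory one at a time.
g_accum = np.zeros_like(w)
for i in range(0, 64, 16):
    # Weight each micro-batch gradient by its share of the full batch.
    g_accum += grad(X[i:i+16], y[i:i+16], w) * (16 / 64)

# The two gradients match, so the optimizer step is identical either way.
assert np.allclose(g_full, g_accum)
```

In TF/Keras the same idea is implemented by overriding `train_step` to accumulate gradients across several forward/backward passes before applying them; the arithmetic is exactly what the sketch shows.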
Q2:
I have not encountered any problems so far; it has been surprisingly painless.
Q2:
TF/Python used 57.19 GB of memory during training and 95.7% of the GPU.