Large GPU memory footprint while training
Describe the bug Not really a bug per se, more of a question/clarification request. If there are better avenues (a Discord server?) for discussing these issues, I apologize for using the wrong channel.
To Reproduce Feel free to run this test: https://github.com/openclimatefix/skillful_nowcasting/blob/main/tests/test_model.py#L305
Training uses about 40 GB of GPU memory with a batch size of 1. I had to upgrade to an A100 to be able to run it decently.
Just curious if this is expected behavior or if you recommend another approach.
No, this is a great place to talk about it! That test does use a lot of GPU memory, and I think that is just expected: the model is almost a one-to-one copy of the pseudocode and training code DeepMind released, and they trained it on 16 TPUs. I would try running it with reduced parameters or a smaller input size. Unfortunately it's just a large model, which is why I skip that test in the CI actions.
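For example, a minimal sketch of what "reduced parameters" could look like; the `DGMR` constructor arguments below are assumptions based on the test file, not a verified API reference:

```python
# Sketch: shrink the model to fit a smaller GPU. The argument names
# (forecast_steps, output_shape, latent_channels, context_channels) are
# assumptions taken from the test file and may not match the current API.
import torch
from dgmr import DGMR

model = DGMR(
    forecast_steps=4,      # fewer lead times than the paper's 18
    output_shape=128,      # crop inputs to 128x128 instead of 256x256
    latent_channels=384,   # roughly half the default latent width
    context_channels=192,  # roughly half the default context width
)

x = torch.rand(1, 4, 1, 128, 128)  # (batch, context frames, channels, H, W)
with torch.no_grad():
    y = model(x)
print(y.shape)  # expected to be roughly (1, forecast_steps, 1, 128, 128)
```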
OK, this is what I thought.
I might play around with multi-GPU or TPU training with XLA to see if I can crank up the batch size. I am also curious whether switching to 16-bit precision would help; the input data is 16-bit anyway.
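Roughly what I have in mind, as a sketch with PyTorch Lightning (which the repo already uses); the `DGMR()` default construction and the DataLoader are placeholders:

```python
# Sketch of multi-GPU (or TPU/XLA) mixed-precision training with
# PyTorch Lightning. The DGMR() call and the DataLoader are placeholders.
import pytorch_lightning as pl
from dgmr import DGMR

model = DGMR()  # or a reduced-size config as discussed above
trainer = pl.Trainer(
    accelerator="gpu",   # use accelerator="tpu", devices=8 for XLA on TPUs
    devices=2,
    precision=16,        # 16-bit mixed precision
    max_epochs=1,
)
# trainer.fit(model, train_dataloaders=train_loader)  # supply your own loader
```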
If you're interested, I also have some code to feed the TFRecords from the original dataset into a PyTorch DataLoader (keeping the tf.data parallelism); happy to send a PR. I'm trying to reproduce the results from the paper.
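The rough idea is the sketch below; `parse_example` stands in for the dataset-specific feature spec, which the actual PR would include:

```python
# Wrap a tf.data pipeline in a PyTorch IterableDataset so TFRecord parsing and
# prefetching keep running in parallel on the TensorFlow side.
import tensorflow as tf
import torch
from torch.utils.data import DataLoader, IterableDataset


def parse_example(record):
    # Hypothetical placeholder: decode one serialized example into a
    # (time, height, width, channels) float array of radar frames.
    raise NotImplementedError


class TFRecordIterableDataset(IterableDataset):
    def __init__(self, file_pattern):
        self.file_pattern = file_pattern

    def __iter__(self):
        files = tf.data.Dataset.list_files(self.file_pattern)
        ds = tf.data.TFRecordDataset(files, num_parallel_reads=tf.data.AUTOTUNE)
        ds = ds.map(parse_example, num_parallel_calls=tf.data.AUTOTUNE)
        ds = ds.prefetch(tf.data.AUTOTUNE)
        for frames in ds.as_numpy_iterator():
            yield torch.from_numpy(frames)


loader = DataLoader(TFRecordIterableDataset("path/to/*.tfrecord"), batch_size=1)
```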
Yeah, a PR would be great! I'm currently working on mirroring the dataset on HuggingFace to make it easier for anyone else to reproduce the paper, but for now that code requires TF.
The TF/Sonnet implementation I have trains decently when distributed across 8 TPU v2 cores with the paper's parameters (except a global batch size of 8 per step, and one sample per input during the generator step). I assume DeepMind trained theirs on v3 cores. The full model also seems to fit on a GCP n1-highmem-64 (416 GB of RAM), albeit slowly enough to be useless.
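For reference, the TPU setup boilerplate looks roughly like this generic sketch; the model and optimizer construction are omitted, and the TPU name is environment-specific:

```python
# Sketch of standard TensorFlow TPU initialization for distributing a
# TF/Sonnet model across the 8 cores of a TPU v2; model code is omitted.
import tensorflow as tf

resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="your-tpu-name")
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)
print("Replicas:", strategy.num_replicas_in_sync)  # 8 on a v2-8

with strategy.scope():
    # build the Sonnet generator/discriminator modules and optimizers here
    pass
```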
Thanks for the insights!
Thanks for your questions. I have run this code on Tesla V100s (32 GB), but unfortunately it raises a CUDA out-of-memory error. Do you have any suggestions or configurations for training this model with 32 GB of GPU RAM? @johmathe @jacobbieker
Best regards.
I'm working on adding a training script that uses DeepSpeed, which should help reduce the GPU memory requirements, albeit at some cost in training speed. Other than that, my only other suggestion is to use a smaller model or half-precision training.
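In the meantime, something along these lines may already help on a 32 GB card; this is only a sketch using Lightning's built-in DeepSpeed strategy strings, not the finished training script, and the `DGMR()` default construction is an assumption:

```python
# Sketch: DeepSpeed ZeRO stage 2 with optimizer-state offload to CPU RAM,
# plus 16-bit precision, via PyTorch Lightning's built-in strategy strings.
import pytorch_lightning as pl
from dgmr import DGMR

model = DGMR()
trainer = pl.Trainer(
    accelerator="gpu",
    devices=1,
    strategy="deepspeed_stage_2_offload",  # requires `pip install deepspeed`
    precision=16,
)
# trainer.fit(model, train_dataloaders=train_loader)
```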
Thanks for your quick reply. I will try that, and I look forward to your training script. @jacobbieker Best regards.