litgpt
Compatible with local 8xH100 instead of cloud?
Hello. I have access to a local 8x H100 GPU cluster and want to try the TinyLlama pretraining tutorial. Is this supported, or do I have to use cloud GPUs?
Thanks
If you have all the dependencies installed, that should be supported. You can check out the tutorials/pretrain_tinyllama.md tutorial in this repo. Let us know what results you get; I'd be curious.
Is this the documentation that can help me set this up? https://lightning.ai/docs/pytorch/stable/clouds/cluster_expert.html
Or is there other documentation that explains how?
@michaellin99999 On a single H100 node you don't need to set anything up. You can just run the script (provided you followed the tutorial's preparation steps) and it will use all GPUs by default.
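For reference, the single-node invocation is just a plain script launch; this is a sketch, assuming the script path from the tutorial, and any flags your version expects should be taken from tutorials/pretrain_tinyllama.md rather than from here:

```bash
# Minimal single-node launch, assuming the tutorial's preparation steps
# (dependency install, data download/tokenization) are already done.
# Script path follows tutorials/pretrain_tinyllama.md; check the tutorial
# for the exact flags your version of the script expects.
python pretrain/tinyllama.py
```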
If you have a cluster of multiple H100 nodes, the steps will depend on your cluster setup. Most likely it uses SLURM; in that case, follow the SLURM guide here: https://lightning.ai/docs/fabric/stable/fundamentals/launch.html#launch-on-a-cluster. Otherwise, follow the "bare bones cluster" guide on that same page.
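To make the SLURM path concrete, here is a sketch of a submission script for two 8-GPU nodes. The partition, time limit, and script path are placeholders for your cluster, and the node/task counts assume the training script configures Fabric with matching `num_nodes` and `devices` values:

```bash
#!/bin/bash
# Illustrative SLURM submission script for 2 x 8-GPU H100 nodes.
#SBATCH --nodes=2               # must match num_nodes in the script's Fabric setup
#SBATCH --ntasks-per-node=8     # one task per GPU
#SBATCH --gres=gpu:8
#SBATCH --time=24:00:00

# srun launches one process per task; Fabric reads the SLURM environment
# variables and sets up the distributed process group automatically.
srun python pretrain/tinyllama.py
```

Submit it with `sbatch` as usual; the key constraint is that `--ntasks-per-node` equals the number of GPUs per node, as described in the launch guide linked above.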
Thank you!