Param Bole
## Description
Adds JAX Hello World Multi-Node GKE H100 with GPUDirectTCPx tutorial

## Tasks
* [x] The [contributing guide](https://github.com/GoogleCloudPlatform/kubernetes-engine-samples/blob/main/.github/CONTRIBUTING.md) has been read and followed.
* [x] The samples added /...
With the addition of new GKE GPU infrastructure and network architecture, there is a need to add new examples that demonstrate how to leverage them. As part of this task,...
When I run `sudo mean init`, it throws an error, although it was working before. If I don't use `sudo`, it works, but then I am not able to create a directory.
### Description
We have a cluster node which already has NVIDIA drivers and the CUDA toolkit installed (to maintain version compatibility with the underlying OS and the networking stack). Installing via...
Currently we use `gcr.io` to store the generated Docker images (ref: https://github.com/google/maxtext/blob/main/.github/workflows/build_and_upload_images.sh#L51). Since `gcr.io` is being deprecated, we should move to Artifact Registry.
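To make the change concrete, here is a minimal sketch of how an image reference could be rewritten. It relies only on the documented Artifact Registry Docker path layout, `LOCATION-docker.pkg.dev/PROJECT/REPOSITORY/IMAGE`. The function name, the default location `us`, the repository name `my-repo`, and the image name in the usage note are hypothetical placeholders, not values taken from the referenced script.

```python
def gcr_to_artifact_registry(image: str,
                             location: str = "us",          # assumed location
                             repository: str = "my-repo"    # hypothetical repository
                             ) -> str:
    """Rewrite gcr.io/PROJECT/IMAGE[:TAG] as
    LOCATION-docker.pkg.dev/PROJECT/REPOSITORY/IMAGE[:TAG]."""
    # Split off the registry host (gcr.io, us.gcr.io, eu.gcr.io, ...).
    host, _, rest = image.partition("/")
    if host != "gcr.io" and not host.endswith(".gcr.io"):
        raise ValueError(f"not a gcr.io image reference: {image!r}")

    # The first path component is the project; the remainder is the image
    # name, including any tag or digest, which is carried over unchanged.
    project, _, name = rest.partition("/")
    if not project or not name:
        raise ValueError(f"expected gcr.io/PROJECT/IMAGE, got: {image!r}")

    return f"{location}-docker.pkg.dev/{project}/{repository}/{name}"
```

For example, `gcr_to_artifact_registry("gcr.io/my-project/my-image:latest")` returns `us-docker.pkg.dev/my-project/my-repo/my-image:latest`. The repository has to exist in Artifact Registry before images can be pushed to it, so a real migration would also need that setup step.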