axlearn
Adding LWS Integration
Adds integration with LeaderWorkerSet (LWS, https://github.com/kubernetes-sigs/lws) for TPUs, as well as an LWS + Pathways integration.
To run a basic LWS + TPU job:
```shell
axlearn gcp launch run --cluster=$CLUSTER \
  --runner_name gke_tpu_lws \
  --name=$USER \
  --instance_type=tpu-v6e-16 \
  --bundler_spec=allow_dirty=True \
  --bundler_type=artifactregistry --bundler_spec=image=tpu \
  --bundler_spec=dockerfile=Dockerfile --bundler_spec=target=tpu \
  -- sleep infinity
```
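Because the workload is `sleep infinity`, the pods keep running after the launch. The following is a minimal bash sketch for finding them. The helper `lws_pods` is hypothetical (not part of axlearn), and it assumes that the LeaderWorkerSet is named after `--name` (here `$USER`) and that its pods carry the LWS label `leaderworkerset.sigs.k8s.io/name`. If the name differs, `kubectl get leaderworkerset` lists the actual objects.

```shell
# Hypothetical helper (not part of axlearn): list the pods of a LeaderWorkerSet.
# ASSUMPTION: the LWS object is named after the launch's --name value ($USER above).
lws_pods() {
  local name="${1:-$USER}"
  # Select pods by the LWS name label and show which node each one landed on.
  kubectl get pods -l "leaderworkerset.sigs.k8s.io/name=${name}" -o wide
}
```

Run `lws_pods` (or `lws_pods "$USER"`) after the launch, then `kubectl exec` into one of the listed pods.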
To run LWS + Pathways:
```shell
axlearn gcp launch run --cluster=$CLUSTER \
  --runner_name gke_tpu_lws_pathways \
  --name=$USER \
  --instance_type=tpu-v6e-16 \
  --bundler_spec=allow_dirty=True \
  --bundler_type=artifactregistry --bundler_spec=image=tpu \
  --bundler_spec=dockerfile=Dockerfile --bundler_spec=target=tpu \
  -- sleep infinity
```
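The two launch commands differ only in `--runner_name`. The following minimal sketch assumes bash. The helper `lws_launch` is hypothetical (not part of axlearn) and builds either command from the same flags. With the equally hypothetical `DRY_RUN=1`, it prints the command instead of running it. It accepts only the two runner names introduced here.

```shell
# Hypothetical helper (not part of axlearn): build the launch command for either
# LWS runner. Set DRY_RUN=1 to print the command instead of running it.
lws_launch() {
  local runner="$1"
  # Only the two runners added here are accepted.
  case "$runner" in
    gke_tpu_lws | gke_tpu_lws_pathways) ;;
    *)
      echo "unknown runner: $runner" >&2
      return 2
      ;;
  esac
  # The launch commands read the cluster name from $CLUSTER.
  if [[ -z "${CLUSTER:-}" ]]; then
    echo "CLUSTER must be set" >&2
    return 2
  fi
  # Same flags as the commands above; only --runner_name varies.
  local -a cmd=(
    axlearn gcp launch run --cluster="$CLUSTER"
    --runner_name "$runner"
    --name="$USER"
    --instance_type=tpu-v6e-16
    --bundler_spec=allow_dirty=True
    --bundler_type=artifactregistry --bundler_spec=image=tpu
    --bundler_spec=dockerfile=Dockerfile --bundler_spec=target=tpu
    -- sleep infinity
  )
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, `DRY_RUN=1 lws_launch gke_tpu_lws_pathways` prints the Pathways command, and dropping `DRY_RUN=1` runs it. Keeping the flags in a bash array preserves each argument as a separate word, so values with spaces would not be split.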