axlearn
Default nodeSelector requires the TPU provisioner
These are the nodeSelectors that got added:
Node-Selectors: cloud.google.com/gke-accelerator-count=4
                cloud.google.com/gke-spot=true
                cloud.google.com/gke-tpu-accelerator=tpu-v5-lite-podslice
                cloud.google.com/gke-tpu-topology=16x16
                provisioner-nodepool-id=stoelinga-8733bd
This was the command I used to launch the job:
export BASTION_TIER=1
axlearn gcp gke start --instance_type=tpu-v5litepod-256 --num_replicas=1 \
--cluster=v5e-256-bodaborg-us-west4 --bundler_spec=allow_dirty=True \
--bundler_type=artifactregistry --bundler_spec=image=tpu \
--bundler_spec=dockerfile=Dockerfile --bundler_spec=target=tpu \
-- python3 -c "'import jax; print(jax.devices())'"
Expectation: The job should not get the selector provisioner-nodepool-id=stoelinga-8733bd by default, since that selector assumes the TPU provisioner is always used. That may not be the case for external users, whose node pools will not carry this label.
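To make the expectation concrete, here is a minimal sketch of the behavior I would expect. It is not axlearn's actual code: `build_node_selector` and its parameters are hypothetical names used only for illustration. The label keys are the ones from the pod description above. The idea is that the provisioner label is only added when the caller explicitly opts in.

```python
from typing import Dict, Optional


def build_node_selector(
    *,
    accelerator: str,
    topology: str,
    accelerator_count: int,
    use_spot: bool = False,
    provisioner_nodepool_id: Optional[str] = None,
) -> Dict[str, str]:
    """Builds pod nodeSelector labels.

    Hypothetical helper for illustration only; not part of axlearn's API.
    """
    selector = {
        "cloud.google.com/gke-tpu-accelerator": accelerator,
        "cloud.google.com/gke-tpu-topology": topology,
        "cloud.google.com/gke-accelerator-count": str(accelerator_count),
    }
    if use_spot:
        selector["cloud.google.com/gke-spot"] = "true"
    # Only pin the pod to a provisioner-managed node pool when the caller
    # explicitly passes an ID, so clusters without the TPU provisioner
    # can still schedule the job.
    if provisioner_nodepool_id is not None:
        selector["provisioner-nodepool-id"] = provisioner_nodepool_id
    return selector


# Default: no provisioner label is added.
default_selector = build_node_selector(
    accelerator="tpu-v5-lite-podslice",
    topology="16x16",
    accelerator_count=4,
    use_spot=True,
)

# Opt-in: the label is added only when an ID is supplied.
provisioner_selector = build_node_selector(
    accelerator="tpu-v5-lite-podslice",
    topology="16x16",
    accelerator_count=4,
    use_spot=True,
    provisioner_nodepool_id="stoelinga-8733bd",
)
```

With something along these lines, users who run the TPU provisioner could still request the selector, while everyone else would get a job that schedules on any matching TPU node pool.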