Distributed training for Keras-Tuner
After some experimenting, I found that using keras-tuner with chief-worker distribution only works if KERASTUNER_ORACLE_IP is set to 0.0.0.0 on the chief, while the worker replicas use the actual chief IP obtained from TF_CONFIG.
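For reference, a minimal sketch of that workaround, assuming KerasTuner's documented KERASTUNER_* environment variables and a standard TF_CONFIG layout; the port value and the tuner-ID naming are arbitrary choices, not anything mandated by keras-tuner or tensorflow-cloud:

```python
import json
import os

# Derive the KerasTuner oracle settings from TF_CONFIG on each replica.
tf_config = json.loads(os.environ.get("TF_CONFIG", "{}"))
task_type = tf_config.get("task", {}).get("type", "chief")
task_index = tf_config.get("task", {}).get("index", 0)

# The chief's address as the other replicas see it, e.g. "master-host:2222".
chief_addr = tf_config["cluster"]["chief"][0]
chief_ip = chief_addr.split(":")[0]

if task_type == "chief":
    # The chief runs the oracle and must bind to all interfaces.
    os.environ["KERASTUNER_ORACLE_IP"] = "0.0.0.0"
    os.environ["KERASTUNER_TUNER_ID"] = "chief"
else:
    # Workers connect to the oracle via the chief's actual IP.
    os.environ["KERASTUNER_ORACLE_IP"] = chief_ip
    os.environ["KERASTUNER_TUNER_ID"] = f"tuner-{task_type}-{task_index}"

# Any free port works, as long as chief and workers agree on it.
os.environ["KERASTUNER_ORACLE_PORT"] = "8000"
```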
Distributed tuning would certainly be useful once we support keras-tuner. Should we implement this special case here, or just provide an example showing clearly how to distribute keras-tuner?
Alternatively, this could be solved on the keras-tuner side, but from what I can tell it is only a problem when running KT on AI Platform through tensorflow-cloud, not an issue for KT distribution in general.
Thank you for the issue @yixingfu. This is definitely an important feature, and we have plans to add support for it in the future.