
Distributed training for Keras-Tuner

Open yixingfu opened this issue 4 years ago • 1 comment

After some experimenting, I found that keras-tuner with chief-worker distribution only works if KERASTUNER_ORACLE_IP is set to 0.0.0.0 on the chief, while on each worker replica it must be set to the actual chief IP obtained from TF_CONFIG. Distributed training is certainly useful when we support keras-tuner. Should we implement this special case here, or just create an example showing clearly how to distribute keras-tuner?

Alternatively, this could be solved on the keras-tuner side, but from what I can see it is only a problem when running KT on AI Platform via tensorflow-cloud, not an issue for KT distribution in general.
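
For concreteness, here is a minimal sketch of the workaround described above. It assumes the TF_CONFIG layout that AI Platform provides (a "chief" entry in the cluster spec); KERASTUNER_TUNER_ID, KERASTUNER_ORACLE_IP, and KERASTUNER_ORACLE_PORT are the environment variables KerasTuner's distributed mode reads, and the port value (8000) is an arbitrary choice for illustration:

```python
import json
import os

# Parse the cluster description AI Platform injects into the environment.
tf_config = json.loads(os.environ["TF_CONFIG"])
cluster = tf_config["cluster"]
task = tf_config["task"]

# The chief's real address as advertised in the cluster spec.
chief_host = cluster["chief"][0].split(":")[0]

if task["type"] == "chief":
    # On the chief, bind the oracle on all interfaces.
    os.environ["KERASTUNER_TUNER_ID"] = "chief"
    os.environ["KERASTUNER_ORACLE_IP"] = "0.0.0.0"
else:
    # On workers, point at the chief's actual IP taken from TF_CONFIG.
    os.environ["KERASTUNER_TUNER_ID"] = "tuner-{}".format(task["index"])
    os.environ["KERASTUNER_ORACLE_IP"] = chief_host

# Illustrative port choice; any free port agreed on by all replicas works.
os.environ["KERASTUNER_ORACLE_PORT"] = "8000"
```

With these variables set before the tuner is constructed, the chief runs the oracle and the workers connect to it.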

yixingfu avatar Jul 06 '20 14:07 yixingfu

Thank you for the issue @yixingfu. This is definitely an important feature and we have plans to add support for this in the future.

pavithrasv avatar Jul 07 '20 01:07 pavithrasv