mpi-operator

Tensorboard running

Open boriskovalev opened this issue 5 years ago • 2 comments

What is the best way to add TensorBoard to the MPI Operator?

boriskovalev avatar Aug 13 '19 02:08 boriskovalev

If you write your checkpoints and event files to a shared location (NFS, S3, GCS, etc.), you can just point TensorBoard to it. There is TensorBoard support in Pipelines: https://www.kubeflow.org/docs/pipelines/sdk/output-viewer/#tensorboard, but I don't know if anyone has tried it with the MPI Operator.

rongou avatar Aug 13 '19 14:08 rongou

Yes, that approach should work. Write your summaries to a shared location, start a standalone pod that runs the TensorBoard command (or reuse an MPI pod, though that may not be ideal resource-wise), and then create a service to expose the port. The MPI Operator doesn't include a TensorBoard service, though. There is usually additional custom logic for dealing with its resources, ports, annotations, etc.

terrytangyuan avatar Aug 13 '19 14:08 terrytangyuan
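
The approach described in this thread (a standalone TensorBoard pod reading from the shared log location, plus a service exposing its port) could be sketched roughly as below. This is only an illustration, not something the MPI Operator provides: the resource name `tensorboard-mpi`, the image, the PVC name `mpi-training-logs`, and the `/logs` mount path are all hypothetical and would need to match whatever your MPIJob actually writes to. It assumes the shared location is a PersistentVolumeClaim (for example NFS-backed); for S3 or GCS you would instead pass the bucket URL to `--logdir` and supply credentials.

```python
import json

# Every value in this block is an assumed placeholder for illustration.
NAME = "tensorboard-mpi"                # hypothetical resource name
IMAGE = "tensorflow/tensorflow:latest"  # assumed image that includes the tensorboard CLI
LOG_PVC = "mpi-training-logs"           # hypothetical PVC that the MPIJob also mounts
LOG_DIR = "/logs"                       # hypothetical mount path for the event files
PORT = 6006                             # TensorBoard's default port


def tensorboard_deployment():
    """Deployment running one TensorBoard pod against the shared log volume."""
    labels = {"app": NAME}
    container = {
        "name": "tensorboard",
        "image": IMAGE,
        "command": [
            "tensorboard",
            f"--logdir={LOG_DIR}",
            f"--port={PORT}",
            "--bind_all",  # listen on all interfaces so the Service can reach it
        ],
        "ports": [{"containerPort": PORT}],
        # TensorBoard only reads event files, so mount the volume read-only.
        "volumeMounts": [{"name": "logs", "mountPath": LOG_DIR, "readOnly": True}],
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": NAME, "labels": labels},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [container],
                    "volumes": [
                        {"name": "logs",
                         "persistentVolumeClaim": {"claimName": LOG_PVC}}
                    ],
                },
            },
        },
    }


def tensorboard_service():
    """ClusterIP Service exposing the TensorBoard port inside the cluster."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": NAME},
        "spec": {
            "selector": {"app": NAME},
            "ports": [{"port": PORT, "targetPort": PORT}],
        },
    }


def render():
    """Serialize both objects as one JSON List, which kubectl can read."""
    manifest = {
        "apiVersion": "v1",
        "kind": "List",
        "items": [tensorboard_deployment(), tensorboard_service()],
    }
    return json.dumps(manifest, indent=2)


if __name__ == "__main__":
    print(render())
```

If this were saved as, say, `tensorboard_manifests.py`, the output could be piped to `kubectl apply -f -` and the UI reached locally with `kubectl port-forward svc/tensorboard-mpi 6006:6006`. Resource requests, annotations, and any ingress setup are left out on purpose, since those are the custom parts mentioned above.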