mpi-operator
Tensorboard running
What is the best way to add TensorBoard to the MPI Operator?
If you write your checkpoints and event files to a shared location (NFS, S3, GCS, etc.), you can just point TensorBoard at it. There is TensorBoard support in Pipelines: https://www.kubeflow.org/docs/pipelines/sdk/output-viewer/#tensorboard, but I don't know if anyone has tried it with the MPI Operator.
Yes, that approach should work. Write your summaries to a shared location, start a standalone pod (or reuse an MPI pod, which may not be ideal resource-wise) that runs the TensorBoard command, and then create a Service to expose the port. The MPI Operator doesn't include a TensorBoard service, though, because there is usually additional custom logic for dealing with its resources, ports, annotations, etc.
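For anyone who wants a starting point, here is a minimal sketch of the standalone pod plus Service described above, written as a small Python script that prints the manifests as JSON. It is not part of the MPI Operator. The function name `tensorboard_manifests`, the resource name `tensorboard`, the `tensorflow/tensorflow:latest` image, and the `gs://my-bucket/logs` path are all assumptions for illustration. Volume mounts (for NFS) and cloud credentials (for S3/GCS) are left out and would need to be added for your setup.

```python
import json


def tensorboard_manifests(name, logdir, image, port=6006):
    """Build a Pod and a Service that expose TensorBoard for a shared logdir.

    All arguments are illustrative; nothing here is defined by the MPI Operator.
    """
    labels = {"app": name}

    # Standalone pod that only runs the TensorBoard command.
    pod = {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "containers": [
                {
                    "name": "tensorboard",
                    "image": image,
                    "command": [
                        "tensorboard",
                        f"--logdir={logdir}",
                        f"--port={port}",
                        "--host=0.0.0.0",  # listen on all interfaces inside the pod
                    ],
                    "ports": [{"containerPort": port}],
                }
            ]
        },
    }

    # Service that selects the pod by label and exposes the TensorBoard port.
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "selector": labels,
            "ports": [{"port": port, "targetPort": port}],
        },
    }

    # A v1 List lets both objects be applied in one go.
    return {"apiVersion": "v1", "kind": "List", "items": [pod, service]}


if __name__ == "__main__":
    manifests = tensorboard_manifests(
        name="tensorboard",                   # hypothetical resource name
        logdir="gs://my-bucket/logs",         # hypothetical shared location
        image="tensorflow/tensorflow:latest"  # assumed image that ships TensorBoard
    )
    print(json.dumps(manifests, indent=2))
```

Assuming the script is saved as `tb_manifests.py` (a hypothetical filename), the output can be piped to `kubectl apply -f -`, and `kubectl port-forward service/tensorboard 6006:6006` should then make the UI reachable locally. The training job itself has to write its event files to the same location passed as `logdir`.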