mpi-operator
Add a step to upload artifact
A quick question: how are artifacts (models) managed in an MPI job? Possible solutions are uploading to S3, providing a PVC, etc.
Yes, you can provide a volume in the spec.
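For reference, a minimal sketch of what that could look like, assuming the kubeflow.org/v1 MPIJob API and a pre-existing PVC; the image, mount path, and claim name below are just placeholders:

```yaml
apiVersion: kubeflow.org/v1
kind: MPIJob
metadata:
  name: train-with-pvc
spec:
  slotsPerWorker: 1
  mpiReplicaSpecs:
    Launcher:
      replicas: 1
      template:
        spec:
          containers:
            - name: mpi-launcher
              image: my-training-image:latest   # hypothetical image
              command: ["mpirun", "python", "/opt/train.py", "--model-dir", "/mnt/models"]
    Worker:
      replicas: 2
      template:
        spec:
          containers:
            - name: mpi-worker
              image: my-training-image:latest   # hypothetical image
              volumeMounts:
                - name: model-store
                  mountPath: /mnt/models        # training ranks write the model here
          volumes:
            - name: model-store
              persistentVolumeClaim:
                claimName: model-pvc            # hypothetical, pre-created PVC
```

The training processes run in the worker pods, so the model directory is mounted there; anything written under /mnt/models outlives the job via the PVC.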
But would it be better if we could upload artifacts to an S3 repository?
That's not the focus of the MPI operator here. I'd suggest checking out artifacts in Argo Workflows.
The point is how the MPI operator works with Argo in this scenario. We can leverage Argo to manage the pipeline, but I don't think the output artifacts are available to Argo, because those artifacts reside in the pods created by the mpi-operator. We can use Argo to submit the MPI job, but if I don't have any storage system available, I may not be able to get the output artifacts.
Maybe we could discuss it in the community meeting.
@MartinForReal Post-processing a model looks more like something we can do within the container. In the training script, we can use a Python API to upload the model to S3 or other cloud storage, since, as you mentioned, there is no storage system in your cluster. A rough sketch is below.
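A minimal sketch of the end of such a training script, assuming boto3 is available in the training image and AWS credentials are configured; the paths, bucket, key, and the Open MPI rank check are assumptions for illustration:

```python
import os

import boto3  # assumes boto3 is installed and AWS credentials are configured in the pod


def upload_model_to_s3(local_path: str, bucket: str, key: str) -> None:
    """Upload the trained model artifact to S3 once training has finished."""
    s3 = boto3.client("s3")
    s3.upload_file(local_path, bucket, key)


if __name__ == "__main__":
    # ... training happens above; rank 0 writes the model to model_path ...
    model_path = "/tmp/model.pt"         # hypothetical local path
    bucket = "my-model-bucket"           # hypothetical bucket name
    key = "experiments/run-1/model.pt"   # hypothetical object key

    # With Open MPI, only let rank 0 upload so the workers don't race each other.
    if os.environ.get("OMPI_COMM_WORLD_RANK", "0") == "0":
        upload_model_to_s3(model_path, bucket, key)
```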
I think this is a valid use case. From the user's perspective, users should be able to use the training operators with Kubeflow Pipelines easily. The challenge in supporting native artifacts is that all jobs are submitted from a client pod triggered by KFP; the client pod itself doesn't have the training job's artifacts.
This is not an MPI-operator-specific problem; any KFP op which submits a remote job has this problem.
How about having a pre-processor/post-processor for each job?