
Add a step to upload artifact

Open MartinForReal opened this issue 4 years ago • 8 comments

A quick question: how are artifacts (models) managed in an MPI job? Possible solutions include uploading to S3, providing a PVC, etc.

MartinForReal avatar Mar 18 '21 03:03 MartinForReal

Yes, you can provide a volume in the spec.
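As a rough illustration of what "a volume in the spec" could look like, here is a minimal sketch that adds a PVC-backed volume to every replica of an MPIJob manifest held as a Python dict. The helper `add_pvc_volume`, the claim name `model-output-pvc`, the mount path `/output`, and the image name are all hypothetical, and the `apiVersion` shown depends on which operator version is installed.

```python
import copy


def add_pvc_volume(mpijob, claim_name, mount_path, volume_name="artifacts"):
    """Return a copy of an MPIJob manifest with a PVC mounted in every replica.

    Hypothetical helper for illustration; not part of mpi-operator.
    """
    job = copy.deepcopy(mpijob)
    for replica in job["spec"]["mpiReplicaSpecs"].values():
        pod_spec = replica["template"]["spec"]
        # Declare the volume at the pod level, backed by an existing PVC.
        pod_spec.setdefault("volumes", []).append(
            {"name": volume_name,
             "persistentVolumeClaim": {"claimName": claim_name}}
        )
        # Mount it into each container so training code can write to mount_path.
        for container in pod_spec["containers"]:
            container.setdefault("volumeMounts", []).append(
                {"name": volume_name, "mountPath": mount_path}
            )
    return job


def _replica(name, replicas):
    # Placeholder pod template; the image name is an assumption.
    return {
        "replicas": replicas,
        "template": {"spec": {"containers": [
            {"name": name, "image": "example/train:latest"}
        ]}},
    }


mpijob = {
    "apiVersion": "kubeflow.org/v1",  # assumption: varies by operator version
    "kind": "MPIJob",
    "metadata": {"name": "train-example"},
    "spec": {"mpiReplicaSpecs": {
        "Launcher": _replica("launcher", 1),
        "Worker": _replica("worker", 2),
    }},
}

job_with_volume = add_pvc_volume(mpijob, "model-output-pvc", "/output")
```

The resulting dict can be dumped to YAML or submitted with a Kubernetes client; the training script then writes its model under `/output`, and the files outlive the pods.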

terrytangyuan avatar Mar 18 '21 06:03 terrytangyuan

But wouldn't it be better if we could upload artifacts to an S3 repository?

MartinForReal avatar Mar 18 '21 06:03 MartinForReal

That's not the focus of the MPI operator. I'd suggest checking out artifacts in Argo Workflows.

terrytangyuan avatar Mar 18 '21 06:03 terrytangyuan

The point is: how does the MPI operator work with Argo in this scenario? We can leverage Argo to manage the pipeline, but I don't think the output artifacts are available to Argo, because they reside in the pod created by mpi-operator. We can use Argo to submit the MPI job, but if I don't have any storage system available, I may not be able to get the output artifacts.

MartinForReal avatar Mar 18 '21 06:03 MartinForReal

Maybe we could discuss it in the community meeting.

gaocegege avatar Mar 18 '21 11:03 gaocegege

@MartinForReal Post-processing a model looks more like something we can do within the container. In the training script, we can use a Python API to upload the model to S3 or other cloud storage, since, as you mentioned, there is no storage system in your cluster.
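A minimal sketch of that idea, assuming `boto3` is installed in the training image and the job runs under Open MPI (which exports `OMPI_COMM_WORLD_RANK` to each process). The function names, bucket, and key are hypothetical; only rank 0 uploads, so the model is written once rather than once per worker.

```python
import os


def is_rank_zero(env=None):
    """True on MPI rank 0. Assumes Open MPI's OMPI_COMM_WORLD_RANK variable;
    defaults to rank 0 when the variable is absent (e.g. a non-MPI run)."""
    env = os.environ if env is None else env
    return int(env.get("OMPI_COMM_WORLD_RANK", "0")) == 0


def upload_model(local_path, bucket, key, s3_client=None, env=None):
    """Upload a saved model to S3 from rank 0 only.

    Returns True if this process uploaded the file, False if it skipped.
    Hypothetical helper; call it at the end of the training script.
    """
    if not is_rank_zero(env):
        return False
    if s3_client is None:
        # Imported lazily so non-uploading ranks do not need boto3 credentials.
        import boto3
        s3_client = boto3.client("s3")
    # boto3 S3 clients expose upload_file(Filename, Bucket, Key).
    s3_client.upload_file(local_path, bucket, key)
    return True
```

At the end of training this might be called as `upload_model("/tmp/model.pt", "my-bucket", "runs/001/model.pt")`, with credentials supplied through the pod's environment or service account.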

zw0610 avatar Mar 19 '21 02:03 zw0610

I think this is a valid use case. From the user's perspective, it should be easy to use the training operators with Kubeflow Pipelines. The challenge in supporting native artifacts is that all jobs are submitted from a client pod triggered by KFP, and the client pod itself doesn't have the training job's artifacts.

This is not an MPI-operator-specific problem; any KFP op that submits a remote job has it.

Jeffwan avatar Mar 19 '21 03:03 Jeffwan

How about having a pre-processor/post-processor for each job?

johnugeorge avatar Jun 23 '21 11:06 johnugeorge