mpi-operator
Add a step to upload artifact
A quick question: how are artifacts (models) managed in an MPI job? Possible solutions are uploading to S3, providing a PVC, etc.
Yes, you can provide a volume in the spec.
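For reference, a minimal sketch of what that could look like, assuming the kubeflow.org/v1 MPIJob API and a pre-existing PVC; the image, mount path, and claim name below are just placeholders:

```yaml
apiVersion: kubeflow.org/v1
kind: MPIJob
metadata:
  name: train-with-pvc
spec:
  slotsPerWorker: 1
  mpiReplicaSpecs:
    Launcher:
      replicas: 1
      template:
        spec:
          containers:
            - name: mpi-launcher
              image: my-training-image:latest   # hypothetical image
              command: ["mpirun", "python", "/opt/train.py", "--model-dir", "/mnt/models"]
    Worker:
      replicas: 2
      template:
        spec:
          containers:
            - name: mpi-worker
              image: my-training-image:latest   # hypothetical image
              volumeMounts:
                - name: model-store
                  mountPath: /mnt/models        # training ranks write the model here
          volumes:
            - name: model-store
              persistentVolumeClaim:
                claimName: model-pvc            # hypothetical, pre-created PVC
```

The training processes run in the worker pods, so the model directory is mounted there; anything written under /mnt/models outlives the job via the PVC.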
But would it be better if we could upload artifacts to an S3 repository?
That's not the focus of the MPI operator here. I'd suggest checking out artifacts in Argo Workflows.
The point is how the MPI operator works with Argo in this scenario. We can leverage Argo to manage the pipeline, but I don't think the output artifacts are available to Argo, because those artifacts reside in the pods created by the mpi-operator. We can use Argo to submit the MPI job, but if I don't have any storage system available, I may not be able to get the output artifacts.
Maybe we could discuss it in the community meeting.
@MartinForReal Post-processing a model looks more like something we can do within the container. In the training script, we can use a Python API to upload the model to S3 or other cloud storage, since, as you mentioned, there is no storage system in your cluster. A rough sketch is below.
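A minimal sketch of the end of such a training script, assuming boto3 is available in the training image and AWS credentials are configured; the paths, bucket, key, and the Open MPI rank check are assumptions for illustration:

```python
import os

import boto3  # assumes boto3 is installed and AWS credentials are configured in the pod


def upload_model_to_s3(local_path: str, bucket: str, key: str) -> None:
    """Upload the trained model artifact to S3 once training has finished."""
    s3 = boto3.client("s3")
    s3.upload_file(local_path, bucket, key)


if __name__ == "__main__":
    # ... training happens above; rank 0 writes the model to model_path ...
    model_path = "/tmp/model.pt"         # hypothetical local path
    bucket = "my-model-bucket"           # hypothetical bucket name
    key = "experiments/run-1/model.pt"   # hypothetical object key

    # With Open MPI, only let rank 0 upload so the workers don't race each other.
    if os.environ.get("OMPI_COMM_WORLD_RANK", "0") == "0":
        upload_model_to_s3(model_path, bucket, key)
```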
I think this is a valid use case. From the user's perspective, users should be able to use the training operators with Kubeflow Pipelines easily. The challenge in supporting native artifacts is that all jobs are submitted from a client pod triggered by KFP; the client pod itself doesn't have the training job's artifacts.
This is not an MPI-operator-specific problem; any KFP op which submits a remote job has this problem.
How about having a pre-processor/post-processor for each job?