Artefact migration between AML workspaces

Open FlorianPydde opened this issue 3 years ago • 2 comments

Currently the template reruns the scripts in each environment. Although this ensures that the automated retraining process works, that check should be defined as an integration test on a sample set rather than used as a means of promoting artefacts. The template needs to implement a process that downloads artefacts and re-uploads them to the next AML workspace. This will lower cost and time to production.
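To make the download-and-re-upload idea concrete, here is a minimal sketch. It assumes the artefact is a registered model and that the Azure ML SDK v1 (`azureml-core`) is in use; the function name `migrate_model` and the `migrated_from_version` tag are hypothetical and not part of the template.

```python
import tempfile


def migrate_model(source_ws, target_ws, model_name, version=None, model_cls=None):
    """Copy a registered model from source_ws to target_ws; return the new registration."""
    if model_cls is None:
        # Assumption: azureml-core (SDK v1) is installed. Imported lazily so the
        # function can also be exercised with a stand-in class.
        from azureml.core import Model as model_cls

    # Look up the model in the source workspace (latest version if none is given).
    source_model = model_cls(source_ws, name=model_name, version=version)

    with tempfile.TemporaryDirectory() as tmp_dir:
        # Download the model files locally, then register them in the target workspace.
        local_path = source_model.download(target_dir=tmp_dir, exist_ok=True)
        return model_cls.register(
            workspace=target_ws,
            model_path=local_path,
            model_name=model_name,
            # Hypothetical tag that records where the artefact came from.
            tags={"migrated_from_version": str(source_model.version)},
        )
```

With real workspaces this might be called as `migrate_model(Workspace.get(...), Workspace.get(...), "my-model")`; how the pipeline authenticates against each workspace is left open here.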

FlorianPydde avatar Oct 28 '21 10:10 FlorianPydde

@mariamedp does this mean that retraining is only performed in dev? In some architectures which require more secure separation of prod data, retraining in prod or pre-prod may be preferred.

Perhaps it would help if you could specify what you expect the artefacts to be :)

mvbugge avatar Nov 10 '22 09:11 mvbugge

Update as per offline conversation with @mvbugge:

  • We'll create a YAML template to support artefact migration between AML workspaces that can be easily used as part of CI/CD pipelines.
  • We'll build a new modeling pipeline PIPELINE-1b-modeling-<name tbd>.yml to showcase the flow of artifact migration. This will be a more complex pipeline that won't be needed in all cases, so we'll keep it as a separate example.
  • The example pipeline will incorporate a third environment, TEST. Changes to any feature branch will trigger the pipeline in DEV, as in 1-modeling. Changes to main will trigger it in TEST first; the model artifact will then be migrated to PROD, and deployment will happen in PROD as well. Moving from DEV to PROD is not a safe practice, since artifacts in DEV have not been generated with code from the main branch. That is why we avoid showcasing that flow in the template and use a third TEST env instead.
  • This will also allow us to show the flow of having a sequence of environments TEST -> (...) -> PROD to be used against the same branch (main), which is a functionality that has been on our radar since the beginning, and I believe was requested here as well https://github.com/microsoft/dstoolkit-mlops-base/issues/65.
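The branch-to-environment flow described in the list above can be sketched as follows. This is only an illustration of the routing rule: the function names are hypothetical, the branch is assumed to be passed as a plain name such as `main`, and the real implementation would live in the YAML pipeline triggers.

```python
def environments_for_branch(branch):
    """Return the ordered environments that a change to `branch` runs through."""
    if branch == "main":
        # Train in TEST first, then migrate the artifact to PROD and deploy there.
        return ["TEST", "PROD"]
    # Any feature branch runs in DEV only; DEV artifacts are never promoted.
    return ["DEV"]


def migration_steps(environments):
    """Return (source, target) pairs for each artifact migration in the sequence."""
    return list(zip(environments, environments[1:]))
```

The same `migration_steps` helper covers a longer chain such as TEST -> (...) -> PROD, because it simply pairs each environment with the next one in the list.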

My suggestion for pipeline name: PIPELINE-1b-modeling-with-TEST-env.yml.

mariamedp avatar Nov 11 '22 11:11 mariamedp