mlem `mlem deploy run` should be able to re-deploy the model regardless of state

I stumble upon this each time I re-generate http://github.com/iterative/example-mlem-get-started

To be able to re-deploy, we use this:

if heroku apps:info example-mlem-get-started-app; then
  heroku apps:destroy example-mlem-get-started-app --confirm example-mlem-get-started-app
  heroku container:login
fi

Which is fine for some cases, but I would like to have a --force or --ignore-state flag, to just re-deploy without falling back to Heroku CLI first.

Besides, when I remove the deployment outside of MLEM, there is a time frame when app is not available. I suppose, Heroku should handle this gracefully, but there is no way since I need to remove the app.

Finally, app settings are removed (e.g. I need to change Dyno to Hobby now each time manually). Not convenient :)

[ ] Remove the mention of app.mlem.state from the Get Started when addressed (keep in UG if needed). See https://github.com/iterative/mlem.ai/pull/265#pullrequestreview-1216680523.

Oct 31 '22 07:10 aguschin

After logging in, most users will try rerunning the same command, so instead of throwing an error, we should suggest users either re-use the deployment configuration (app.mlem) or overwrite it.

$ mlem deployment run heroku app.mlem --model models/clf-model  --app_name bank-churn-app
:floppy_disk: Saving deployment to app.mlem
:hourglass_flowing_sand:️ Loading model from models/clf-model.mlem
🏛 Creating Heroku App bank-churn-app
 ›   Error: Invalid credentials provided.
 ›
 ›   Error ID: unauthorized
:x: Invalid credentials provided.

$ heroku login
 ›   Warning: heroku update available from 7.63.4 to 7.66.4.
heroku: Press any key to open up the browser to login or q to exit: 
Opening browser to https://cli-auth.heroku.com/auth/cli/browser/...
Logging in... done
Logged in as ...

$ mlem deployment run heroku app.mlem --model models/clf-model  --app_name bank-churn-app
:hourglass_flowing_sand:️ Loading deployment from app.mlem
:x: Deployment meta already exists at path=.../demo-bank-customer-churn/app.mlem' project=None rev=None 
uri='.../demo-bank-customer-churn/app.mlem' project_uri=None 
fs=<fsspec.implementations.local.LocalFileSystem object at 0x101ff9c40>. Please use `mlem deployment run --load <path> ...`

Nov 23 '22 14:11 alex000kim

Probably we can also add --force flag (or --overwrite?)

Nov 25 '22 11:11 mike0sv

That's def an option. My instinctive reaction was to simply rerun the same command after I ran heroku login, so I'd expect a prompt e.g.:

There's an existing deployment configuration `app.mlem`. Do you want to overwrite it? (y,N)

These options aren't mutually exclusive, though. So we could do both.

Nov 25 '22 14:11 alex000kim

For rerun, we could simply check that the command (mlem deployment run heroku app.mlem --model models/clf-model --app_name bank-churn-app) is going to create exactly the same file => it doesn't change the existing file and we don't need to error out - instead, continue like it was expected. WDYT?

Nov 25 '22 14:11 aguschin

sounds good!

Nov 25 '22 20:11 alex000kim

@aguschin a re-run of the command should guarantee the same result (idempotent) - meaning, if I used other means to change the heroku app (heroku dashboard or cli or another mlem command from another project), and I rerun mlem deployment run heroku app.mlem --model models/clf-model --app_name bank-churn-app - it should align/"fix" the broken state. If it finishes successfully but there's not guarantee at what's deployed that is a "bigger" breakage IMO. For that inspecting a local mlem generated file is not enough, right? we need to actually force deploy, regardless of what's currently deployed.

Another UX point - I would personally expect the "--force" behavior here to be the default behavior. I think this covers the more common expected behavior (idempotancy). The minority of cases would be people fearing an override (expecting first deployment, or avoid redundant operations or downtime on a deployed service). So maybe changing default behavior to "force" and adding a "--strict" or "--safe" flag for the current behavior is a more friendly UX?

Dec 21 '22 01:12 omesser

a re-run of the command should guarantee the same result (idempotent)

I would personally expect the "--force" behavior to be the default behavior

Good point, thanks for sharing your thoughts @omesser. We had a discussion about this with @mike0sv - if I recall correctly, we couldn't strongly decide on what to choose here - --force or --safe behavior to be the default here. IIRC, we decided to went with --force as default (we implemented it for mlem build), Mike please correct me if I'm wrong.

Besides, Mike had some concerns about this overall - if we try to manage deployments completely (like re-deploying a partially broken deployment), we may end up reimplementing Terraform, which is not a desired scenario for us. @mike0sv, could you please elaborate your opinion for @omesser here?

Dec 21 '22 05:12 aguschin

Once this is fixed, can we remove the mention of app.mlem.state from the Get Started? See https://github.com/iterative/mlem.ai/pull/265#pullrequestreview-1216680523.

Dec 27 '22 03:12 jorgeorpinel

There are different possible cases, let's try to describe them and decide what we should do for each one.

Nothing is deployed yet - we create new deployment
Deployment meta is created but not deployed, params are the same - we deploy it
Same as 2 but params are different - we should change params and deploy
Deployment was not complete with same params - complete deployment
Deployment was not complete with different params - ??? probably error out unless --force?
Deployment already existing with same params - do nothing or re-deploy if different model
Deployment already existing with different params - for now error out. We can make some parameters adjustable like scale for example - let's do it in a separate issue
Something was deployed but removed with mlem - same as 2 and 3 since there should be no state
Something was deployed but removed without mlem - see below
Something was deployed without mlem/in separate project (or state was removed manually which is the same from mlem perspective)- see below

Please suggest what I might forgot

About 9 and 10: Those cases are complex because we actually have 2 different states - what is written to app.mlem.state and the actual state of the app. This is what I call "recreate terraform" - it can sync "local" state with "real" and then compare it with declaration, calculate diff, apply it and thus sync both states and declaration. This logic is very complex and we do not have the manpower to support this. But we might support some cases.

Eg user deployed something to heroku and then wants to deploy different app with the same app name. We either error out, remove old heroku app and re-create a new one or just update the existing one. First approach is the easiest and most robust - we just dont mess with things. Second one is also somehow robust - user will get the same outcome as if the app never existed. However this will corrupt "local" mlem state of the previous deployment - it will still "think" that old model is deployed (idea: add an endpoint to deployments that will return current model meta hash - this way we will know at least that something is off). Also, there will be downtime. Third option - same regarding previous deployment, no downtime, but old settings might "leak" into new app - eg secrets. This might be desirable in some cases (eg dyno type or something).

Initially my approach was that if you use mlem, you should manage app only with mlem - this way "local" state will always be relevant and we won't have problems like this.

And some thoughts on force/safe. I vote for safe being default since we don't version state. If user accidentally runs something and force is default, then old state is gone and everything becomes inconsistent

Feb 13 '23 15:02 mike0sv

@omesser @aguschin

Feb 13 '23 15:02 mike0sv

In general, sounds totally logical to me.

Deployment was not complete with different params - ??? probably error out unless --force?

I think the behavior should be the same if the deployment was complete. That will be easier to understand.

remove old heroku app and re-create a new one sounds ok to me. I also would consider requiring --force here. This doesn't sound "idempotent" like @omesser suggested thou.

Feb 15 '23 12:02 aguschin

After some thinking, let's allow --force option for items 5 and 7.

In the case of 7 (Deployment already existing with different params), let's first do mlem deploy remove and then mlem deploy run. Let's have a warning about this as well (the warning below works for both 5 and 7 IMO):

$ mlem deploy run docker_container dc.mlem -m lyrics2emoji --image_name lyrics2emoji
<deployed>
$ mlem deploy run docker_container dc.mlem -m lyrics2emoji --image_name lyrics2emoji --ports.0 8080:8080
Deployment parameters changed, add flag `--force` to remove the deployment first and then re-create it.
$ mlem deploy run docker_container dc.mlem -m lyrics2emoji --image_name lyrics2emoji --ports.0 8080:8080 --force
<deployed>

Mar 03 '23 03:03 aguschin

Revisiting this discussion - I am (still) convinced a robust idempotent (declarative, if you will 😄 ) command is still needed for a non-frustrating UX. I propose we create 2 commands to respect the "safe" imperative approach as well:

imperative - mlem deploy run - will be safe by default (fail on discrepancies) - but have --force, for convenience.
declarative - mlem deploy apply - will not be safe (not respect local state, same as force by default). will drop-and-recreate if needed - whatever it takes - get to the requested state. [note: run --force == apply, still makes sense to have both as aliases imo, because it's a natural mental progression to use run and then add force]

I'm pretty sure if we have those, in most scenarios (both interactive and automation) apply would be the right thing to use. it's also the command that should also be used in get-started imo. and analytics may reveal what people prefer in the future 😄

re: complexity / "doing a -terraform": No need at all imo. 100% agree to keep it simple! State resolution should be provided by the deployment platforms (e.g. k8s, we should use apply), and when it's not available, we should just spew a warning message, drop and recreate (in apply) or fail with a nice message explaining the discrepancy (in run).

Mar 05 '23 02:03 omesser

Sounds logical to me. Could be an option, but we already have mlem deploy apply that relates to getting predictions from that deployment using data at hand (which is a close relative of mlem apply)

I guess apply is coming from k8s commands? Are there any other decent alternative names here that would make sense?

Mar 06 '23 06:03 aguschin

I understand. Unfortunately I don't think there's a better term or a decent alternative. Maybe the current apply should be an action on data/dataset and not model/deployment? We are applying the data to the model/server not the server. another approach can be to rename apply to apply-data but keep it in its place, or find another name for the apply - maybe ensure? It is an off choice though. Apply seems to be more accurate and familiar imo.

Mar 06 '23 12:03 omesser

I partially implemented this logic in #644

What's left:

mlem deploy apply can be implemented, but if there is no way it can be called something other than apply, then we need to rename current mlem apply commands, eg mlem call or mlem infer. Also, we can try to merge mlem apply and mlem deploy apply - depending on what object we pass as an arg (model vs deployment)
It requires a lot more research/work to support doing something like kubctl apply instead of complete re-deployment here. I will look into this

Mar 16 '23 18:03 mike0sv

Thanks @mike0sv for looking into that! Ok, I understand there's a terminology / UX call to make here, and after giving a proposal above (which is problematic because of naming collision) - we need to make a call here.

Given the positioning of the product - I highly suggest to consider onboarding and smooth "basic" flows (i.e. those should be in get-started docs, those should be with less flags, etc)

@aguschin - Can you please take the lead on driving this to a decision soon? Have you had time to consider this? Maybe you already have a fairly firm stance? or would you like to maybe have a brainstorming meeting with @mike0sv and/or me (or include others of course) - to weigh alternatives and come to a conclusion about this?

Mar 19 '23 13:03 omesser

Gave it some thought, here are the results:

We need to move all logic from cli function to api function, so that cli one just calls api one to make usage the same
We need to add a new overridable function in MlemDeployment, something like redeployor update that will handle the case when we have a different existing deployment. By default it can do the same stuff I implemented in previous PR (remove old and deploy new one from scratch), but it can be customized for k8s for example. Can't say anything about exact signature and logic - I need to actually start implementing it to know for sure. But I can see that it should accept old (or new depending on which object is is called) declaration (and state), maybe a flag like force

Let's decide on UX stuff first so I can start digging further

Mar 20 '23 16:03 mike0sv

Sorry guys, missed this. If the underlying implementation doesn't look too heavy for @mike0sv, I agree we can rename mlem apply to mlem infer (or mlem call) and implement deploy apply like @omesser suggested.

Mar 27 '23 11:03 aguschin

Let's do this and close the issue @mike0sv

Mar 29 '23 11:03 aguschin