
Already deployed containers should not be stopped when node property is changed

Open SanjeevKGupta opened this issue 2 years ago • 6 comments

Is your feature request related to a problem? Please describe.

A user is finding that when a new property is added to the node policy to deploy an additional service container, all previously running agreements are cancelled, which results in already running containers being stopped. New agreements are then formed and the containers start all over again.

Describe the solution you'd like.

The expectation is that already running containers should not be stopped and only the newly added container should be started.

Describe alternatives you've considered

No response

Additional context.

This is important when a user wants to selectively deploy additional applications later on by adding or removing node policy properties.

SanjeevKGupta avatar Jul 21 '22 12:07 SanjeevKGupta

Fully agree, agreements should be checked to see if they are still valid before canceling them and waiting until a new agreement is formed.

If there are a lot of containers running, which may take some time to start, this can result in a long service disruption on the device.

mkeppeler avatar Jul 21 '22 12:07 mkeppeler

I would be very interested in this enhancement. I have edge nodes running multiple agreements. A change to any single agreement stops all the containers/agreements. In some cases this is an outage (several minutes) of the main purpose of the edge node.

johnwalicki avatar Jul 21 '22 12:07 johnwalicki

@dabooz @bmpotter or @rhaidou could you please weigh in on this? Is it possible to hold off on stopping existing running services to first check and see if:

  • Any re-negotiation will result in starting the same container again, and thus the running container should not be stopped
  • That the inputs (env variables, parameters, policies and definitions, and secrets) have not changed since the last time it was run

The hope here is that publishing a new service definition and deployment policy (Service B), thus triggering a re-negotiation, will not result in re-starting all existing running services (Service A) if they do not otherwise need to be re-started.
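A minimal sketch of the second check, assuming a service's inputs can be collected into a plain dictionary. The names `input_fingerprint` and `needs_restart` are hypothetical and are not part of the anax code base; the idea is only to compare a stable digest of the inputs from the last run against the current ones.

```python
import hashlib
import json


def input_fingerprint(inputs: dict) -> str:
    """Return a stable SHA-256 digest of a service's inputs (hypothetical helper)."""
    # sort_keys makes the digest independent of dictionary ordering.
    canonical = json.dumps(inputs, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def needs_restart(previous_inputs: dict, current_inputs: dict) -> bool:
    """A running service only needs a restart if its inputs actually changed."""
    return input_fingerprint(previous_inputs) != input_fingerprint(current_inputs)
```

With something like this, publishing Service B would leave Service A untouched as long as the fingerprint of Service A's inputs is the same before and after the re-negotiation.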

joewxboy avatar Jul 21 '22 12:07 joewxboy

See also: https://github.com/open-horizon/anax/issues/957

TheMosquito avatar Jul 21 '22 13:07 TheMosquito

A summary of internal discussion for reference.

SG - This came up while working on an important customer use case.

LG - I think it is a very reasonable request from a user's point of view. But it is not how the agent/agbot currently works. I am not sure if there is an easy fix without investigation.

DB - This is something that we have always wanted to change but there has never been a requirement for it (because it's not an easy feature to implement, we have never tried to just tuck it into a release).

DB - Now, to be clear, there are two variants to this feature:

  1. changing a node, service or deployment property that is relevant/referenced in the determination of deployment (i.e. that is referenced in a node, service or dep policy constraint expression) of a running workload/service.
  2. changing a node, service or deployment property that is not relevant/referenced in the determination of deployment (i.e. that is NOT referenced in a node, service or dep policy constraint expression) of a running workload/service.

2 is easier to fix than 1. For 2, we know we don't have to touch the agreement; for 1 we MIGHT have to cycle the agreement.
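A minimal sketch of how the two variants could be told apart, assuming constraints are already parsed into simple `(property, operator, value)` tuples. That representation and the function names are hypothetical and much simpler than the real constraint expression language.

```python
_MISSING = object()  # sentinel so an absent property differs from any real value


def changed_properties(old_props: dict, new_props: dict) -> set:
    """Names of properties that were added, removed, or given a new value."""
    keys = set(old_props) | set(new_props)
    return {
        k for k in keys
        if old_props.get(k, _MISSING) != new_props.get(k, _MISSING)
    }


def classify_change(old_props: dict, new_props: dict, constraints: list) -> str:
    """Classify a property change against the constraints of a running agreement."""
    changed = changed_properties(old_props, new_props)
    if not changed:
        return "no change"
    # Properties named in any constraint of this agreement (hypothetical tuple form).
    referenced = {name for name, _op, _value in constraints}
    # Variant 1: a referenced property changed. Variant 2: only unreferenced ones did.
    return "variant 1" if changed & referenced else "variant 2"
```

For example, adding a new `beta` property while the agreement only constrains `propertyX` would be classified as variant 2, so the agreement would not need to be touched.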

I bring this up because I am hoping that we only have to support 2 and not 1. But, I don't know the customer's use case. The question we need to understand is if the user tends to use separate properties for deployment of separate services or if there is an intersection between the properties used for service deployment.

LG - Right. In Ts-And-Cs in the agreement, the node policy is included as a whole. It will be very hard to determine what parts have been changed and what parts play a role in an agreement.

DB - The reason that 1 is hard to fix is this: IF the user changed a property that is relevant/referenced within one of the constraint expressions BUT the change does NOT invalidate the agreement (e.g. a constraint says propertyX < 5 and propertyX changed from 4 to 3) THEN we need to keep the agreement but we need to update the TsAndCs of the agreement on both the agent and the agbot side to properly reflect the state of the system when the agreement was made.
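The three outcomes described here can be sketched as follows, again assuming hypothetical `(property, operator, value)` constraint tuples rather than the real constraint expressions. `agreement_action` is an illustrative name, not an existing agent or agbot function.

```python
import operator

# Comparison operators supported by this simplified sketch.
OPS = {
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
    "==": operator.eq, "!=": operator.ne,
}


def satisfied(props: dict, constraints: list) -> bool:
    """True if every constraint holds for the given property values."""
    for name, op, value in constraints:
        if name not in props or not OPS[op](props[name], value):
            return False
    return True


def agreement_action(old_props: dict, new_props: dict, constraints: list) -> str:
    """Decide what a property change means for one existing agreement."""
    if not satisfied(new_props, constraints):
        return "cancel"  # the change invalidates the agreement
    referenced = {name for name, _op, _value in constraints}
    if any(old_props.get(n) != new_props.get(n) for n in referenced):
        return "update"  # still valid, but the recorded TsAndCs are stale
    return "keep"        # nothing relevant to this agreement changed


# The example from the discussion: propertyX < 5, and propertyX goes from 4 to 3.
print(agreement_action({"propertyX": 4}, {"propertyX": 3}, [("propertyX", "<", 5)]))
```

The "update" outcome is the hard case: the decision itself is cheap, but acting on it requires both the agent and the agbot to agree on the revised TsAndCs.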

I introduced an agreement protocol extension for updating agreements in place when we added secrets, but I don't remember if that extension goes far enough to cover this case as well. I did have it in mind, but of course time was limited as always.

JW - Could the agent perform the Ts and Cs logic before terminating the running agreements? If the agreements are already running and the modified node properties do not affect the agreement, let them continue to run.

DB - Yes, that's the easy part. The hard part, as I said above, is getting the agreement updated on both sides (agbot and agent) because both have to agree on the change. Agreements have always been a once and done thing. Conceptually, updating an agreement is something that makes sense, humans do it all the time. We just don't have it in the code.

SG - Looks like both 1 and 2 are needed, as a user may want to start a container as well as remove an already running one, and maybe also add/remove a list of them.

SanjeevKGupta avatar Aug 01 '22 16:08 SanjeevKGupta

@SanjeevKGupta Agreed, we need both 1 and 2 in our current use case.

To clarify: Our current use case revolves around setting/unsetting node properties which in turn either install or uninstall containers based on deployment policy constraints. At the moment these properties are simply true or false, but in the future this might change to more complex properties or multiple properties (e.g. opt-in to receive beta channel services).
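A minimal sketch of the desired behavior for this use case, assuming each service is gated by a single boolean node property. The mapping `service_flags` and both function names are hypothetical and only illustrate the expected outcome, not how the agent computes it.

```python
def desired_services(node_props: dict, service_flags: dict) -> set:
    """Services whose gating property is currently set to True on the node."""
    return {svc for svc, prop in service_flags.items() if node_props.get(prop) is True}


def plan_changes(running: set, desired: set) -> dict:
    """Split services into those to leave alone, to start, and to stop."""
    return {
        "keep": running & desired,   # already running and still wanted: do not restart
        "start": desired - running,  # newly opted in
        "stop": running - desired,   # opted out
    }
```

Setting a second property to true would then only add an entry under "start" while the already running service stays under "keep".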

s-renz avatar Aug 02 '22 12:08 s-renz