Improvements / New Features are causing CLI / Server versioning issues

Open kdhamric opened this issue 6 months ago • 14 comments

Scope: This is an investigation thread to decide what we would like to do to help this issue. Will decide a path and pick timing of execution after we reach consensus on a plan.

As an existing user, I have a local version of the Tracetest CLI running the agent that I depend on. As changes happen to the Commercial offering at the server level, they may require an update to my local CLI / agent. Currently, I am not aware when an upgrade is needed, which can break particular capabilities or prevent the use of the CLI or the Agent altogether.

We may want to consider:

  • Particular new capability that will not work without a local upgrade (i.e., scope of impact limited to a confined area)
  • Extensive change that requires a new version.

kdhamric avatar Jan 25 '24 17:01 kdhamric

I can think of the following options to approach this problem:

Rely on SemVer

SemVer defines what each part of the version string means. If a version is composed as v[Major].[Minor].[Patch], patch releases could be "optional" bugfixes, minor releases "optional" new features, and major releases mandatory upgrades, including backwards-incompatible changes, critical bugfixes or feature improvements, etc.

At the initial handshake between CLI and server (i.e., when the CLI asks for the version.json file from the target server), the CLI gets the server version so it can compare it with its own version and, based on the previously defined parameters, decide if an upgrade is required.

Examples:

  • Server Version: 0.15.4 CLI Version: 0.15.3 Result: Upgrade optional, allow all operations, show notice to the user: "optional bugfix available"

  • Server Version: 0.15.4 CLI Version: 0.14.8 Result: Upgrade optional, allow all operations, show warning to the user: "New features available"

  • Server Version: 1.0.1 CLI Version: 0.19.2 Result: Upgrade required, don't allow any operations, show ERROR to the user: "Required upgrade available"

The implementation is limited, available only on the CLI side, and doesn't allow further explanation to the user. However, it's very simple to implement.
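The comparison above could be sketched roughly like this in Go. This is only an illustration: it assumes plain X.Y.Z versions without pre-release suffixes, and the names parseVersion and upgradeLevel, as well as the level strings, are made up here and are not part of the Tracetest CLI.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// version holds the three numeric SemVer components.
type version struct{ major, minor, patch int }

// parseVersion parses "1.2.3" or "v1.2.3". Pre-release and build
// suffixes are not handled in this sketch.
func parseVersion(s string) (version, error) {
	parts := strings.Split(strings.TrimPrefix(s, "v"), ".")
	if len(parts) != 3 {
		return version{}, fmt.Errorf("invalid version %q", s)
	}
	var n [3]int
	for i, p := range parts {
		v, err := strconv.Atoi(p)
		if err != nil {
			return version{}, fmt.Errorf("invalid version %q: %w", s, err)
		}
		n[i] = v
	}
	return version{n[0], n[1], n[2]}, nil
}

// upgradeLevel is a hypothetical helper. It returns "error" when the server
// is a major version ahead (upgrade required), "warning" for a newer minor,
// "notice" for a newer patch, and "none" when the CLI is up to date or newer.
func upgradeLevel(server, cli version) string {
	switch {
	case server.major != cli.major:
		if server.major > cli.major {
			return "error"
		}
		return "none"
	case server.minor != cli.minor:
		if server.minor > cli.minor {
			return "warning"
		}
		return "none"
	case server.patch > cli.patch:
		return "notice"
	default:
		return "none"
	}
}

func main() {
	// The three server/CLI pairs from the examples above.
	pairs := [][2]string{{"0.15.4", "0.15.3"}, {"0.15.4", "0.14.8"}, {"1.0.1", "0.19.2"}}
	for _, p := range pairs {
		server, err := parseVersion(p[0])
		if err != nil {
			panic(err)
		}
		cli, err := parseVersion(p[1])
		if err != nil {
			panic(err)
		}
		fmt.Printf("server %s, cli %s: %s\n", p[0], p[1], upgradeLevel(server, cli))
	}
}
```

The CLI would block operations only on "error" and print a message for the other levels.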

Server defined update guide.

In this approach, we can rely on the initial handshake between CLI and server (i.e., when the CLI asks for the version.json file from the target server) as a way for the server to decide the update criteria, and then have the CLI inform the user of the required steps. For example:

// version.json

{
  "version":"v0.15.4",
  // ...
  "updateInstructions": {
    "<0.15.3": {
      "type": "optional",
      "reason": "enables wizard OTLP check"
    },
    "<0.15.2": {
      "type": "required",
      "reason": "fixes critical bug when running tests"
    }
  }
}

Keys could be version constraints (see https://devhints.io/semver) to allow flexibility in defining the matching versions.

This approach allows for fine-grained update criteria, and lets us send a user-friendly message explaining what the upgrade fixes and whether the update is required or optional.
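As a rough sketch of how the CLI side could evaluate such a file, assuming "updateInstructions" is a JSON object keyed by constraint and that only "<X.Y.Z" constraints are used (a real implementation would probably use a SemVer constraint library). The type and function names below are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// instruction mirrors one entry of the proposed "updateInstructions" object.
type instruction struct {
	Type   string `json:"type"` // "optional" or "required"
	Reason string `json:"reason"`
}

// versionFile is an assumed shape for version.json.
type versionFile struct {
	Version            string                 `json:"version"`
	UpdateInstructions map[string]instruction `json:"updateInstructions"`
}

// sampleVersionJSON is the example from this comment, without the "// ..." line.
const sampleVersionJSON = `{
  "version": "v0.15.4",
  "updateInstructions": {
    "<0.15.3": {"type": "optional", "reason": "enables wizard OTLP check"},
    "<0.15.2": {"type": "required", "reason": "fixes critical bug when running tests"}
  }
}`

func loadVersionFile(data []byte) (versionFile, error) {
	var f versionFile
	err := json.Unmarshal(data, &f)
	return f, err
}

// parse turns "v0.15.4" or "0.15.4" into three integers.
func parse(s string) ([3]int, error) {
	var out [3]int
	parts := strings.Split(strings.TrimPrefix(s, "v"), ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("invalid version %q", s)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("invalid version %q: %w", s, err)
		}
		out[i] = n
	}
	return out, nil
}

// less reports whether a is an older version than b.
func less(a, b [3]int) bool {
	for i := range a {
		if a[i] != b[i] {
			return a[i] < b[i]
		}
	}
	return false
}

// matchingInstructions returns every instruction whose "<X.Y.Z" key matches
// cliVersion, required ones first, then ordered by reason.
func matchingInstructions(cliVersion string, f versionFile) ([]instruction, error) {
	cli, err := parse(cliVersion)
	if err != nil {
		return nil, err
	}
	var matched []instruction
	for constraint, ins := range f.UpdateInstructions {
		if !strings.HasPrefix(constraint, "<") {
			return nil, fmt.Errorf("unsupported constraint %q", constraint)
		}
		limit, err := parse(strings.TrimPrefix(constraint, "<"))
		if err != nil {
			return nil, err
		}
		if less(cli, limit) {
			matched = append(matched, ins)
		}
	}
	sort.Slice(matched, func(i, j int) bool {
		if matched[i].Type != matched[j].Type {
			return matched[i].Type == "required"
		}
		return matched[i].Reason < matched[j].Reason
	})
	return matched, nil
}

func main() {
	f, err := loadVersionFile([]byte(sampleVersionJSON))
	if err != nil {
		panic(err)
	}
	for _, cli := range []string{"0.15.1", "0.15.2", "0.15.3"} {
		matched, err := matchingInstructions(cli, f)
		if err != nil {
			panic(err)
		}
		fmt.Printf("cli %s: %+v\n", cli, matched)
	}
}
```

The CLI would refuse to continue if any matched instruction is "required" and print the reasons either way.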

As a nice feature that can be added at a later stage, and since the information is on the server, we could use the CLI-Server handshake to also track the CLI version on the server, and show the user relevant warnings on the UI. This might get complex when multiple users use the CLI on a single environment, but seems feasible for an MVP.


I would personally go with the server based option unless we decide we need a fix ASAP

cc @kdhamric @xoscar thoughts?

schoren avatar Jan 29 '24 18:01 schoren

I think this is good, but I am not sure if it addresses the issue we saw last week that prompted the creation of the ticket, which falls more under "Particular new capability that will not work without a local upgrade (i.e., scope of impact limited to a confined area)". The issue was the addition of a specific new mode for the tracetest agent where it could handle a request for 'verifying the OTLP Tracing'. I am trying to think how a user who got to this screen in the wizard would realize that one function is not available, rather than getting stuck. From their perspective, they would see the screen saying it was waiting to see tracing information, while in reality it would never see any tracing info, as the call is not implemented in the agent.

kdhamric avatar Jan 29 '24 22:01 kdhamric

Very nice, @schoren. I like the way you outlined the problem and the possible solutions. The downside I see in the server-defined option is that we would have to stay on top of maintaining the version.json file, keeping it up to date and matched to the actual functionality. SemVer looks simpler to implement in that regard.

Based on @kdhamric's comment, I believe we should also have something to validate that scenario: how do I go from my current working setup to being notified that I need to update, and then to updating without hassle?

I believe that requirement needs to be more fleshed out, so we understand what we want the user experience to be across both scenarios.

xoscar avatar Jan 29 '24 22:01 xoscar

My suggestion included this:

As a nice feature that can be added at a later stage, and since the information is on the server, we could use the CLI-Server handshake to also track the CLI version on the server, and show the user relevant warnings on the UI.

@kdhamric the functionality that you describe, if I understand correctly, is that. If this is the main use case, we can invert the priorities I described:

The main functionality is in the UI. When a user wants to use a functionality known to require a minimum CLI version, we can leverage the Server-CLI handshake to get what CLI version is being used, and if an upgrade is required, we can notify the user in the UI.

As a nice feature that can be added at a later stage, and given that we already have some kind of mapping between functionalities and minimum versions, we can transmit that information to the CLI and show relevant warnings to the user in the terminal, if the need arises.

Do you think this covers the case you described or is it still missing the point?


@xoscar I can't think of a way to do this without manually maintaining some kind of list that maps minimum CLI versions to functionalities. It can be a separate JSON file or migrations in the DB, but somehow we need a system for identifying that X function requires Y CLI version. Do you have any other ideas? Or maybe a more interesting question is, what problems do you see with maintaining a manual list?

schoren avatar Jan 29 '24 23:01 schoren

@schoren Thanks for the clarification and highlighting the 'As an nice feature that can...' - I did not absorb that the first time.

Not trying to decide what the priorities are yet - just wanted a ticket to flesh out the issue and come up with approaches we could take to address it... which you are doing. Thanks!

kdhamric avatar Jan 30 '24 02:01 kdhamric

@danielbdias @mathnogueira @jorgeepc would you mind taking a look at this? any extra feedback is valuable!

schoren avatar Jan 31 '24 13:01 schoren

Was talking yesterday to somebody (@xoscar I think), and one thought we had in dealing with the mismatch between old versions of the agent and the current version of app.tracetest.io was to provide visibility to the current agent version on the settings screen (particularly the agent page).

Documenting this here... as we start to decide 'ok, let's do X', we may want to include this.

kdhamric avatar Jan 31 '24 17:01 kdhamric

@schoren I was thinking that instead of marking methods or features, we could leverage the ping method, which would return the version from the control plane to the agent, that way we can identify if there is a mismatch and take the proper actions afterward. It would involve moving to a better handling of versioning as you mentioned in a previous comment, but I think it could work

xoscar avatar Feb 01 '24 15:02 xoscar

Hi folks, just complementing to reinforce this problem: working with Julianne yesterday, we had issues connecting to SaaS due to an old version of Tracetest CLI (sometimes newcomers don't have any clue what is happening).

danielbdias avatar Feb 01 '24 15:02 danielbdias

Hey guys. Just a note, if we rely on the agent ping method, that will require the agent to be running. Probably it's better to have it in the initial handshake as mentioned by @schoren .

jorgeepc avatar Feb 01 '24 15:02 jorgeepc

About the issue: I like the idea of relying on SemVer and emitting info/warning messages depending on the version.

The only thing we need to be aware of is to do the correct updates on a version to avoid user problems, like releasing a patch when we should release a minor version, for instance.

Should we think about regression testing for our CLI because of that? I believe doing minor changes on CLI e2e tests could help us with that in the future.

danielbdias avatar Feb 01 '24 15:02 danielbdias

@jorgeepc @schoren I think having it at the handshake level would be good for new agents, but for long-running ones we also need it, maybe adding it to both?

xoscar avatar Feb 01 '24 16:02 xoscar

@danielbdias I agree, this would mean a more organizational change: planning the releases better and understanding the consequences.

xoscar avatar Feb 01 '24 16:02 xoscar

@xoscar about ping vs handshake: The way I see it, as long as the agent is running, it will run the same version from start to finish, no matter how many hours it's running. Every ping will contain the same version until the agent is restarted, which will issue a new handshake.

I do imagine this scenario: the cloud-api/control-plane is restarted, and the new version has a new feature that might require an agent upgrade. In that case, we might have a separate stream so the control plane can send "notifications" to the agent. So when the control plane starts, it sends a message to all connected agents about the new required version. Each agent knows its version and can show the warning if needed. The fact that it's a stream makes it possible to send update notifications at any time.

Let me dig into this idea further.

Showing warnings on the agent

Goal of this proposal

The point of this is to inform users about exactly why they need to update. Users should be able to see this notification in the Web UI as well as in the Agent UX/logs.

FeatureVersionMap

This structure maps versions to the features they introduce. This lets us show users which specific features their agent won't support if it is outdated. For example:

{
  "v0.15.5": [
    "support OTELCollector Check",
    "Fix critical bug when running tests"
  ],
  "v0.15.2": [
    "Support SomeNewDataStore"
  ]
  // ...
}

Keeping track of features

We need to manually maintain a feature-to-version map. It can be as simple as a JSON file that gets parsed by the cloud-api to keep the data in memory and respond to queries or notify agents.

This JSON could look something like this:

{
  "feature_id": "min_version",
  "wizard_otel_connectivity_check": "v0.15.4",
  "some_new_datastore": "v0.15.2",
  // ...
}

Informing the user on the Agent UX

Whenever an agent gets a new FeatureVersionMap, it can compare it to its own version, and show the users what features they won't support until updated. Based on the example before, an agent running v0.15.1 can show the following warning to the user:

Hey! We have new features that require an agent update:
- support OTELCollector Check
- Fix critical bug when running tests
- Support SomeNewDataStore

Another agent running v0.15.2 gets the same info and shows a different message:

Hey! We have new features that require an agent update:
- support OTELCollector Check
- Fix critical bug when running tests

An agent running v0.17.0 won't show anything.
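A minimal sketch of that comparison, assuming the agent receives the FeatureVersionMap as a map from version to feature descriptions. The names missingFeatures and warning are hypothetical, and the version parsing only handles plain vX.Y.Z strings.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// featureVersionMap maps a release to the features that require it,
// following the FeatureVersionMap example above.
type featureVersionMap map[string][]string

// parse turns "v0.15.4" or "0.15.4" into three integers.
func parse(s string) ([3]int, error) {
	var out [3]int
	parts := strings.Split(strings.TrimPrefix(s, "v"), ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("invalid version %q", s)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("invalid version %q: %w", s, err)
		}
		out[i] = n
	}
	return out, nil
}

// less reports whether a is an older version than b.
func less(a, b [3]int) bool {
	for i := range a {
		if a[i] != b[i] {
			return a[i] < b[i]
		}
	}
	return false
}

// missingFeatures lists the features of every release newer than
// agentVersion, newest release first.
func missingFeatures(agentVersion string, fvm featureVersionMap) ([]string, error) {
	agent, err := parse(agentVersion)
	if err != nil {
		return nil, err
	}
	type release struct {
		version  [3]int
		features []string
	}
	var newer []release
	for v, features := range fvm {
		parsed, err := parse(v)
		if err != nil {
			return nil, err
		}
		if less(agent, parsed) {
			newer = append(newer, release{parsed, features})
		}
	}
	sort.Slice(newer, func(i, j int) bool { return less(newer[j].version, newer[i].version) })
	var missing []string
	for _, r := range newer {
		missing = append(missing, r.features...)
	}
	return missing, nil
}

// warning renders the message an agent could print; it is empty when
// nothing is missing.
func warning(missing []string) string {
	if len(missing) == 0 {
		return ""
	}
	return "Hey! We have new features that require an agent update:\n- " + strings.Join(missing, "\n- ")
}

func main() {
	fvm := featureVersionMap{
		"v0.15.5": {"support OTELCollector Check", "Fix critical bug when running tests"},
		"v0.15.2": {"Support SomeNewDataStore"},
	}
	for _, v := range []string{"v0.15.1", "v0.15.2", "v0.17.0"} {
		missing, err := missingFeatures(v, fvm)
		if err != nil {
			panic(err)
		}
		fmt.Printf("agent %s:\n%s\n", v, warning(missing))
	}
}
```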

Example Flow 1: Agent starts

sequenceDiagram
    participant Agent
    participant ControlPlane

    Agent->>ControlPlane: Startup, Opens streams
    ControlPlane->>Agent: Send FeatureVersionMap
    alt Agent Doesn't support features
        Agent->>User: shows warning
    end

Example Flow 2: Running Agent, control plane gets an update

sequenceDiagram
    participant Agent
    participant ControlPlane

    Agent->>ControlPlane: Startup, opens streams
    Note right of Agent: normal operations
    ControlPlane->>ControlPlane: gets new version info
    ControlPlane->>Agent: sends updated FeatureVersionMap
    alt Agent Requires Update
        Agent->>User: shows warning
    end
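To make the two flows a bit more concrete, here is a toy in-memory model using Go channels in place of the real agent/control-plane stream. Everything in it (controlPlane, connect, publish, the map shape) is hypothetical and only meant to show the "send on connect, then push on change" idea; the actual stream would be whatever transport the agent already uses.

```go
package main

import (
	"fmt"
	"sync"
)

// featureMinVersions maps a feature ID to the minimum agent version it needs.
type featureMinVersions map[string]string

// controlPlane is a hypothetical in-memory stand-in for the notification stream.
type controlPlane struct {
	mu     sync.Mutex
	latest featureMinVersions
	agents []chan featureMinVersions
}

func newControlPlane(initial featureMinVersions) *controlPlane {
	return &controlPlane{latest: initial}
}

// connect registers an agent and immediately queues the current map
// (Example Flow 1).
func (c *controlPlane) connect() <-chan featureMinVersions {
	c.mu.Lock()
	defer c.mu.Unlock()
	ch := make(chan featureMinVersions, 1)
	ch <- c.latest
	c.agents = append(c.agents, ch)
	return ch
}

// publish stores a new map and pushes it to every connected agent
// (Example Flow 2). Agents that have not read the previous map yet only
// see the newest one.
func (c *controlPlane) publish(m featureMinVersions) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.latest = m
	for _, ch := range c.agents {
		select {
		case <-ch: // drop an unread map, it is stale now
		default:
		}
		ch <- m // does not block: the buffer is empty and sends only happen under the lock
	}
}

func main() {
	cp := newControlPlane(featureMinVersions{"some_new_datastore": "v0.15.2"})
	stream := cp.connect()
	fmt.Println("on startup:", <-stream)

	cp.publish(featureMinVersions{
		"some_new_datastore":             "v0.15.2",
		"wizard_otel_connectivity_check": "v0.15.4",
	})
	fmt.Println("after control plane update:", <-stream)
}
```

On each received map the agent would run its own version check and decide whether to show the warning.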

Showing warnings on the Web GUI

We are already maintaining a map of features -> versions. We can create unique IDs for the features, and then at arbitrary points in the web UI run checks to see if that feature, identified by its unique ID, can be supported by the running (or last known, if none is running) agent version.

Example: Wizard

Suppose we have the "support otel connectivity check" feature, its Unique ID could be "wizard_otel_connectivity_check".

In the GUI, we could have something like this:

// or whatever this looks like in TypeScript

supported, agentVersion, requiredVersion := agentSupportsFeature("wizard_otel_connectivity_check")
if !supported {
  showWarning("This feature requires the agent to be running {{requiredVersion}}, but it seems your agent is on {{agentVersion}}. Please update to support this feature.")
}

// show wizard

This imaginary agentSupportsFeature function would be a request to the server that could look like this:

GET /{{orgID}}/{{envID}}/featureCheck?feature=wizard_otel_connectivity_check

Response:
{
  "supported": false,
  "lastKnownVersion": "v0.15.2",
  "requiredVersion": "v0.15.4"
}
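On the server side, the logic behind that endpoint could be as small as the sketch below. It assumes the feature-to-minimum-version map from the "Keeping track of features" section is loaded in memory and that the last known agent version is passed in; featureCheck, minVersions, and the response struct are illustrative names, not existing Tracetest code, and HTTP routing is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// featureCheckResponse mirrors the example response above.
type featureCheckResponse struct {
	Supported        bool   `json:"supported"`
	LastKnownVersion string `json:"lastKnownVersion"`
	RequiredVersion  string `json:"requiredVersion"`
}

// minVersions is the manually maintained feature ID to minimum version map.
var minVersions = map[string]string{
	"wizard_otel_connectivity_check": "v0.15.4",
	"some_new_datastore":             "v0.15.2",
}

// parse turns "v0.15.4" or "0.15.4" into three integers.
func parse(s string) ([3]int, error) {
	var out [3]int
	parts := strings.Split(strings.TrimPrefix(s, "v"), ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("invalid version %q", s)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("invalid version %q: %w", s, err)
		}
		out[i] = n
	}
	return out, nil
}

// less reports whether a is an older version than b.
func less(a, b [3]int) bool {
	for i := range a {
		if a[i] != b[i] {
			return a[i] < b[i]
		}
	}
	return false
}

// featureCheck reports whether an agent on lastKnownVersion supports feature.
func featureCheck(feature, lastKnownVersion string) (featureCheckResponse, error) {
	required, ok := minVersions[feature]
	if !ok {
		return featureCheckResponse{}, fmt.Errorf("unknown feature %q", feature)
	}
	agent, err := parse(lastKnownVersion)
	if err != nil {
		return featureCheckResponse{}, err
	}
	min, err := parse(required)
	if err != nil {
		return featureCheckResponse{}, err
	}
	return featureCheckResponse{
		Supported:        !less(agent, min),
		LastKnownVersion: lastKnownVersion,
		RequiredVersion:  required,
	}, nil
}

func main() {
	resp, err := featureCheck("wizard_otel_connectivity_check", "v0.15.2")
	if err != nil {
		panic(err)
	}
	body, err := json.MarshalIndent(resp, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```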

Keeping track of agent versions

On every handshake, we can make the agents send their version. We can save that version in the same place we save the online/offline state of agents.

schoren avatar Feb 01 '24 19:02 schoren