
Problem with label `tanka.dev/environment` hash

Open jordiclariana opened this issue 1 year ago • 6 comments

Hi,

While experimenting with some projects, I found that the label tanka.dev/environment (which identifies the resources belonging to an environment so they can be pruned if necessary) can collide when you deploy different applications (from, for instance, different repositories) using the same "tanka environment". Example:

repo1
└── environments
    └── myenv
        ├── main.jsonnet
        └── spec.json
repo2
└── environments
    └── myenv
        ├── main.jsonnet
        └── spec.json

The objective is to be able to deploy several applications into the same Kubernetes cluster, in a multi/microservice stack.

When running tk apply, both of those repositories produce the same value for the tanka.dev/environment label: cc3fe2eceb1805bc3b5cbeed010b0bd38c488b156d199a75.

How this value is produced: Looking at the code, I see this, which at first glance made me believe that it was using .metadata.name and .metadata.namespace from the spec.json file. But a closer look at the code shows that nothing from spec.json is used. Here's where those name and namespace values come from:

  • Name: path of the env dir, relative to rootDir
  • Namespace: path of the main.jsonnet file in the env dir, relative to rootDir

So, the hash comes from a sha256 of environments/myenv:environments/myenv/main.jsonnet, which is exactly the same for both projects/repos.
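To make the collision concrete, here is a minimal Python sketch of the computation as described above. It is not Tanka's actual code: the function name `environment_label` is made up for illustration, and truncating the hex digest to 48 characters is an assumption inferred from the length of the label value quoted earlier.

```python
import hashlib


def environment_label(name: str, namespace: str) -> str:
    """Hash "<name>:<namespace>" with SHA-256 and keep a hex prefix.

    Hypothetical helper: the 48-character truncation is an assumption
    based on the length of the label value shown in this issue.
    """
    payload = f"{name}:{namespace}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:48]


# Both repositories use the same relative paths, so the inputs are identical.
repo1_label = environment_label(
    "environments/myenv", "environments/myenv/main.jsonnet"
)
repo2_label = environment_label(
    "environments/myenv", "environments/myenv/main.jsonnet"
)

print(repo1_label == repo2_label)  # True: the two labels collide
```

Nothing specific to either repository enters the hash input, which is why the two labels cannot differ.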

So whenever tk apply is run for repo1, it deletes repo2's resources, and the other way around.

I understand the rationale behind how the hash is calculated (same dir and file under environments means same environment), but it is misleading at the very least (why use "metadata" to store this? Why name and namespace as the property names?) and not flexible enough (for my use case, which might not be common or within the scope of Tanka).

So this is what I would propose:

  • To avoid breaking compatibility, the default way of calculating the label hash should stay as it is right now
  • Allow changing how the label hash is calculated with a setting in the spec.json file (under "spec"). Example of the proposal:
{
  "apiVersion": "tanka.dev/v1alpha1",
  "kind": "Environment",
  "metadata": {
    "name": "myenv",
    "labels": {
      "project": "myproject"
    }
  },
  "spec": {
    "apiServer": "https://127.0.0.1:6443",
    "namespace": "mynamespace",
    "injectLabels": true,
    "tankaEnvLabelFromFields": [
      ".metadata.name",
      ".spec.namespace"
    ]
  }
}

This new property, spec.tankaEnvLabelFromFields, would be a list of field paths within the same spec.json file, whose values the code concatenates (in order) to produce the hash.
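As a rough sketch of how this could behave (in Python for brevity; this is not an existing Tanka API): `resolve_field` and `label_from_fields` are hypothetical helpers, the ":" separator and the 48-character truncation are assumptions, and the fallback keeps today's default inputs when the property is absent.

```python
import hashlib
from typing import Any, Dict, List


def resolve_field(document: Dict[str, Any], path: str) -> str:
    """Look up a dotted path such as ".metadata.name" in a parsed spec.json."""
    value: Any = document
    for key in path.lstrip(".").split("."):
        value = value[key]  # raises KeyError if the path does not exist
    return str(value)


def label_from_fields(document: Dict[str, Any], default_parts: List[str]) -> str:
    """Hash the configured fields in order, or fall back to the default parts."""
    paths = document.get("spec", {}).get("tankaEnvLabelFromFields")
    if paths:
        parts = [resolve_field(document, path) for path in paths]
    else:
        parts = default_parts  # unchanged behaviour when the property is unset
    # Assumption: parts are joined with ":" and the digest is cut to 48 chars.
    return hashlib.sha256(":".join(parts).encode("utf-8")).hexdigest()[:48]


spec = {
    "metadata": {"name": "myenv", "labels": {"project": "myproject"}},
    "spec": {
        "namespace": "mynamespace",
        "tankaEnvLabelFromFields": [".metadata.name", ".spec.namespace"],
    },
}
defaults = ["environments/myenv", "environments/myenv/main.jsonnet"]

print(label_from_fields(spec, defaults))
```

With this shape, two repositories that share the same directory layout get different labels as soon as any of the selected fields differ, for example .spec.namespace.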

I would like to know the maintainers' opinion on this, and whether it is something you would consider if I work on a PR.

Cheers

jordiclariana avatar Mar 15 '23 13:03 jordiclariana

I think the tankaEnvLabelFromFields setting is a great idea, personally. If you do work on that PR, I'd be happy to review/approve that

julienduchesne avatar Apr 13 '23 17:04 julienduchesne

I'm keen to pick this up, unless anyone has already started work on it, as we're hitting this issue frequently.

DeanBruntThirdfort avatar Nov 30 '23 00:11 DeanBruntThirdfort

Hey @DeanBruntThirdfort, sorry I have been radio-silent here. I really wanted to work on this, but life and work are not allowing me to. Feel free to work on it, and thank you for the offer! :pray:

jordiclariana avatar Dec 04 '23 08:12 jordiclariana

Raised a PR here to add this support (pending maintainers being happy with the approach etc).

DeanBruntThirdfort avatar Dec 15 '23 17:12 DeanBruntThirdfort

I had the same issue today. My workaround is to add a unique project folder, like this:


├── repo1
│   └── environments
│       └── alert-rules
│           ├── dev
│           └── prod


├── repo2
│   └── environments
│       └── loki
│           ├── dev
│           └── prod

instead of this:

├── repo1
│   └── environments
│       ├── dev
│       └── prod

├── repo2
│   └── environments
│       ├── dev
│       └── prod

It seems to work.
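A quick way to see why the extra folder helps, sketched in Python under the assumption (taken from the description earlier in this thread) that the label is a SHA-256 of "<env dir>:<env dir>/main.jsonnet" cut to 48 hex characters. The helper name `environment_label` is made up for illustration.

```python
import hashlib


def environment_label(env_dir: str) -> str:
    """Hypothetical helper: hash the env dir and its main.jsonnet path."""
    payload = f"{env_dir}:{env_dir}/main.jsonnet"
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:48]


# Flat layout: repo1 and repo2 both contain environments/dev.
flat = {environment_label("environments/dev"), environment_label("environments/dev")}

# Nested layout: each repo adds its own project folder.
nested = {
    environment_label("environments/alert-rules/dev"),
    environment_label("environments/loki/dev"),
}

print(len(flat))    # 1: both repos share one label
print(len(nested))  # 2: the project folder makes the labels distinct
```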

kingindanord avatar Feb 06 '24 11:02 kingindanord

I also went down the same rabbit hole and discovered that the hash is computed from the environment name only, e.g. sha256("environments/default:environments/default/main.jsonnet"). I only noticed this after checking whether different applications had different hashes (they didn't). It is incredibly easy for this to collide!

Some guidance should probably be added to the docs about either making the name of an environment unique, or maybe renaming the main.jsonnet file to something more application-specific, etc. Or adding a new field to spec.json that will be used in the hash (this could even be backwards-compatible).

The approach in #975 would also work, but I think it should be accompanied by better defaults, e.g. tk init should generate something with a unique ID that makes it into the hash (project folder name, UUID, whatever).

remram44 avatar Feb 09 '24 21:02 remram44