temporal-operator icon indicating copy to clipboard operation
temporal-operator copied to clipboard

codify custom search attributes for TemporalNamespace

Open TheHiddenLayer opened this issue 1 year ago • 15 comments

Closes #567

This PR codifies custom search attributes, so one can specify:

apiVersion: temporal.io/v1beta1
kind: TemporalNamespace
metadata:
  name: default
  namespace: demo
spec:
  customSearchAttributes:
    name1: value1
    name2: value2
    name3: value2
  // additional fields

TheHiddenLayer avatar Jul 23 '24 04:07 TheHiddenLayer

TODO(@TheHiddenLayer) add tests?

TheHiddenLayer avatar Jul 23 '24 04:07 TheHiddenLayer

Hi @alexandrevilain,

  1. I'm experiencing various breakages running make test-e2e-dev.
  2. I needed helm and buildx to properly build the container when running make test-e2e-dev. Thinking of updating the local development docs to include those deps. Also, do you prefer the local development docs stay minimal/brief? I noticed there's a lot of commands in the Makefile that aren't mentioned in the local development docs.
  3. Any way you could give me some guidance on what tests you want to see for custom search attributes? This is actually my first time contributing to a k8s operator, and I'm referencing this blog post for the basics.

P.S. regarding:

  1. I'm experiencing various breakages running make test-e2e-dev.

the error is different each time. Here's the error I happened to see this time:

I0724 23:18:56.709046     135 round_trippers.go:553] GET https://temporal-becb55e-control-plane:6443/healthz?timeout=10s  in 5 milliseconds
I0724 23:18:57.177835     135 round_trippers.go:553] GET https://temporal-becb55e-control-plane:6443/healthz?timeout=10s  in 2 milliseconds
I0724 23:18:57.664057     135 round_trippers.go:553] GET https://temporal-becb55e-control-plane:6443/healthz?timeout=10s  in 2 milliseconds
I0724 23:18:58.167209     135 round_trippers.go:553] GET https://temporal-becb55e-control-plane:6443/healthz?timeout=10s  in 3 milliseconds
I0724 23:19:06.648034     135 round_trippers.go:553] GET https://temporal-becb55e-control-plane:6443/healthz?timeout=10s  in 7993 milliseconds
[api-check] The API server is not healthy after 4m0.02803015s

Unfortunately, an error has occurred:
	context deadline exceeded

This error is likely caused by:
	- The kubelet is not running
	- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
	- 'systemctl status kubelet'
	- 'journalctl -xeu kubelet'

Additionally, a control plane component may have crashed or exited when started by the container runtime.

TheHiddenLayer avatar Jul 24 '24 23:07 TheHiddenLayer

This is running on an M3 apple silicon with colima

fails again, this time with a different error.

docker save temporal-operator > /tmp/temporal-operator.tar
OPERATOR_IMAGE_PATH=/tmp/temporal-operator.tar go test ./tests/e2e -v -timeout 60m -args "--v=4"
I0724 20:24:34.366456   59239 kind.go:157] Creating kind cluster temporal-fcc1c45
I0724 20:24:34.367030   59239 command.go:37] "Found Provider tooling already installed on the machine" command="kind"
I0724 20:24:34.574632   59239 kind.go:171] Launching:kind create cluster --name temporal-fcc1c45
F0724 20:24:35.841364   59239 env.go:375] Setup failure: failed to create kind cluster: exit status 1 : exit status 1: Creating cluster "temporal-fcc1c45" ...
 • Ensuring node image (kindest/node:v1.30.0) 🖼  ...
 ✓ Ensuring node image (kindest/node:v1.30.0) 🖼
 • Preparing nodes 📦   ...
 ✗ Preparing nodes 📦 
Deleted nodes: ["temporal-fcc1c45-control-plane"]
ERROR: failed to create cluster: could not find a log line that matches "Reached target .*Multi-User System.*|detected cgroup v1"
FAIL	github.com/alexandrevilain/temporal-operator/tests/e2e	2.034s
FAIL
make: *** [test-e2e-dev] Error 1

TheHiddenLayer avatar Jul 25 '24 03:07 TheHiddenLayer

even while running the make test-e2e-dev on the main branch I get

=== RUN   TestPersistence/cassandra_persistence/Temporal_cluster_ready_after_upgrade#03
panic: test timed out after 1h0m0s

@alexandrevilain are you able to run the whole e3e test suite without any errors or timeouts?

TheHiddenLayer avatar Aug 01 '24 19:08 TheHiddenLayer

Sorry @TheHiddenLayer I'm currently having issues with internet connectivity while I'm in holidays. I can't even pull the cert-manager image ...

alexandrevilain avatar Aug 07 '24 12:08 alexandrevilain

Hi @TheHiddenLayer following PR succesfully ended e2e without any retry: https://github.com/alexandrevilain/temporal-operator/pull/777

alexandrevilain avatar Aug 08 '24 19:08 alexandrevilain

Hi @TheHiddenLayer following PR succesfully ended e2e without any retry: #777

Nice, I'm glad that it worked. It could be something with my logic then? Pls LMK what you think about this.

TheHiddenLayer avatar Aug 08 '24 23:08 TheHiddenLayer

@alexandrevilain I haven't forgotten about this PR. I'm still planning to finish it off. I’ve been tied up lately with work. Thanks for your patience!

TheHiddenLayer avatar Aug 25 '24 23:08 TheHiddenLayer

Hi @TheHiddenLayer ! Do you need some help for the last few things to fix ?

alexandrevilain avatar Sep 26 '24 19:09 alexandrevilain

hey @alexandrevilain! the feature itself works as expected. I've done manual tests confirming:

  • [x] Add an attr whose name exists with same type -> no change (b/c no reconciliation needed)
  • [x] Add an attr whose name exists with different type -> properly returns error (b/c user first needs to delete existing attr with same name)
  • [x] Add an attr with invalid/gibberish type -> correctly returns parsing error (invalid type)
  • [x] Add an attr with ":" in the name -> adds attr correctly (this tests a potential bug in postgres where key might not work if it has a colon)
  • [x] Add a brand new attr -> adds attr correctly
  • [x] Remove an existing attr -> removes attr correctly

BUT I didn't deeply investigate why this doesn't pass some E2E tests, so any assistance on that would be great.

I wanted to filter out and run a few tests that were failing to isolate my debugging, but the option flags I passed into the test command didn't do anything. So, the fact that I had to run the entire E2E pipeline (30-50min on my computer) has made this arduous to debug.

TheHiddenLayer avatar Sep 26 '24 20:09 TheHiddenLayer

I'll clean up the comments/dead code/debugging log statements

TheHiddenLayer avatar Oct 01 '24 17:10 TheHiddenLayer

Hey guys I realise last update on this was quite a while ago but I am rather keen to this merged. Is there something I can do to help wrap this up?

AdamSendible avatar Jun 12 '25 11:06 AdamSendible

@AdamSendible if you can get the e2e tests to work it should be merge-able :)

I wasn't able to get it to work

TheHiddenLayer avatar Jun 12 '25 21:06 TheHiddenLayer