robusta-runner crashes if clusterName is read from env var in values.yaml
Describe the bug
I have to set the clusterName value in the values.yaml file explicitly. If I read the value from an environment variable instead, the runner crashes. Environment variables work in other parts of values.yaml, so they should work here too.
To Reproduce
Installing with values_error.yaml fails; installing with values_success.yaml succeeds.
values_error.yaml:

```yaml
clusterName: "{{ env.CLUSTER_NAME }}"
customPlaybooks:
- triggers:
  - on_pod_crash_loop: {}
  - on_pod_oom_killed: {}
  - on_container_oom_killed: {}
  # - on_deployment_update: {}
  actions:
  - resource_babysitter: {}
  sinks:
  - slack
globalConfig:
  signing_key: "{{ env.SIGNING_KEY }}"
  account_id: "{{ env.ACCOUNT_ID }}"
  prometheus_url: "http://prometheus.ops.svc.cluster.local:80"
sinksConfig:
- slack_sink:
    name: main_slack_sink
    slack_channel: alerts
    api_key: "{{ env.SLACK_SINK_API_KEY }}"
- robusta_sink:
    name: robusta_ui_sink
    token: "{{ env.ROBUSTA_SINK_TOKEN }}"
    ttl_hours: 4380
enablePlatformPlaybooks: true
runner:
  resources:
    requests:
      cpu: 100m
      memory: 800Mi
    limits:
      memory: 800Mi
  sendAdditionalTelemetry: false
  additional_env_vars:
  - name: CLUSTER_NAME
    valueFrom:
      configMapKeyRef:
        name: cluster-config
        key: cluster-name
  - name: ACCOUNT_ID
    valueFrom:
      secretKeyRef:
        name: robusta-secret
        key: account_id
  - name: SIGNING_KEY
    valueFrom:
      secretKeyRef:
        name: robusta-secret
        key: signing_key
  - name: SLACK_SINK_API_KEY
    valueFrom:
      secretKeyRef:
        name: robusta-secret
        key: slack_sink_api_key
  - name: ROBUSTA_SINK_TOKEN
    valueFrom:
      secretKeyRef:
        name: robusta-secret
        key: robusta_sink_token
```
values_success.yaml:

```yaml
clusterName: dev-cluster
customPlaybooks:
- triggers:
  - on_pod_crash_loop: {}
  - on_pod_oom_killed: {}
  - on_container_oom_killed: {}
  # - on_deployment_update: {}
  actions:
  - resource_babysitter: {}
  sinks:
  - slack
globalConfig:
  signing_key: "{{ env.SIGNING_KEY }}"
  account_id: "{{ env.ACCOUNT_ID }}"
  prometheus_url: "http://prometheus.ops.svc.cluster.local:80"
sinksConfig:
- slack_sink:
    name: main_slack_sink
    slack_channel: cluster-alerts
    api_key: "{{ env.SLACK_SINK_API_KEY }}"
- robusta_sink:
    name: robusta_ui_sink
    token: "{{ env.ROBUSTA_SINK_TOKEN }}"
enablePlatformPlaybooks: true
runner:
  resources:
    requests:
      cpu: 250m
      memory: 1024Mi
    limits:
      memory: 1024Mi
  sendAdditionalTelemetry: false
  additional_env_vars:
  - name: ACCOUNT_ID
    valueFrom:
      secretKeyRef:
        name: robusta-secret
        key: account_id
  - name: SIGNING_KEY
    valueFrom:
      secretKeyRef:
        name: robusta-secret
        key: signing_key
  - name: SLACK_SINK_API_KEY
    valueFrom:
      secretKeyRef:
        name: robusta-secret
        key: slack_sink_api_key
  - name: ROBUSTA_SINK_TOKEN
    valueFrom:
      secretKeyRef:
        name: robusta-secret
        key: robusta_sink_token
```
Logs
From runner:

```
setting up colored logging
2023-12-13 09:50:56.847 INFO logger initialized using INFO log level
2023-12-13 09:50:56.847 INFO Creating hikaru monkey patches
2023-12-13 09:50:56.847 INFO Creating yaml monkey patch
2023-12-13 09:50:56.848 INFO Creating kubernetes ContainerImage monkey patch
2023-12-13 09:50:56.849 INFO watching dir /etc/robusta/playbooks/ for custom playbooks changes
2023-12-13 09:50:56.865 INFO watching dir /etc/robusta/config/active_playbooks.yaml for custom playbooks changes
2023-12-13 09:50:56.865 INFO Reloading playbook packages due to change on initialization
2023-12-13 09:50:56.865 INFO loading config /etc/robusta/config/active_playbooks.yaml
2023-12-13 09:50:56.962 ERROR unknown error reloading playbooks. will try again when they next change
Traceback (most recent call last):
  File "/app/src/robusta/runner/config_loader.py", line 159, in __reload_playbook_packages
    runner_config = self.__load_runner_config(self.config_file_path)
  File "/app/src/robusta/runner/config_loader.py", line 276, in __load_runner_config
    yaml_content = yaml.safe_load(file)
  File "/usr/local/lib/python3.9/site-packages/yaml/__init__.py", line 125, in safe_load
    return load(stream, SafeLoader)
  File "/usr/local/lib/python3.9/site-packages/yaml/__init__.py", line 81, in load
    return loader.get_single_data()
  File "/usr/local/lib/python3.9/site-packages/yaml/constructor.py", line 51, in get_single_data
    return self.construct_document(node)
  File "/usr/local/lib/python3.9/site-packages/yaml/constructor.py", line 60, in construct_document
    for dummy in generator:
  File "/usr/local/lib/python3.9/site-packages/yaml/constructor.py", line 413, in construct_yaml_map
    value = self.construct_mapping(node)
  File "/usr/local/lib/python3.9/site-packages/yaml/constructor.py", line 218, in construct_mapping
    return super().construct_mapping(node, deep=deep)
  File "/usr/local/lib/python3.9/site-packages/yaml/constructor.py", line 141, in construct_mapping
    raise ConstructorError("while constructing a mapping", node.start_mark,
yaml.constructor.ConstructorError: while constructing a mapping
  in "/etc/robusta/config/active_playbooks.yaml", line 14, column 17
found unhashable key
  in "/etc/robusta/config/active_playbooks.yaml", line 14, column 18
2023-12-13 09:50:56.966 INFO Initialized task queue: 20 workers. Max size 500
2023-12-13 09:50:56.982 INFO Initialized task queue: 20 workers. Max size 500
2023-12-13 09:50:57.239 INFO Setting cluster active to True
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/app/src/robusta/runner/main.py", line 51, in <module>
    main()
  File "/app/src/robusta/runner/main.py", line 45, in main
    event_handler.set_cluster_active(True)
  File "/app/src/robusta/core/playbooks/playbooks_event_handler_impl.py", line 338, in set_cluster_active
    for sink in self.registry.get_sinks().get_all().values():
AttributeError: 'NoneType' object has no attribute 'get_all'
Exception in thread fs-watcher:
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/threading.py", line 980, in _bootstrap_inner
Exception in thread fs-watcher:
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/threading.py", line 980, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.9/threading.py", line 917, in run
    self._target(*self._args, **self._kwargs)
  File "/app/src/robusta/utils/file_system_watcher.py", line 27, in fs_watch
    self.run()
  File "/usr/local/lib/python3.9/threading.py", line 917, in run
    self._target(*self._args, **self._kwargs)
  File "/app/src/robusta/utils/file_system_watcher.py", line 27, in fs_watch
    for _ in watch(self.path_to_watch, stop_event=self.stop_event):
  File "/usr/local/lib/python3.9/site-packages/watchgod/main.py", line 38, in watch
    yield loop.run_until_complete(_awatch.__anext__())
  File "/usr/local/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.9/site-packages/watchgod/main.py", line 121, in __anext__
    for _ in watch(self.path_to_watch, stop_event=self.stop_event):
  File "/usr/local/lib/python3.9/site-packages/watchgod/main.py", line 38, in watch
    yield loop.run_until_complete(_awatch.__anext__())
  File "/usr/local/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.9/site-packages/watchgod/main.py", line 121, in __anext__
    new_changes = await self.run_in_executor(watcher.check)
  File "/usr/local/lib/python3.9/site-packages/watchgod/main.py", line 142, in run_in_executor
    return await self._loop.run_in_executor(self._executor, func, *args)
  File "/usr/local/lib/python3.9/asyncio/base_events.py", line 819, in run_in_executor
    executor.submit(func, *args), loop=self)
  File "/usr/local/lib/python3.9/concurrent/futures/thread.py", line 167, in submit
    new_changes = await self.run_in_executor(watcher.check)
  File "/usr/local/lib/python3.9/site-packages/watchgod/main.py", line 142, in run_in_executor
    return await self._loop.run_in_executor(self._executor, func, *args)
  File "/usr/local/lib/python3.9/asyncio/base_events.py", line 819, in run_in_executor
    raise RuntimeError('cannot schedule new futures after shutdown')
RuntimeError: cannot schedule new futures after shutdown
    executor.submit(func, *args), loop=self)
  File "/usr/local/lib/python3.9/concurrent/futures/thread.py", line 167, in submit
    raise RuntimeError('cannot schedule new futures after shutdown')
RuntimeError: cannot schedule new futures after shutdown
2023-12-13 09:52:26.670 INFO SIGINT handler called
2023-12-13 09:52:26.670 INFO Setting cluster active to False
Exception ignored in: <module 'threading' from '/usr/local/lib/python3.9/threading.py'>
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/threading.py", line 1477, in _shutdown
    lock.acquire()
  File "/app/src/robusta/core/playbooks/playbooks_event_handler_impl.py", line 351, in handle_sigint
    self.set_cluster_active(False)
  File "/app/src/robusta/core/playbooks/playbooks_event_handler_impl.py", line 338, in set_cluster_active
    for sink in self.registry.get_sinks().get_all().values():
AttributeError: 'NoneType' object has no attribute 'get_all'
```
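The ConstructorError looks like a plain PyYAML parsing failure rather than a crash in Robusta's own code: if the rendered /etc/robusta/config/active_playbooks.yaml ends up containing an unquoted `{{ env.CLUSTER_NAME }}`, YAML interprets the double braces as nested flow mappings, and the inner mapping (a dict) becomes an unhashable mapping key. A minimal sketch of that failure mode (my guess at the root cause from the traceback, not confirmed by the maintainers):

```python
import yaml

# An unquoted Jinja-style placeholder is syntactically valid YAML:
# "{{ env.CLUSTER_NAME }}" parses as a flow mapping whose key is itself
# a mapping ({env.CLUSTER_NAME: null}), and dicts cannot be dict keys.
doc = "global_config:\n  cluster_name: {{ env.CLUSTER_NAME }}\n"

try:
    yaml.safe_load(doc)
except yaml.constructor.ConstructorError as exc:
    # Same error class and "found unhashable key" message as in the
    # runner logs above.
    print(exc)
```

If this is indeed what happens, the quotes around `"{{ env.CLUSTER_NAME }}"` in values.yaml would be getting lost when the chart templates clusterName into the generated config file.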
Expected behavior
I should be able to set any value via an environment variable.
Additional context
I have Robusta running in various clusters and want to share as much configuration as possible between them. Fully supporting environment variables in values.yaml would make it possible to keep everything cluster-specific in a tiny ConfigMap.
Hey, it's not supported today. What's your motivation for doing it via a tiny ConfigMap as opposed to a per-cluster Helm override value? (And are you installing with Flux or ArgoCD?)
I'd love to understand the use case a little more.
Hey @aantn, thanks for your reply. We have several edge clusters all running the same apps, and we use Argo CD to manage them. Each edge cluster contains a single kustomization referencing the template cluster plus a cluster-specific ConfigMap. The ConfigMap defines cluster-specific variables like the cluster name. Since every cluster deploys a ConfigMap with the same name, we can easily reference its values, and in this case it would be easy to load the cluster name as an environment variable. We do the same with Prometheus, e.g. for external labels.
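For concreteness, the per-cluster ConfigMap this setup relies on could be as small as the following sketch (the name and key match the configMapKeyRef in values_error.yaml above; the namespace and cluster name are hypothetical):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-config   # same name in every edge cluster
  namespace: ops         # hypothetical namespace
data:
  cluster-name: edge-cluster-01   # the only value that differs per cluster
```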
Any plans on supporting this soonish?
Sorry, no update on this yet. Is this a blocker for your adoption?
It would make the setup way easier. As described above, it reduces the per-cluster setup of the cloned clusters quite a lot.