JamesMurkin
Currently CancelJobSet event messages do not contain the user id of the user who performed the cancellation. That means cancellation events populate the user id with the queue name. It...
A new RPC endpoint, `CancelJobSet`, was added to Armada in https://github.com/G-Research/armada/pull/1312. We should update the Python client to include (and possibly use) this new endpoint.
**Problem** Pods can go to Pending and then get stuck forever - **seems to only happen if the node is bad** **Cause** We typically detect stuck pods and retry them...
Currently we only generate a swagger.json that we expose on our APIs. However, it'd be nice if we also exposed a full Swagger UI (https://swagger.io/tools/swagger-ui/) so that users can...
PodCache now requires a key generator function, rather than assuming it is `ExtractPodKey`. Swapping the podsToDelete cache key function to be based on pod uid and default...
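As a rough illustration of the idea above (a cache that takes its key function as a parameter instead of hard-coding one), here is a minimal sketch. The `Pod`, `KeyFunc`, `PodCache`, and `NewPodCache` definitions below are hypothetical stand-ins written for this example; they are not Armada's actual types or signatures.

```go
package main

import "fmt"

// Pod is a hypothetical, minimal stand-in for a Kubernetes pod.
type Pod struct {
	UID  string
	Name string
}

// KeyFunc derives the cache key for a pod (assumed shape, for illustration).
type KeyFunc func(*Pod) string

// PodCache is an illustrative map-backed cache, not Armada's implementation.
type PodCache struct {
	keyFn KeyFunc
	pods  map[string]*Pod
}

// NewPodCache requires the caller to choose how pods are keyed.
func NewPodCache(keyFn KeyFunc) *PodCache {
	return &PodCache{keyFn: keyFn, pods: map[string]*Pod{}}
}

// Add stores the pod under the key produced by the configured key function.
func (c *PodCache) Add(p *Pod) { c.pods[c.keyFn(p)] = p }

// Get looks up a pod using the same key function that Add used.
func (c *PodCache) Get(p *Pod) (*Pod, bool) {
	found, ok := c.pods[c.keyFn(p)]
	return found, ok
}

func main() {
	// Keying by UID keeps two pods with the same name distinct.
	byUID := NewPodCache(func(p *Pod) string { return p.UID })
	byUID.Add(&Pod{UID: "uid-1", Name: "job-a"})

	_, ok := byUID.Get(&Pod{UID: "uid-2", Name: "job-a"})
	fmt.Println("recreated pod with same name found:", ok)
}
```

Keying by UID means a pod that is deleted and recreated under the same name is treated as a different entry, which is presumably the motivation for the change.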
**Problem** Currently the executor reports pod usage to the server as simply the pod request values. This restricts use to: - Pod request = Pod limit If request != limit,...
Currently if you submit a Service/Ingress type which is an int but invalid, it just uses the first type. I.e. if you say I want ServiceType with value 100, you...
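One way to reject out-of-range values instead of silently falling back to the first type is to check the submitted int against the set of declared values. The sketch below assumes a hypothetical `ServiceType` enum with two values and a hypothetical `validateServiceType` helper; the real enum and its values may differ.

```go
package main

import "fmt"

// ServiceType is a hypothetical int-backed enum, standing in for a
// generated protobuf enum.
type ServiceType int32

const (
	NodePort ServiceType = 0
	Headless ServiceType = 1
)

// serviceTypeNames lists the values considered valid (assumed for this sketch).
var serviceTypeNames = map[ServiceType]string{
	NodePort: "NodePort",
	Headless: "Headless",
}

// validateServiceType returns an error for any value that is not declared
// in serviceTypeNames, rather than defaulting to the first type.
func validateServiceType(t ServiceType) error {
	if _, ok := serviceTypeNames[t]; !ok {
		return fmt.Errorf("invalid service type %d", t)
	}
	return nil
}

func main() {
	fmt.Println(validateServiceType(Headless))
	fmt.Println(validateServiceType(ServiceType(100)))
}
```

Running the check at submission time would surface the bad value to the user before any Service or Ingress is created.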
https://golangci-lint.run/ This tool is a metalinter. Running it locally shows up quite a few things we should fix: - Unused code - Unused variables It would be worth putting this...
### What happened? I ran `while [ 1 ]; do dd if=/dev/urandom bs=1024 count=1024 | etcdctl put key || break; done` from 2 machines against my etcd cluster and it...
This test just confirms that pulsarBatchSize correctly limits the number of events we process at once.