ankaios
Invalid Pod spec leads to config volume never being deleted
Current Behavior
If one of the resources in the manifest of a podman-kube workload is invalid, both podman kube play and podman kube down can fail.
During workload creation, the config volume is written first. Because this volume is not deleted when the deletion of the workload fails, it is never removed.
Each time the Ankaios agent starts, it sees the volume and tries to delete this workload again.
After a restart, the Ankaios agent also confuses the workload instance names of the incorrect and the correct workload and deletes the correct workload.
Expected Behavior
After a podman-kube workload has been deleted and no Podman resources of this workload remain, the Ankaios agent shall delete the config volume of this workload, even if podman kube down fails.
The Ankaios agent shall also be able to handle multiple existing workload instances with the same workload name and delete only the workload instances that are no longer needed.
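The first requirement can be illustrated with a minimal sketch. It is not the Ankaios implementation (the agent is not written in Python); the function name and the three injected callables are hypothetical stand-ins for the calls to Podman, and the use of RuntimeError to signal a failed podman kube down is an assumption. Only the volume naming (instance name plus ".config") is taken from the logs below.

```python
import logging
from typing import Callable, List


def delete_kube_workload(
    instance_name: str,
    kube_down: Callable[[str], None],
    list_pods: Callable[[str], List[str]],
    remove_volume: Callable[[str], None],
) -> bool:
    """Delete one workload instance; return True if its config volume was removed.

    All three callables are hypothetical wrappers around the Podman CLI.
    """
    try:
        kube_down(instance_name)
    except RuntimeError as err:
        # Assumption: a failed `podman kube down` surfaces as RuntimeError.
        # The failure is logged but must not abort the cleanup.
        logging.warning("kube down failed for %s: %s", instance_name, err)

    if list_pods(instance_name):
        # Podman resources still exist, so keep the volume for a later retry.
        return False

    # Volume name layout as seen in `podman volume ls`: <instance name>.config
    remove_volume(f"{instance_name}.config")
    return True
```

With this ordering, an invalid manifest that makes podman kube down fail no longer prevents the config volume from being removed, as long as no pods of that instance are left.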
Steps to Reproduce
- Start ank-server with the startup state given below
- Start ank-agent
- Stop ank-agent and ank-server
- Fix the error in the startup state by setting apiVersion to v1
- Start ank-server
- Start ank-agent
- Stop and start ank-agent
- Use podman volume ls to see that the config volume for the nginx workload is not deleted.
- Look at the log output of ank-agent during the third start. The ank-agent sees two reusable workloads, although one should already have been deleted. It also fails again to delete the first workload.
state.yml:
workloads:
  nginx:
    runtime: podman-kube
    agent: agent_A
    restart: true
    updateStrategy: AT_MOST_ONCE
    accessRights:
      allow: []
      deny: []
    tags: []
    runtimeConfig: |
      manifest: |
        apiVersion: 1
        kind: Pod
        metadata:
          name: nginx
        spec:
          restartPolicy: Never
          containers:
          - name: server
            image: docker.io/nginx:latest
            ports:
            - containerPort: 80
              hostPort: 8080
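The manifest above is invalid because YAML reads the unquoted apiVersion: 1 as a number, while the Pod field is a string; this matches the "cannot unmarshal number into Go struct field Pod.apiVersion of type string" error in the logs. As an illustration only, the following sketch checks the field types on an already parsed manifest. The function name is hypothetical and such a pre-check is not part of Ankaios as described in this issue.

```python
from typing import Any, Dict, List


def manifest_type_errors(manifest: Dict[str, Any]) -> List[str]:
    """Return one message per top-level field that is not a string.

    Hypothetical helper; expects the manifest already parsed into a dict.
    """
    errors = []
    for field in ("apiVersion", "kind"):
        value = manifest.get(field)
        if not isinstance(value, str):
            errors.append(f"{field} must be a string, got {type(value).__name__}")
    return errors
```

For the manifest in this issue, the check reports apiVersion; after the fix (apiVersion: v1) it reports nothing.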
Context (Environment)
Tested inside LXD containers running Arch Linux and Ubuntu 23.
This also interacts with a Podman bug: a failing podman kube play can leave already created resources behind (Podman #17434).
Logs
ank-agent first start:
[2023-11-29T13:04:05Z DEBUG ank_agent] Starting the Ankaios agent with
name: 'agent_A',
server url: 'http://127.0.0.1:25551/',
run directory: '/tmp/ankaios/'
[2023-11-29T13:04:05Z TRACE ank_agent::control_interface::directory] Reusing existing directory '"/tmp/ankaios/agent_A_io"'
[2023-11-29T13:04:05Z INFO ank_agent::agent_manager] Starting ...
[2023-11-29T13:04:05Z DEBUG ank_agent::agent_manager] Start listening to server.
[2023-11-29T13:04:05Z DEBUG grpc::client] gRPC Communication Client starts.
[2023-11-29T13:04:05Z TRACE grpc::execution_command_proxy] RESPONSE=ExecutionRequest { execution_request_enum: Some(UpdateWorkload(UpdateWorkload { added_workloads: [AddedWorkload { name: "nginx", runtime: "podman-kube", dependencies: {},
restart: true, update_strategy: AtMostOnce, access_rights: None, tags: [], runtime_config: "manifest: |\n apiVersion: 1\n kind: Pod\n metadata:\n name: nginx\n spec:\n restartPolicy: Never\n containers:\n - name: server\n
image: docker.io/nginx:latest\n ports:\n - containerPort: 80\n hostPort: 8080\n" }], deleted_workloads: [] })) }
[2023-11-29T13:04:05Z DEBUG ank_agent::agent_manager] Agent 'agent_A' received UpdateWorkload:
Added workloads: [WorkloadSpec { agent: "agent_A", name: "nginx", tags: [], dependencies: {}, update_strategy: AtMostOnce, restart: true, access_rights: AccessRights { allow: [], deny: [] }, runtime: "podman-kube", runtime_config:
"manifest: |\n apiVersion: 1\n kind: Pod\n metadata:\n name: nginx\n spec:\n restartPolicy: Never\n containers:\n - name: server\n image: docker.io/nginx:latest\n ports:\n - containerPort: 80\n hostPort: 8080\n" }]
Deleted workloads: []
[2023-11-29T13:04:05Z INFO ank_agent::runtime_manager] Received a new desired state with '1' added and '0' deleted workloads.
[2023-11-29T13:04:05Z DEBUG ank_agent::runtime_manager] Handling initial workload list.
[2023-11-29T13:04:05Z DEBUG ank_agent::runtime_connectors::runtime_facade] Searching for reusable 'podman-kube' workloads on agent 'agent_A'.
[2023-11-29T13:04:06Z INFO ank_agent::runtime_manager] Found '0' reusable 'podman-kube' workload(s).
[2023-11-29T13:04:06Z DEBUG ank_agent::runtime_connectors::runtime_facade] Searching for reusable 'podman' workloads on agent 'agent_A'.
[2023-11-29T13:04:06Z TRACE ank_agent::runtime_connectors::podman_cli] Listing workload names for: 'agent'='agent_A'
[2023-11-29T13:04:06Z DEBUG ank_agent::runtime_connectors::podman::podman_runtime] Found 0 reusable workload(s): '[]'
[2023-11-29T13:04:06Z INFO ank_agent::runtime_manager] Found '0' reusable 'podman' workload(s).
[2023-11-29T13:04:06Z DEBUG ank_agent::runtime_manager] Creating control interface pipes for 'WorkloadSpec { agent: "agent_A", name: "nginx", tags: [], dependencies: {}, update_strategy: AtMostOnce, restart: true, access_rights: AccessRights { allow: [], deny: [] }, runtime: "podman-kube", runtime_config: "manifest: |\n apiVersion: 1\n kind: Pod\n metadata:\n name: nginx\n spec:\n restartPolicy: Never\n containers:\n - name: server\n image: docker.io/nginx:latest\n ports:\n - containerPort: 80\n hostPort: 8080\n" }'
[2023-11-29T13:04:06Z TRACE ank_agent::control_interface::directory] Reusing existing directory '"/tmp/ankaios/agent_A_io/nginx.986b8d2fac1174412d106c512cd7d27aeb237af2b8e96642405606f92918e589"'
[2023-11-29T13:04:06Z TRACE ank_agent::control_interface::fifo] Reusing existing fifo file '"/tmp/ankaios/agent_A_io/nginx.986b8d2fac1174412d106c512cd7d27aeb237af2b8e96642405606f92918e589/input"'
[2023-11-29T13:04:06Z TRACE ank_agent::control_interface::fifo] Reusing existing fifo file '"/tmp/ankaios/agent_A_io/nginx.986b8d2fac1174412d106c512cd7d27aeb237af2b8e96642405606f92918e589/output"'
[2023-11-29T13:04:06Z INFO ank_agent::runtime_connectors::runtime_facade] Creating 'podman-kube' workload 'nginx' on agent 'agent_A'
[2023-11-29T13:04:06Z WARN ank_agent::runtime_connectors::runtime_facade] Failed to create workload: 'nginx': 'Could not create workload: 'Execution of command failed: Error: unable to read YAML as Kube Pod: error unmarshaling JSON: while decoding JSON: json: cannot unmarshal number into Go struct field Pod.apiVersion of type string
''
podman volume ls after first start:
DRIVER VOLUME NAME
local nginx.986b8d2fac1174412d106c512cd7d27aeb237af2b8e96642405606f92918e589.agent_A.config
ank-agent second start:
[2023-11-29T13:04:21Z DEBUG ank_agent] Starting the Ankaios agent with
name: 'agent_A',
server url: 'http://127.0.0.1:25551/',
run directory: '/tmp/ankaios/'
[2023-11-29T13:04:21Z TRACE ank_agent::control_interface::directory] Reusing existing directory '"/tmp/ankaios/agent_A_io"'
[2023-11-29T13:04:21Z DEBUG grpc::client] gRPC Communication Client starts.
[2023-11-29T13:04:21Z INFO ank_agent::agent_manager] Starting ...
[2023-11-29T13:04:21Z DEBUG ank_agent::agent_manager] Start listening to server.
[2023-11-29T13:04:21Z TRACE grpc::execution_command_proxy] RESPONSE=ExecutionRequest { execution_request_enum: Some(UpdateWorkload(UpdateWorkload { added_workloads: [AddedWorkload { name: "nginx", runtime: "podman-kube", dependencies: {},
restart: true, update_strategy: AtMostOnce, access_rights: None, tags: [], runtime_config: "manifest: |\n apiVersion: v1\n kind: Pod\n metadata:\n name: nginx\n spec:\n restartPolicy: Never\n containers:\n - name: server\n
image: docker.io/nginx:latest\n ports:\n - containerPort: 80\n hostPort: 8080\n" }], deleted_workloads: [] })) }
[2023-11-29T13:04:21Z DEBUG ank_agent::agent_manager] Agent 'agent_A' received UpdateWorkload:
Added workloads: [WorkloadSpec { agent: "agent_A", name: "nginx", tags: [], dependencies: {}, update_strategy: AtMostOnce, restart: true, access_rights: AccessRights { allow: [], deny: [] }, runtime: "podman-kube", runtime_config:
"manifest: |\n apiVersion: v1\n kind: Pod\n metadata:\n name: nginx\n spec:\n restartPolicy: Never\n containers:\n - name: server\n image: docker.io/nginx:latest\n ports:\n - containerPort: 80\n hostPort: 8080\n" }]
Deleted workloads: []
[2023-11-29T13:04:21Z INFO ank_agent::runtime_manager] Received a new desired state with '1' added and '0' deleted workloads.
[2023-11-29T13:04:21Z DEBUG ank_agent::runtime_manager] Handling initial workload list.
[2023-11-29T13:04:21Z DEBUG ank_agent::runtime_connectors::runtime_facade] Searching for reusable 'podman' workloads on agent 'agent_A'.
[2023-11-29T13:04:21Z TRACE ank_agent::runtime_connectors::podman_cli] Listing workload names for: 'agent'='agent_A'
[2023-11-29T13:04:21Z DEBUG ank_agent::runtime_connectors::podman::podman_runtime] Found 0 reusable workload(s): '[]'
[2023-11-29T13:04:21Z INFO ank_agent::runtime_manager] Found '0' reusable 'podman' workload(s).
[2023-11-29T13:04:21Z DEBUG ank_agent::runtime_connectors::runtime_facade] Searching for reusable 'podman-kube' workloads on agent 'agent_A'.
[2023-11-29T13:04:21Z INFO ank_agent::runtime_manager] Found '1' reusable 'podman-kube' workload(s).
[2023-11-29T13:04:21Z DEBUG ank_agent::runtime_manager] Creating control interface pipes for 'WorkloadSpec { agent: "agent_A", name: "nginx", tags: [], dependencies: {}, update_strategy: AtMostOnce, restart: true, access_rights: AccessRights { allow: [], deny: [] }, runtime: "podman-kube", runtime_config: "manifest: |\n apiVersion: v1\n kind: Pod\n metadata:\n name: nginx\n spec:\n restartPolicy: Never\n containers:\n - name: server\n image: docker.io/nginx:latest\n ports:\n - containerPort: 80\n hostPort: 8080\n" }'
[2023-11-29T13:04:21Z TRACE ank_agent::control_interface::directory] Reusing existing directory '"/tmp/ankaios/agent_A_io/nginx.ca7b437551978d6c73fc2c629fabe4d9a5e59190af669c009ad5659e6b43ef58"'
[2023-11-29T13:04:21Z TRACE ank_agent::control_interface::fifo] Reusing existing fifo file '"/tmp/ankaios/agent_A_io/nginx.ca7b437551978d6c73fc2c629fabe4d9a5e59190af669c009ad5659e6b43ef58/input"'
[2023-11-29T13:04:21Z TRACE ank_agent::control_interface::fifo] Reusing existing fifo file '"/tmp/ankaios/agent_A_io/nginx.ca7b437551978d6c73fc2c629fabe4d9a5e59190af669c009ad5659e6b43ef58/output"'
[2023-11-29T13:04:21Z INFO ank_agent::runtime_connectors::runtime_facade] Replacing 'podman-kube' workload 'nginx' on agent 'agent_A'
[2023-11-29T13:04:21Z WARN ank_agent::runtime_connectors::podman_kube::podman_kube_runtime] Could not read pods from volume: "Execution of command failed: Error: no such volume nginx.986b8d2fac1174412d106c512cd7d27aeb237af2b8e96642405606f92918e589.agent_A.pods\n"
[2023-11-29T13:04:21Z DEBUG ank_agent::runtime_connectors::podman_kube::podman_kube_runtime] Deleting workload with workload execution instance name 'nginx.986b8d2fac1174412d106c512cd7d27aeb237af2b8e96642405606f92918e589.agent_A'
[2023-11-29T13:04:21Z WARN ank_agent::runtime_connectors::runtime_facade] Failed to delete workload when replacing workload 'nginx': 'Could not delete workload 'Execution of command failed: Error: unable to read YAML as Kube Pod: error unmarshaling JSON: while decoding JSON: json: cannot unmarshal number into Go struct field Pod.apiVersion of type string
''
[2023-11-29T13:04:21Z DEBUG ank_agent::runtime_connectors::podman_kube::podman_kube_runtime] The workload 'nginx' has been created with workload execution instance name 'WorkloadExecutionInstanceName { agent_name: "agent_A", workload_name: "nginx", hash: "ca7b437551978d6c73fc2c629fabe4d9a5e59190af669c009ad5659e6b43ef58" }'
[2023-11-29T13:04:21Z DEBUG ank_agent::runtime_connectors::podman_kube::podman_kube_runtime] Starting the checker for the workload 'nginx' with workload execution instance name 'nginx.ca7b437551978d6c73fc2c629fabe4d9a5e59190af669c009ad5659e6b43ef58.agent_A'
[2023-11-29T13:04:21Z TRACE ank_agent::runtime_connectors::podman_kube::podman_kube_runtime] Getting the state for the workload 'nginx.ca7b437551978d6c73fc2c629fabe4d9a5e59190af669c009ad5659e6b43ef58.agent_A'
[2023-11-29T13:04:21Z TRACE ank_agent::runtime_connectors::podman_kube::podman_kube_runtime] Received following states for workload 'nginx.ca7b437551978d6c73fc2c629fabe4d9a5e59190af669c009ad5659e6b43ef58.agent_A': '[Running, Running]'
[2023-11-29T13:04:21Z DEBUG ank_agent::generic_polling_state_checker] The workload nginx has changed its state to ExecRunning
[2023-11-29T13:04:21Z TRACE grpc::state_change_proxy] Received UpdateWorkloadState from agent
[2023-11-29T13:04:22Z TRACE ank_agent::runtime_connectors::podman_kube::podman_kube_runtime] Getting the state for the workload 'nginx.ca7b437551978d6c73fc2c629fabe4d9a5e59190af669c009ad5659e6b43ef58.agent_A'
[2023-11-29T13:04:22Z TRACE ank_agent::runtime_connectors::podman_kube::podman_kube_runtime] Received following states for workload 'nginx.ca7b437551978d6c73fc2c629fabe4d9a5e59190af669c009ad5659e6b43ef58.agent_A': '[Running, Running]'
[2023-11-29T13:04:23Z TRACE ank_agent::runtime_connectors::podman_kube::podman_kube_runtime] Getting the state for the workload 'nginx.ca7b437551978d6c73fc2c629fabe4d9a5e59190af669c009ad5659e6b43ef58.agent_A'
[2023-11-29T13:04:23Z TRACE ank_agent::runtime_connectors::podman_kube::podman_kube_runtime] Received following states for workload 'nginx.ca7b437551978d6c73fc2c629fabe4d9a5e59190af669c009ad5659e6b43ef58.agent_A': '[Running, Running]'
[2023-11-29T13:04:24Z TRACE ank_agent::runtime_connectors::podman_kube::podman_kube_runtime] Getting the state for the workload 'nginx.ca7b437551978d6c73fc2c629fabe4d9a5e59190af669c009ad5659e6b43ef58.agent_A'
[2023-11-29T13:04:24Z TRACE ank_agent::runtime_connectors::podman_kube::podman_kube_runtime] Received following states for workload 'nginx.ca7b437551978d6c73fc2c629fabe4d9a5e59190af669c009ad5659e6b43ef58.agent_A': '[Running, Running]'
podman volume ls after second start:
DRIVER VOLUME NAME
local nginx.986b8d2fac1174412d106c512cd7d27aeb237af2b8e96642405606f92918e589.agent_A.config
local nginx.ca7b437551978d6c73fc2c629fabe4d9a5e59190af669c009ad5659e6b43ef58.agent_A.config
local nginx.ca7b437551978d6c73fc2c629fabe4d9a5e59190af669c009ad5659e6b43ef58.agent_A.pods
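The listing above contains volumes of two instances of the same workload name. The sketch below shows one way the instances could be told apart by their full instance name instead of by workload name alone. It assumes the naming layout visible in the logs (<workload>.<hash>.<agent> followed by .config or .pods); the function names are hypothetical and this is not the agent's actual code.

```python
from typing import Iterable, List, Set

# Volume suffixes as they appear in the `podman volume ls` output.
VOLUME_SUFFIXES = (".config", ".pods")


def instance_names(volume_names: Iterable[str]) -> Set[str]:
    """Collect the workload instance names encoded in the volume names."""
    found = set()
    for volume in volume_names:
        for suffix in VOLUME_SUFFIXES:
            if volume.endswith(suffix):
                found.add(volume[: -len(suffix)])
    return found


def stale_instances(
    volume_names: Iterable[str], workload_name: str, desired_instance: str
) -> List[str]:
    """Return instances of workload_name that differ from the desired one."""
    prefix = workload_name + "."
    return sorted(
        name
        for name in instance_names(volume_names)
        if name.startswith(prefix) and name != desired_instance
    )
```

Applied to the three volumes listed after the second start, with the ca7b... instance as the desired one, only the 986b... instance is reported as stale, so the running workload would be left untouched.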
ank-agent third start:
[2023-11-29T13:04:26Z DEBUG ank_agent] Starting the Ankaios agent with
name: 'agent_A',
server url: 'http://127.0.0.1:25551/',
run directory: '/tmp/ankaios/'
[2023-11-29T13:04:26Z TRACE ank_agent::control_interface::directory] Reusing existing directory '"/tmp/ankaios/agent_A_io"'
[2023-11-29T13:04:26Z INFO ank_agent::agent_manager] Starting ...
[2023-11-29T13:04:26Z DEBUG ank_agent::agent_manager] Start listening to server.
[2023-11-29T13:04:26Z DEBUG grpc::client] gRPC Communication Client starts.
[2023-11-29T13:04:26Z TRACE grpc::execution_command_proxy] RESPONSE=ExecutionRequest { execution_request_enum: Some(UpdateWorkload(UpdateWorkload { added_workloads: [AddedWorkload { name: "nginx", runtime: "podman-kube", dependencies: {},
restart: true, update_strategy: AtMostOnce, access_rights: None, tags: [], runtime_config: "manifest: |\n apiVersion: v1\n kind: Pod\n metadata:\n name: nginx\n spec:\n restartPolicy: Never\n containers:\n - name: server\n
image: docker.io/nginx:latest\n ports:\n - containerPort: 80\n hostPort: 8080\n" }], deleted_workloads: [] })) }
[2023-11-29T13:04:26Z DEBUG ank_agent::agent_manager] Agent 'agent_A' received UpdateWorkload:
Added workloads: [WorkloadSpec { agent: "agent_A", name: "nginx", tags: [], dependencies: {}, update_strategy: AtMostOnce, restart: true, access_rights: AccessRights { allow: [], deny: [] }, runtime: "podman-kube", runtime_config:
"manifest: |\n apiVersion: v1\n kind: Pod\n metadata:\n name: nginx\n spec:\n restartPolicy: Never\n containers:\n - name: server\n image: docker.io/nginx:latest\n ports:\n - containerPort: 80\n hostPort: 8080\n" }]
Deleted workloads: []
[2023-11-29T13:04:26Z INFO ank_agent::runtime_manager] Received a new desired state with '1' added and '0' deleted workloads.
[2023-11-29T13:04:26Z DEBUG ank_agent::runtime_manager] Handling initial workload list.
[2023-11-29T13:04:26Z DEBUG ank_agent::runtime_connectors::runtime_facade] Searching for reusable 'podman' workloads on agent 'agent_A'.
[2023-11-29T13:04:26Z TRACE ank_agent::runtime_connectors::podman_cli] Listing workload names for: 'agent'='agent_A'
[2023-11-29T13:04:26Z DEBUG ank_agent::runtime_connectors::podman::podman_runtime] Found 0 reusable workload(s): '[]'
[2023-11-29T13:04:26Z INFO ank_agent::runtime_manager] Found '0' reusable 'podman' workload(s).
[2023-11-29T13:04:26Z DEBUG ank_agent::runtime_connectors::runtime_facade] Searching for reusable 'podman-kube' workloads on agent 'agent_A'.
[2023-11-29T13:04:26Z INFO ank_agent::runtime_manager] Found '2' reusable 'podman-kube' workload(s).
[2023-11-29T13:04:26Z DEBUG ank_agent::runtime_manager] Creating control interface pipes for 'WorkloadSpec { agent: "agent_A", name: "nginx", tags: [], dependencies: {}, update_strategy: AtMostOnce, restart: true, access_rights: AccessRights { allow: [], deny: [] }, runtime: "podman-kube", runtime_config: "manifest: |\n apiVersion: v1\n kind: Pod\n metadata:\n name: nginx\n spec:\n restartPolicy: Never\n containers:\n - name: server\n image: docker.io/nginx:latest\n ports:\n - containerPort: 80\n hostPort: 8080\n" }'
[2023-11-29T13:04:26Z TRACE ank_agent::control_interface::directory] Reusing existing directory '"/tmp/ankaios/agent_A_io/nginx.ca7b437551978d6c73fc2c629fabe4d9a5e59190af669c009ad5659e6b43ef58"'
[2023-11-29T13:04:26Z TRACE ank_agent::control_interface::fifo] Reusing existing fifo file '"/tmp/ankaios/agent_A_io/nginx.ca7b437551978d6c73fc2c629fabe4d9a5e59190af669c009ad5659e6b43ef58/input"'
[2023-11-29T13:04:26Z TRACE ank_agent::control_interface::fifo] Reusing existing fifo file '"/tmp/ankaios/agent_A_io/nginx.ca7b437551978d6c73fc2c629fabe4d9a5e59190af669c009ad5659e6b43ef58/output"'
[2023-11-29T13:04:26Z INFO ank_agent::runtime_connectors::runtime_facade] Replacing 'podman-kube' workload 'nginx' on agent 'agent_A'
[2023-11-29T13:04:26Z INFO ank_agent::runtime_connectors::runtime_facade] Deleting 'podman-kube' workload 'nginx' on agent 'agent_A'
[2023-11-29T13:04:26Z DEBUG ank_agent::runtime_connectors::podman_kube::podman_kube_runtime] Deleting workload with workload execution instance name 'nginx.ca7b437551978d6c73fc2c629fabe4d9a5e59190af669c009ad5659e6b43ef58.agent_A'
[2023-11-29T13:04:26Z WARN ank_agent::runtime_connectors::podman_kube::podman_kube_runtime] Could not read pods from volume: "Execution of command failed: Error: no such volume nginx.986b8d2fac1174412d106c512cd7d27aeb237af2b8e96642405606f92918e589.agent_A.pods\n"
[2023-11-29T13:04:26Z DEBUG ank_agent::runtime_connectors::podman_kube::podman_kube_runtime] Deleting workload with workload execution instance name 'nginx.986b8d2fac1174412d106c512cd7d27aeb237af2b8e96642405606f92918e589.agent_A'
[2023-11-29T13:04:26Z WARN ank_agent::runtime_connectors::runtime_facade] Failed to delete workload when replacing workload 'nginx': 'Could not delete workload 'Execution of command failed: Error: unable to read YAML as Kube Pod: error unmarshaling JSON: while decoding JSON: json: cannot unmarshal number into Go struct field Pod.apiVersion of type string
''
[2023-11-29T13:04:27Z WARN ank_agent::runtime_connectors::runtime_facade] Failed to create workload when replacing workload 'nginx': 'Could not create workload: 'Execution of command failed: Error: adding pod to state: name "nginx" is in
use: pod already exists
''
podman volume ls after third start:
DRIVER VOLUME NAME
local nginx.986b8d2fac1174412d106c512cd7d27aeb237af2b8e96642405606f92918e589.agent_A.config
Final result
To be filled by the one closing the issue.