Error in Container Creation on Running Benchmark
I encounter the following issue when trying to deploy the chained-function-serving and aes benchmarks (the only two I have attempted to deploy). As a representative example, I include the details for aes here.
Steps to reproduce the issue
I used the vSwarm-u profile on CloudLab to reproduce the issue. Set up a single-node cluster as shown in the vHive quick-start guide and pull all the required images:
git clone --depth=1 https://github.com/ease-lab/vhive.git
cd vhive && mkdir /tmp/vhive-logs
./scripts/cloudlab/setup_node.sh;
sudo screen -dmS containerd containerd; sleep 5;
sudo PATH=$PATH screen -dmS firecracker /usr/local/bin/firecracker-containerd --config /etc/firecracker-containerd/config.toml; sleep 5;
source /etc/profile && go build;
sudo screen -dmS vhive ./vhive; sleep 5;
./scripts/cluster/create_one_node_cluster.sh
cd ..
sudo apt install docker.io
git clone --depth=1 https://github.com/ease-lab/vSwarm.git
cd vSwarm/benchmarks/aes
sudo make pull
Ensure that kubectl get pods -A shows all pods' status as Running or Completed (if not, wait till that happens).
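The readiness check above can be made less manual with a small filter. This is only a sketch: it assumes the default column layout of kubectl get pods -A (STATUS in the fourth column), and not_ready_pods is a hypothetical helper name, not something shipped with vHive or vSwarm.

```shell
# Print "namespace/pod status" for every pod whose STATUS is neither
# Running nor Completed. Reads `kubectl get pods -A --no-headers` on stdin.
# Assumed columns: NAMESPACE NAME READY STATUS RESTARTS AGE
not_ready_pods() {
    awk '$4 != "Running" && $4 != "Completed" { print $1 "/" $2 " " $4 }'
}

# Usage against a live cluster (not executed here):
#   kubectl get pods -A --no-headers | not_ready_pods
# Empty output means every pod is Running or Completed.
```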
Now deploy a function. As a representative, we shall attempt to deploy kn-aes-go.
kubectl apply -f ./yamls/knative/kn-aes-go.yaml
which gives the output
service.serving.knative.dev/aes-go created
The Error
A CreateContainerError is encountered in one of the containers.
kubectl get pods -A
outputs (first line shown here)
NAMESPACE   NAME                                       READY   STATUS                 RESTARTS   AGE
default     aes-go-00001-deployment-86749dbdf9-qcrzc   2/3     CreateContainerError   0          15s
For the 3 containers user-container-0, user-container-1 and queue-proxy, here are the outputs of kubectl logs aes-go-00001-deployment-86749dbdf9-qcrzc -c ${CONTAINER_NAME} in that order:
time="2022-08-29T14:48:46Z" level=info msg="Started relay server at 0.0.0.0:50000"
time="2022-08-29T14:48:46Z" level=info msg="Start AES-go server. Addr: 0.0.0.0:50051\n"
Error from server (BadRequest): container "queue-proxy" in pod "aes-go-00001-deployment-86749dbdf9-qcrzc" is waiting to start: CreateContainerError
Therefore it is the queue-proxy container that generates the error.
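For reference, the per-container log check above can be run in one go. A minimal sketch, assuming the container names listed above; pod_container_logs is a hypothetical helper name.

```shell
# Print the logs of each named container in a pod under a labelled header.
# $1 = pod name; remaining arguments = container names.
pod_container_logs() {
    pod="$1"; shift
    for c in "$@"; do
        echo "== $c =="
        # 2>&1 keeps a "waiting to start" error next to its container name.
        kubectl logs "$pod" -c "$c" 2>&1
    done
}

# Usage against a live cluster (not executed here):
#   pod_container_logs aes-go-00001-deployment-86749dbdf9-qcrzc \
#       user-container-0 user-container-1 queue-proxy
```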
Here is the list of events (part of the output of kubectl describe pod aes-go-00001-deployment-86749dbdf9-qcrzc):
Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  3m57s                  default-scheduler  Successfully assigned default/aes-go-00001-deployment-86749dbdf9-qcrzc to node-0.opt.rperf-pg0.utah.cloudlab.us
  Normal   Pulling    3m57s                  kubelet            Pulling image "docker.io/vhiveease/relay:latest"
  Normal   Pulled     3m54s                  kubelet            Successfully pulled image "docker.io/vhiveease/relay:latest" in 2.795450552s
  Normal   Created    3m54s                  kubelet            Created container user-container-0
  Normal   Started    3m54s                  kubelet            Started container user-container-0
  Normal   Pulling    3m54s                  kubelet            Pulling image "docker.io/vhiveease/aes-go:latest"
  Normal   Pulled     3m52s                  kubelet            Successfully pulled image "docker.io/vhiveease/aes-go:latest" in 1.87558108s
  Normal   Created    3m52s                  kubelet            Created container user-container-1
  Normal   Started    3m52s                  kubelet            Started container user-container-1
  Normal   Pulling    3m52s                  kubelet            Pulling image "docker.io/vhiveease/queue-39be6f1d08a095bd076a71d288d295b6@sha256:7664e43ef34eccf3c311a0a7fa75da472303faf387e3f5f0a5fb863a9dbc3aff"
  Normal   Pulled     3m49s                  kubelet            Successfully pulled image "docker.io/vhiveease/queue-39be6f1d08a095bd076a71d288d295b6@sha256:7664e43ef34eccf3c311a0a7fa75da472303faf387e3f5f0a5fb863a9dbc3aff" in 3.35848303s
  Warning  Failed     2m32s (x8 over 3m49s)  kubelet            Error: VM config for pod does not exist
  Normal   Pulled     2m32s (x7 over 3m48s)  kubelet            Container image "docker.io/vhiveease/queue-39be6f1d08a095bd076a71d288d295b6@sha256:7664e43ef34eccf3c311a0a7fa75da472303faf387e3f5f0a5fb863a9dbc3aff" already present on machine
Note the second-to-last event, which says VM config for pod does not exist. I saw the same error on the vhive screen:
ERRO[2022-08-29T08:58:46.612409403-06:00] VM config for pod ef0e43e71a5343cca51f0bdfa0823db4e521c5f50d20b243255c6dc4c3971bce does not exist
ERRO[2022-08-29T08:58:46.612459170-06:00] error="VM config for pod does not exist"
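To see only the Warning events for a pod rather than scanning the full describe output, the events can be filtered server-side with field selectors. A sketch, assuming the pod lives in the default namespace unless stated otherwise; pod_warnings is a hypothetical helper name.

```shell
# List only Warning events that refer to one pod.
# $1 = pod name, $2 = namespace (falls back to "default").
pod_warnings() {
    kubectl get events -n "${2:-default}" \
        --field-selector "type=Warning,involvedObject.name=$1"
}

# Usage against a live cluster (not executed here):
#   pod_warnings aes-go-00001-deployment-86749dbdf9-qcrzc
```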
kn service list gives the output
NAME     URL                                            LATEST   AGE     CONDITIONS   READY     REASON
aes-go   http://aes-go.default.192.168.1.240.sslip.io            9m44s   0 OK / 3     Unknown   RevisionMissing : Configuration "aes-go" is waiting for a Revision to become ready.
Logs
kubectl describe pod aes-go-00001-deployment-fdd5c869b-dx6sz : kubectl-decribe-pod.log
kubectl get service : kubectl-get-service.log
kubectl get pods -A : kubectl-get-pods.log
vSwarm functions' YAML files need to be modified in the following format to use firecracker MicroVMs instead of containers or gVisor VMs: https://github.com/ease-lab/vhive/blob/main/configs/knative_workloads/helloworld.yaml
See vHive Issue 68 (link is above).
@alannair could you add a note for this peculiarity to vSwarm's main README in a PR?
The YAML files (e.g. kn-aes-go.yaml) contain args such as addr and function-endpoint-url, which are passed to the image. If we are to specify the image name and port as env variables on the stub image (as suggested), then how do we pass the args?
In addition, please clarify the following: The workaround involves running the function image within an external container that is configured to work with containerd. This external container is set up such that it initializes the image/port as per the respective environment variables. Is this correct?
@alannair the stub image does nothing although it runs in the same pod. The sole purpose of the stub container is to serve heartbeats coming from knative & k8s. Ultimately, we should make sure the target container serves those messages on its own but for that we need to investigate the problem further.
I think the env variables are set up for all containers. The arguments are just runtime arguments supplied to the command to run inside the target container.
@ustiugov I am able to deploy the functions by using the modified yaml format. Here is the modified aes-python manifest.
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: aes-python
  namespace: default
spec:
  template:
    spec:
      containers:
        - image: crccheck/hello-world:latest # Stub image. See https://github.com/ease-lab/vhive/issues/68
          ports:
            - name: h2c # For GRPC support
              containerPort: 50051
          env:
            - name: GUEST_PORT # Port on which the firecracker-containerd container is accepting requests
              value: "50051"
            - name: GUEST_IMAGE # Container image to use for firecracker-containerd container
              value: "docker.io/vhiveease/aes-python:latest"
As you can see, I have skipped the args parameters which were passed to the target container in the original manifest.
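In case it helps others trying the same conversion, the manifest above can be templated. This is a sketch only: stub_manifest is a hypothetical helper name, it assumes the container port and GUEST_PORT are the same value (as in my manifest), and it does not pass any args to the target container.

```shell
# Emit a Knative Service manifest in the stub format shown above.
# $1 = service name, $2 = guest image, $3 = guest port (falls back to 50051).
stub_manifest() {
    name="$1"; image="$2"; port="${3:-50051}"
    cat <<EOF
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: ${name}
  namespace: default
spec:
  template:
    spec:
      containers:
        - image: crccheck/hello-world:latest # Stub image
          ports:
            - name: h2c # For gRPC support
              containerPort: ${port}
          env:
            - name: GUEST_PORT
              value: "${port}"
            - name: GUEST_IMAGE
              value: "${image}"
EOF
}

# Usage against a live cluster (not executed here):
#   stub_manifest aes-python docker.io/vhiveease/aes-python:latest | kubectl apply -f -
```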
The problem is that, while I am able to deploy the function successfully, invocation fails.
Here is the output of ./invoker -port 80 -dbg -time 1 -rps 1 :
DEBU[2022-08-31T14:45:40.870005892-06:00] Debug logging is enabled
INFO[2022-08-31T14:45:40.870107586-06:00] Reading the endpoints from the file: endpoints.json
DEBU[2022-08-31T14:45:40.870262284-06:00] Invoking: aes-python.default.192.168.1.240.sslip.io:80
WARN[2022-08-31T14:45:40.891286725-06:00] Failed to invoke aes-python.default.192.168.1.240.sslip.io:80, err=rpc error: code = Unimplemented desc = Method not found!
DEBU[2022-08-31T14:45:40.891392746-06:00] Invoked aes-python.default.192.168.1.240.sslip.io in 21130 usec
INFO[2022-08-31T14:45:41.871316037-06:00] Issued / completed requests: 1, 0
INFO[2022-08-31T14:45:41.871380829-06:00] Real / target RPS: 0.00 / 1
INFO[2022-08-31T14:45:41.871401948-06:00] Experiment finished!
INFO[2022-08-31T14:45:41.871419873-06:00] The measured latencies are saved in rps0.00_lat.csv
I am speculating here, but this is probably because I did not pass the args to the target container (right?).
But the new manifest does not instantiate the target container. It just instantiates the stub.
- How then, does one pass args to the target container?
- How exactly is the target container even instantiated from the environment variables (GUEST_IMAGE)?
Passing args to the stub container is futile.
Is there a workaround for this now?
@jingren1021 can you please specify for what exactly? If you refer to using vSwarm with Firecracker, then the YAML format changes are described above.
Sorry for the late response.