Add ability to wait for sidecar container
Summary
When deploying Kong on top of EKS into a Kuma-based namespace, we noticed that the DB migration job that runs before the pods are started sometimes gets stuck with the following error:
Error: [PostgreSQL error] failed to retrieve server_version_num: host or service not provided, or not known
After trying a few things (changing the DB address, upgrading the Docker version), we noticed that the Envoy proxy is sometimes started after the application container (in this case, the migrations container), which I guess causes the network errors.
Once we deployed the chart in a namespace that wasn't managed by Kuma, everything worked fine. Is there a way to tell Kuma to start the Envoy proxy first, and only then the application itself?
Thanks
Kuma chart version: 0.6.0, EKS: 1.19, Kong: 2.1.4
Additional Details & Logs
Link to a related ticket in Kong: https://github.com/Kong/kong/issues/4363
Has anyone encountered this kind of issue?
xref https://github.com/kubernetes/kubernetes/issues/65502
The problem is that unless the kuma-dp sidecar is up and running, the pod has no network, and per the Kubernetes issue above, the sidecar lifecycle is more complicated than was thought, so it is a waiting game.
On the other hand, if you can wrap the main application command or entrypoint, you can use the logic below (install netcat on Ubuntu or Debian; Alpine has the nc command installed by default):
## Check network when the service mesh is enabled
while true
do
  nc -vz www.google.com 443
  ret_code=$?
  if [ $ret_code -ne 0 ]; then
    echo "Network not ready"
    sleep 3
  else
    echo "Network ready"
    break
  fi
done
echo "starting {{.Chart.Name}} service"
MAIN COMMAND
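For example, a hypothetical way to wire this into a container spec (the image name, script path /wait-for-net.sh, and entrypoint are illustrative, not from the chart):

containers:
  - name: app
    image: example/app        # hypothetical image that ships the script above
    command: ["/bin/sh", "-c"]
    # Run the wait loop first, then exec the real entrypoint in its place.
    args: ["/wait-for-net.sh && exec /usr/local/bin/app"]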
Btw, if you have Vault integration and a Vault agent init container, it will not init; to overcome this, just add this annotation:
vault.hashicorp.com/agent-init-first: "true"
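For context, this is a Vault Agent Injector annotation and it belongs on the pod template metadata; a minimal sketch of where it sits in a Deployment (only the annotation itself comes from the comment above):

spec:
  template:
    metadata:
      annotations:
        # Run the Vault agent init container first, before kuma-init
        # sets up the transparent proxy redirection.
        vault.hashicorp.com/agent-init-first: "true"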
@skaravad I remember that when working with Istio they managed to solve this issue. I think when deploying Istio you had to add a flag which basically tells the app container to wait for the proxy. Are you familiar with that? Is there a way to implement this solution in Kuma as well?
xref #2571
@michaelkoro with Istio it was an annotation: https://github.com/istio/istio/issues/11130
annotations:
proxy.istio.io/config: '{ "holdApplicationUntilProxyStarts": true }'
But I don't think there was ever closure; unless K8s gets a way to order container startup within a pod, these are just workarounds.
In the case of Kuma, it appears that the issue is only with DNS, which starts with the DP. You can disable DNS on the DP and use DNS via the CP (@jpeach please correct me if I'm wrong), but I don't think that was considered best practice.
@skaravad I actually just noticed that when deploying Kong to a Kuma-managed namespace, we get the following error from the kuma-sidecar container, which fails the Kong deployment:
Error: could not read file /var/run/secrets/kubernetes.io/serviceaccount/token: stat /var/run/secrets/kubernetes.io/serviceaccount/token: no such file or directory
Which service account is it looking for?
Ok, I have my guesses. When Kuma is injected, a kuma-init init container is started at the beginning. It installs transparent proxying, which by default also redirects all DNS traffic to the kuma-dp DNS server. Since that server starts together with Envoy in the kuma-sidecar container, DNS traffic won't work in the window between kuma-init finishing and the kuma-dp DNS server starting. I'm not sure how to fix this at this point yet, short of disabling the kuma-dp DNS server.
@michaelkoro we use the service account token as the authentication mechanism between kuma-dp and kuma-cp.
Actually, we discussed it with @jakubdyszkiewicz and it's not even a DNS thing: all traffic is redirected, so kuma-dp has to be fully running.
@bartsmykla Yeah, what I ended up doing to avoid the problem was disabling Kuma injection on the Kong pre- and post-migration jobs, just so they could work properly. Not sure why, but the Kong pod itself managed to connect to the DB (meaning the network was set up), while the pre-migration job (which uses the same Kong image) couldn't.
Someone also mentioned: https://medium.com/@marko.luksa/delaying-application-start-until-sidecar-is-ready-2ec2d21a7b74
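That article relies on two Kubernetes behaviors: the kubelet starts containers in the order they are listed, and it blocks on a container's postStart hook before starting the next container. A minimal sketch of the pattern, assuming an Envoy-style proxy with a /ready endpoint on its admin port (image names, port, and endpoint are assumptions, not Kuma's actual interface):

containers:
  - name: sidecar                 # deliberately listed first
    image: example/proxy          # hypothetical proxy image
    lifecycle:
      postStart:
        exec:
          # Block here until the proxy answers; the app container below
          # is not started until this command returns.
          command:
            - /bin/sh
            - -c
            - until wget -qO- http://127.0.0.1:9901/ready; do sleep 1; done
  - name: app
    image: example/app            # hypothetical; starts after the hook returns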
There's some research required here, as it might not be straightforward.
And the same problem occurs on pod shutdown: the sidecar dies first and the main container loses its network connection.
@alt-dima We also started experiencing this issue. From time to time when a pod dies, Kuma receives the SIGTERM and closes all connections, which causes many "network error" logs from our application until the application pod is terminated.
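On the shutdown side, a common (not Kuma-specific) mitigation is a preStop hook on the sidecar, because the kubelet runs preStop before sending SIGTERM to that container; a sketch, with an arbitrary sleep duration:

containers:
  - name: sidecar
    image: example/proxy          # hypothetical
    lifecycle:
      preStop:
        exec:
          # Delay the sidecar's SIGTERM so the main container can finish
          # draining its in-flight connections first.
          command: ["/bin/sh", "-c", "sleep 15"]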
I believe we've fixed the shutdown issue you're mentioning in the upcoming release of Kuma; @jakubdyszkiewicz can confirm.
Release 1.7.0?
Yes, releasing early next week.
Seems like we need to:
- Make the sidecar the first container in the list of containers.
- Add a PostStart hook on the sidecar that waits for the sidecar to be ready (this could be an HTTP call).
- We're always good to make the sidecar the first container (at the moment it's last and there's no determinism, so switching will be fine).
- I don't think calling the Envoy admin endpoint is right; we probably want this to be a combination with the actual DP process (see the sketch below).
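Purely to illustrate that last bullet, the hook could gate on both the Envoy admin endpoint and the kuma-dp process itself; the port, endpoint, and process name here are assumptions, not Kuma's actual interface:

lifecycle:
  postStart:
    exec:
      command:
        - /bin/sh
        - -c
        # Wait until the (assumed) admin /ready endpoint answers AND a
        # kuma-dp process is running, so Envoy alone isn't treated as ready.
        - |
          until wget -qO- http://127.0.0.1:9901/ready >/dev/null 2>&1 \
                && pgrep kuma-dp >/dev/null; do
            sleep 1
          done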
@johnharris85 thinks that maybe the order of containers doesn't matter.
xref: https://github.com/kumahq/kuma/issues/6082