Inconsistent pod ID returned by `px.ip_to_pod_id` function
Describe the bug
The `px.ip_to_pod_id` function in Pixie occasionally returns an incorrect pod ID for a given remote address. The issue occurs when Pixie matches the IP address to a random cron job running within the cluster instead of the long-running pod it should match. The problem happens randomly and is not specific to a certain pod type. The pods affected by this issue have a typical lifecycle of a few hours. We suspect this may be caused by a bug in the metadata server.
To Reproduce
Steps to reproduce the behavior:
- Deploy a few cron jobs, each running every minute and making an HTTP call to a random endpoint.
- Deploy a few pods that talk to each other over HTTP.
- Get the IP addresses of those pods from `kubectl`.
- Check the output of `px.pod_id_to_pod_name(px.ip_to_pod_id(POD_IP_ADDRESS))` for each pod after a while (see the PxL sketch below).
Note: this behaviour does not occur consistently and is hard to reproduce in a single attempt.
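For reference, a minimal PxL sketch of this check against live traffic might look like the following (assuming the standard `http_events` table; the time window is arbitrary):

```python
import px

# Pull recent HTTP traces and resolve each remote address to a pod name
# using the current IP -> pod ID map (the behavior under investigation).
df = px.DataFrame(table='http_events', start_time='-10m')
df.pod = px.pod_id_to_pod_name(px.ip_to_pod_id(df.remote_addr))
px.display(df[['time_', 'remote_addr', 'pod']])
```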
Expected behavior
`px.ip_to_pod_id` should return the correct pod ID for the given IP address.
App information (please complete the following information):
- Pixie version - 0.12.10
- K8s cluster version - v1.23.12
- Node Kernel version - 5.4.0-1091-azure
Additional context
Please refer to this Slack thread for additional context: https://pixie-community.slack.com/archives/C0405TUGY2C/p1676378960370879
Exact steps to reproduce
- Deploy the following two Kubernetes manifest files: `cronjobs.yaml` and `services.yaml`.
- Create a curl pod that queries the services, and let it run for about an hour:

```sh
kubectl run curl-pod --restart=Never --image=curlimages/curl -- /bin/sh -c 'while true; do curl -o /dev/null -s -w "status-code: %{http_code} duration: %{time_total}\n" http://start-component-3313962498.pixie-test-namespace:9090/requestdatachain | xargs -I {} sh -c "echo time: \$(date +\"%Y-%m-%dT%H:%M\") {}"; sleep 300; done'
```
- Open the `px/http_data` script, set `source_filter` to `cron-test`, set `destination_filter` to `pixie-test-namespace`, and set `start_time` to `-1h`.
- You will see records attributed to `cron-test`, even though the pods in the `cron-test` namespace don't send any HTTP calls to the `pixie-test-namespace` namespace (a rough PxL approximation of this query is sketched below).
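The same records can be reproduced with a small PxL query approximating those filters (a sketch, not the actual `px/http_data` source; it assumes the standard `http_events` table and resolves the source pod via the same UDF):

```python
import px

df = px.DataFrame(table='http_events', start_time='-1h')
# Resolve the source pod from the remote address using the current IP map;
# this is where the stale cron-job pod shows up.
df.source = px.pod_id_to_pod_name(px.ip_to_pod_id(df.remote_addr))
df.destination = df.ctx['pod']
df = df[px.contains(df.source, 'cron-test')]
df = df[px.contains(df.destination, 'pixie-test-namespace')]
px.display(df[['time_', 'source', 'destination', 'req_path']])
```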
Thanks to some additional logs/info that @MrSupiri sent over, we figured out the issue here.
Currently, each PEM keeps a map from IP -> pod ID, which is used in the `ip_to_pod_id` UDF. The map represents the IP mappings of the cluster at the current moment.
What happened here was:
- `curl-pod` has IP `X` and makes requests to `pixie-test-namespace`. The PEM correctly collects this as a request from IP `X`.
- `curl-pod` is deleted and reassigned IP `Y`.
- A `cron-test` pod is created and assigned IP `X`. The likelihood of some pod being assigned IP `X` is higher with cron jobs, because new pods are created so frequently.
- The IP map in the PEM is updated: IP `X` -> `cron-test` pod.
- The user executes the query, and `cron-test` is mapped to `remote_addr` `X`.
One possible solution here is to make the IP map also track timestamps. `ip_to_pod_id` would then also need a timestamp argument so it can select the correct pod ID for the time the HTTP request was made (if no timestamp is provided, we assume the current time, for backwards compatibility).
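As an illustration of that idea (not the actual PEM implementation), a time-aware map could keep a per-IP history of assignments and look up the entry in effect at the requested timestamp; all names below are hypothetical:

```python
import bisect
from typing import Optional

class TimeAwareIPMap:
    """Per-IP history of (timestamp, pod_id) assignments, appended in
    timestamp order as pods come and go."""

    def __init__(self):
        self._history: dict[str, list[tuple[int, str]]] = {}

    def assign(self, ip: str, timestamp: int, pod_id: str) -> None:
        # Record that `ip` was assigned to `pod_id` at `timestamp`.
        self._history.setdefault(ip, []).append((timestamp, pod_id))

    def ip_to_pod_id(self, ip: str, timestamp: Optional[int] = None) -> Optional[str]:
        entries = self._history.get(ip)
        if not entries:
            return None
        if timestamp is None:
            # Backwards compatible: no timestamp means "now", i.e. the
            # latest assignment wins.
            return entries[-1][1]
        # Most recent assignment at or before `timestamp`.
        i = bisect.bisect_right(entries, timestamp, key=lambda e: e[0])
        return entries[i - 1][1] if i > 0 else None

# Mirroring the scenario above: IP X serves curl-pod, then a cron pod.
ip_map = TimeAwareIPMap()
ip_map.assign("10.0.0.5", 100, "curl-pod-id")
ip_map.assign("10.0.0.5", 200, "cron-test-pod-id")
assert ip_map.ip_to_pod_id("10.0.0.5", timestamp=150) == "curl-pod-id"
assert ip_map.ip_to_pod_id("10.0.0.5") == "cron-test-pod-id"
```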
Thank you for the detailed explanation and for identifying the root cause of the issue. It's great to see that the team was able to figure out a solution to address the problem.
Please let me know if there is any further action required on my part, or if there are any additional details that would be helpful in resolving this issue.
I explored various ways in which this issue could be addressed and the ideas are documented in this doc: https://docs.google.com/document/d/1QIPUleSJhvj7kSu0QwFI1DjUGCfFdbFk_LbuNSS4cVM/edit#
A design review will be conducted to evaluate the various options on the table.
@MrSupiri we have added a UDF that will allow one to use the timestamp of the trace to identify which pod was related to a given IP at that time. Our policy is to wait one month after the Vizier release before using a new UDF in our scripts, so the built-in scripts won't be updated for a while, but if you wish, you can start using this new UDF once the next Vizier release is out.
Instead of using `px.ip_to_pod_id(df.remote_addr)`, you can switch to `px.ip_to_pod_id(df.remote_addr, df.time_)` (assuming `time_` is the time of the trace), and this should resolve the correct pod.
Ah right, thanks for the update @vihangm!
Thanks for this fix. Would there be a similar problem with service IP reuse, if services are coming and going? (A less common scenario than pod churn, of course.)
I'm wondering if it would be possible to extend `ip_to_service_id()` to accept an optional second time argument.