Inconsistent pod ID returned by `px.ip_to_pod_id` function
Describe the bug
The `px.ip_to_pod_id` function in Pixie occasionally returns an incorrect pod ID for a given remote address. The issue occurs when Pixie matches the IP address to a random cron job running within the cluster instead of the long-running pod it should match. The problem happens randomly and is not specific to a certain pod type. The pods affected by this issue have a typical lifecycle of a few hours. We suspect this may be caused by a bug in the metadata server.
To Reproduce
Steps to reproduce the behavior:
- Deploy a few cron jobs, each running every minute and making an HTTP call to a random endpoint.
- Deploy a few pods that talk to each other over HTTP.
- Get the IP addresses of those pods from `kubectl`.
- Check the output of `px.pod_id_to_pod_name(px.ip_to_pod_id(POD_IP_ADDRESS))` for each pod after a while (see the PxL sketch below).
Note: this behaviour does not occur consistently and is hard to reproduce in a single attempt.
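For reference, a minimal PxL sketch of this check against live traffic might look like the following (assuming the standard `http_events` table; the time window is arbitrary):

```python
import px

# Pull recent HTTP traces and resolve each remote address to a pod name
# using the current IP -> pod ID map (the behavior under investigation).
df = px.DataFrame(table='http_events', start_time='-10m')
df.pod = px.pod_id_to_pod_name(px.ip_to_pod_id(df.remote_addr))
px.display(df[['time_', 'remote_addr', 'pod']])
```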
Expected behavior
`px.ip_to_pod_id` should return the correct pod ID for the given IP address.
App information (please complete the following information):
- Pixie version - 0.12.10
- K8s cluster version - v1.23.12
- Node Kernel version - 5.4.0-1091-azure
Additional context
Please refer to this Slack thread for additional context: https://pixie-community.slack.com/archives/C0405TUGY2C/p1676378960370879
Exact steps to reproduce
- Deploy the following two Kubernetes manifest files: `cronjobs.yaml` and `services.yaml`.
- Create a curl pod that queries the services, and let it run for about an hour:

```sh
kubectl run curl-pod --restart=Never --image=curlimages/curl -- /bin/sh -c 'while true; do curl -o /dev/null -s -w "status-code: %{http_code} duration: %{time_total}\n" http://start-component-3313962498.pixie-test-namespace:9090/requestdatachain | xargs -I {} sh -c "echo time: \$(date +\"%Y-%m-%dT%H:%M\") {}"; sleep 300; done'
```
- Open the `px/http_data` script, set `source_filter` to `cron-test`, set `destination_filter` to `pixie-test-namespace`, and set `start_time` to `-1h`.
- You will see records attributed to `cron-test`, even though the pods in the `cron-test` namespace don't send any HTTP calls to the `pixie-test-namespace` namespace (a rough PxL approximation of this query is sketched below).
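The same records can be reproduced with a small PxL query approximating those filters (a sketch, not the actual `px/http_data` source; it assumes the standard `http_events` table and resolves the source pod via the same UDF):

```python
import px

df = px.DataFrame(table='http_events', start_time='-1h')
# Resolve the source pod from the remote address using the current IP map;
# this is where the stale cron-job pod shows up.
df.source = px.pod_id_to_pod_name(px.ip_to_pod_id(df.remote_addr))
df.destination = df.ctx['pod']
df = df[px.contains(df.source, 'cron-test')]
df = df[px.contains(df.destination, 'pixie-test-namespace')]
px.display(df[['time_', 'source', 'destination', 'req_path']])
```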
Thanks to some additional logs/info that @MrSupiri sent over, we figured out the issue here.
Currently, each PEM keeps a map from IP -> pod ID, which is used in the `ip_to_pod_id` UDF. The map represents the IP mappings of the cluster at the current moment.
What happened here was:
- `curl-pod` has IP `X` and makes requests to `pixie-test-namespace`. The PEM correctly collects this as a request from IP `X`.
- `curl-pod` is deleted and reassigned IP `Y`.
- A `cron-test` pod is created and assigned IP `X`. The likelihood of some pod being assigned IP `X` is higher with cron jobs, because new pods are created so frequently.
- The IP map in the PEM is updated: IP `X` -> `cron-test` pod.
- The user executes the query, and `cron-test` is mapped to `remote_addr` `X`.
One possible solution here is to make the IP map also track timestamps. `ip_to_pod_id` would then also need a timestamp argument so it can select the correct pod ID for the time the HTTP request was made (if no timestamp is provided, we assume the current time, for backwards compatibility).
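As an illustration of that idea (not the actual PEM implementation), a time-aware map could keep a per-IP history of assignments and look up the entry in effect at the requested timestamp; all names below are hypothetical:

```python
import bisect
from typing import Optional

class TimeAwareIPMap:
    """Per-IP history of (timestamp, pod_id) assignments, appended in
    timestamp order as pods come and go."""

    def __init__(self):
        self._history: dict[str, list[tuple[int, str]]] = {}

    def assign(self, ip: str, timestamp: int, pod_id: str) -> None:
        # Record that `ip` was assigned to `pod_id` at `timestamp`.
        self._history.setdefault(ip, []).append((timestamp, pod_id))

    def ip_to_pod_id(self, ip: str, timestamp: Optional[int] = None) -> Optional[str]:
        entries = self._history.get(ip)
        if not entries:
            return None
        if timestamp is None:
            # Backwards compatible: no timestamp means "now", i.e. the
            # latest assignment wins.
            return entries[-1][1]
        # Most recent assignment at or before `timestamp`.
        i = bisect.bisect_right(entries, timestamp, key=lambda e: e[0])
        return entries[i - 1][1] if i > 0 else None

# Mirroring the scenario above: IP X serves curl-pod, then a cron pod.
ip_map = TimeAwareIPMap()
ip_map.assign("10.0.0.5", 100, "curl-pod-id")
ip_map.assign("10.0.0.5", 200, "cron-test-pod-id")
assert ip_map.ip_to_pod_id("10.0.0.5", timestamp=150) == "curl-pod-id"
assert ip_map.ip_to_pod_id("10.0.0.5") == "cron-test-pod-id"
```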
Thank you for the detailed explanation and for identifying the root cause of the issue. It's great to see that the team was able to figure out a solution to address the problem.
Please let me know if there is any further action required on my part, or if there are any additional details that would be helpful in resolving this issue.
I explored various ways in which this issue could be addressed and the ideas are documented in this doc: https://docs.google.com/document/d/1QIPUleSJhvj7kSu0QwFI1DjUGCfFdbFk_LbuNSS4cVM/edit#
A design review will be conducted to evaluate the various options on the table.
@MrSupiri we have added a UDF that will allow one to use the timestamp of the trace to identify which pod was related to a given IP at that time. Our policy is to wait one month after the Vizier release before using a new UDF in our scripts, so the built-in scripts won't be updated for a while, but if you wish, you can start using this new UDF once the next Vizier release is out.
Instead of using `px.ip_to_pod_id(df.remote_addr)`, you can switch to `px.ip_to_pod_id(df.remote_addr, df.time_)` (assuming `time_` is the time of the trace), and this should resolve the correct pod.
Ah right, thanks for the update @vihangm!
Thanks for this fix. Would there be a similar problem with service IP reuse, if services are coming and going? (A less common scenario than pod churn, of course.)
I'm wondering if it would be possible to extend `ip_to_service_id()` to accept an optional second time argument.