linkerd2 [FR] - reduce endpoint added/removed logs to debug

What problem are you trying to solve?

I'm finding the following logs incredibly verbose (for example, we have deployments with more than 100 pods constantly rotating under HPA), and they mask other, more interesting logs. Would they be better suited to the debug level?

INFO ThreadId(02) outbound:proxy{addr=10.224.15.134:5000}:service{ns=<> name=<> port=<>}: linkerd_pool_p2c: Removing endpoint
INFO ThreadId(02) outbound:proxy{addr=10.224.15.134:5000}:service{ns=<> name=<> port=<>}: linkerd_pool_p2c: Adding endpoint addr=<>:5000

How should the problem be solved?

Move logs to debug level from info.

Any alternatives you've considered?

n/a

How would users interact with this feature?

No response

Would you like to work on this feature?

None

Jun 11 '24 08:06 dwilliams782

I extracted a single pod's worth of adding/removing logs for a single service, then passed them through a quick Python script to build up a collection of the IP addresses:

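import json
import re
from datetime import datetime, timezone

# Matches "Adding endpoint addr=<ip>" and "Removing endpoint addr=<ip>".
ENDPOINT_RE = re.compile(r'(Adding|Removing) endpoint addr=(\d+\.\d+\.\d+\.\d+)')

addr_list = []

with open('<logs-file.json>') as data_file:
    data = json.load(data_file)

# Walk the export in reverse (assumes it is newest-first) to replay events in order.
for v in reversed(data):
    log_search = ENDPOINT_RE.search(v["line"])
    if log_search is None:
        continue  # skip lines without a parsable endpoint address
    # Timestamps are nanoseconds since the epoch.
    seconds = int(v["timestamp"]) / 1_000_000_000
    timestamp = datetime.fromtimestamp(seconds, tz=timezone.utc).strftime('%Y-%m-%d %H:%M:%S')
    action, addr = log_search.groups()
    print(v["line"])
    if action == "Adding" and addr not in addr_list:
        print("Adding {0}".format(addr))
        addr_list.append(addr)
    elif action == "Removing" and addr in addr_list:
        # Guard the removal: list.remove() raises ValueError for an unknown address.
        print("Removing {0}".format(addr))
        addr_list.remove(addr)
    print("{0}: {1}: {2}".format(timestamp, len(addr_list), addr_list))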

and I noticed two things:

There are a lot of repeated logs adding the same IP addresses. I see this reflected in the outbound_http_balancer_endpoints metric, as the number sometimes goes to 0 - why?
By the end of the script, we ended up with about 3x the number of IPs that actually existed for that service, and the count never aligns with the value of outbound_http_balancer_endpoints. I'll validate this again today, but from my initial testing, these logs aren't all that accurate?
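To put a number on the first observation, a rough sketch like the one below could count how often an address is "added" while it is already tracked as present. It assumes that both the Adding and Removing lines carry an `addr=<ip>` field, as the script's regex expects; the function name `count_repeat_adds` and the sample lines are made up for illustration and are not real proxy output.

```python
import re
from collections import Counter

# Same pattern the script relies on; assumes each line carries "addr=<ipv4>".
ENDPOINT_RE = re.compile(r"(Adding|Removing) endpoint addr=(\d+\.\d+\.\d+\.\d+)")


def count_repeat_adds(lines):
    """Count "Adding" events for addresses that are already tracked as present.

    Returns (repeats, active): a Counter of repeat adds per address and the
    set of addresses still present after replaying every line in order.
    """
    active = set()
    repeats = Counter()
    for line in lines:
        match = ENDPOINT_RE.search(line)
        if match is None:
            continue  # not an endpoint line with a parsable address
        action, addr = match.groups()
        if action == "Adding":
            if addr in active:
                repeats[addr] += 1  # added again without a Removing in between
            active.add(addr)
        else:
            active.discard(addr)  # tolerate removals of unknown addresses
    return repeats, active


# Hypothetical sample lines, in chronological order.
sample = [
    "linkerd_pool_p2c: Adding endpoint addr=10.0.0.1:5000",
    "linkerd_pool_p2c: Adding endpoint addr=10.0.0.2:5000",
    "linkerd_pool_p2c: Adding endpoint addr=10.0.0.1:5000",
    "linkerd_pool_p2c: Removing endpoint addr=10.0.0.2:5000",
]
repeats, active = count_repeat_adds(sample)
print(dict(repeats), sorted(active))
```

A non-empty `repeats` counter would point at duplicate add events rather than genuine pod churn, which is the distinction the comparison with outbound_http_balancer_endpoints needs.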

Jun 12 '24 09:06 dwilliams782

What version are you on? The outbound_http_balancer_endpoints metric was broken and got fixed in edge-24.5.1 (Ref linkerd/linkerd2-proxy#2928). Granted, that doesn't explain the repeated IP entries in the logs; I'll discuss this with the team. One possible stop-gap for quieting the logs is adding linkerd_pool_p2c=error to the proxy log setting.

Jun 12 '24 22:06 alpeb

Thanks Alejandro - we're on 2.15.3-enterprise, I don't know whether that fix is in there or not.

I was unable to reproduce the erroneous logs starting fresh in a clean environment, so I think it is actually related to turning HAZL off and on. I'll submit a proper Buoyant support ticket when I have good repro steps on that one.

Jun 13 '24 08:06 dwilliams782

OK, the metrics fix is already included in the version you're on. Looking forward to the support ticket with the repro steps :-)

Jun 13 '24 14:06 alpeb

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

Sep 14 '24 01:09 stale[bot]

Hi! I feel the same. Is there a way to change the log level of "linkerd_pool_p2c"? @alpeb

Feb 05 '25 09:02 hadican

You can change the proxy's default log level to this:

proxy.logLevel: warn,linkerd=info,hickory=error,linkerd_pool_p2c=error
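The value above reads as a comma-separated list of directives: a bare default level followed by `target=level` overrides. As a minimal sketch of how such a string could be composed, the helper below appends or replaces one override. The name `with_target_level` is hypothetical, and the base filter used here is an assumption taken from the value above minus its last directive.

```python
def with_target_level(base_filter, target, level):
    """Return base_filter with `target=level` set.

    Any existing directive for the same target is dropped first, so the
    override is not duplicated when the helper is applied more than once.
    """
    directives = [
        d for d in base_filter.split(",")
        if d and not d.startswith(target + "=")
    ]
    directives.append("{0}={1}".format(target, level))
    return ",".join(directives)


# Assumed base filter: the value from the comment above without the last directive.
base_filter = "warn,linkerd=info,hickory=error"
print(with_target_level(base_filter, "linkerd_pool_p2c", "error"))
# prints: warn,linkerd=info,hickory=error,linkerd_pool_p2c=error
```

Keeping the existing directives and only adding the `linkerd_pool_p2c` override leaves the rest of the proxy's logging unchanged while silencing the endpoint add/remove lines below error level.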

Feb 12 '25 20:02 alpeb