[FR] - reduce endpoint added/removed logs to debug
What problem are you trying to solve?
I'm finding the following logs to be incredibly verbose (we have deployments with >100 pods for example constantly rotating with HPA) and masking other more interesting logs. These may be better suited to debug?
INFO ThreadId(02) outbound:proxy{addr=10.224.15.134:5000}:service{ns=<> name=<> port=<>}: linkerd_pool_p2c: Removing endpoint
INFO ThreadId(02) outbound:proxy{addr=10.224.15.134:5000}:service{ns=<> name=<> port=<>}: linkerd_pool_p2c: Adding endpoint addr=<>:5000
How should the problem be solved?
Move logs to debug level from info.
Any alternatives you've considered?
n/a
How would users interact with this feature?
No response
Would you like to work on this feature?
None
I took an extraction of a single pod's worth of adding / removing logs to a single service, e.g.:
then passed it through a noddy little .py script to build up a collection of the IP addresses:
import json
import re
from datetime import datetime
addr_list = []
with open('<logs-file.json>') as data_file:
data = json.load(data_file)
for v in reversed(data):
log_search = re.search('(Adding|Removing) endpoint addr=(\d+\.\d+\.\d+\.\d+)',v["line"])
timestamp = datetime.utcfromtimestamp((int(v["timestamp"]) / 1_000_000_000)).strftime('%Y-%m-%d %H:%M:%S')
action = log_search.group(1)
addr = log_search.group(2)
print(v["line"])
if action == "Adding" and addr not in addr_list:
print("Adding {0}".format(addr))
addr_list.append(addr)
elif action == "Removing":
print("Removing {0}".format(addr))
addr_list.remove(addr)
print("{0}: {1}: {2}".format(timestamp, len(addr_list), addr_list))
and I noticed two things:
-
There are a lot of repeat logs of adding the same IP addresses in. I see this reflected in
outbound_http_balancer_endpointsmetric as the number sometimes goes to 0 - why? -
By the end of the script, we ended up with about 3x the amount of IPs that actually existed for that service, and the number never aligns with the value of
outbound_http_balancer_endpoints. I'll validate this again today but from my initial testing, these logs aren't all that accurate?
What version are you on? The outbound_http_balancer_endpoints metrics was broken and got fixed in edge-24.5.1 (Ref linkerd/linkerd2-proxy#2928). Granted that doesn't explain the repeated IP entries in the logs; I'll discuss this with the team.
One possible stop-gap for appeasing the logs is adding linkerd_pool_p2c=error to the proxy log setting.
Thanks Alejandro - we're on 2.15.3-enterprise, I don't know whether that fix is in there or not.
I was unable to reproduce the erroneous logs starting fresh in a clean environment, so I think it is actually related to turning HAZL off and on. I'll submit a proper Buoyant support ticket when I have good repro steps on that one.
Ok the metrics fix is already included in the version you're on. Looking forward to the support ticket with the repro steps :-)
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.
Hi! I feel the same. Is there a way to change the log level of "linkerd_pool_p2c"? @alpeb
Hi! I feel the same. Is there a way to change the log level of "linkerd_pool_p2c"? @alpeb
You can change the proxy's default log level to this:
proxy.logLevel: warn,linkerd=info,hickory=error,linkerd_pool_p2c=error