tempo
tempo copied to clipboard
Use fully qualifed domain name for storage backends when running on Kubernetes
Problem:
Kubernetes configures ndots=5
under /etc/resolv.conf
options, that means any hostname that has less then 5 dots in it is looked up locally first, and then finally it is resolved.
As per our tests, we have seen 11 local lookups before DNS resolution is done (see https://github.com/grafana/jsonnet-libs/pull/762 for more details). These local lookups add latency, take up space in the conntrack table and add extra load to the kube-dns service.
Tempo makes a tons of API calls to storage backend, and these extra DNS in storage backend calls adds to read and write latency for tempo.
We started looking into it because we noticed DNS Lookup Errors (see https://github.com/grafana/tempo/issues/1462), which we improved with https://github.com/grafana/tempo/pull/1632, and lowered ndots=3 to deal with it.
There are two ways to solve this:
- Use FQDN (hostname tld with dot) in storage backend config, and that will hint kube-dns to skip local search and do a direct lookup
- lower ndots but it can break kubernetes service discovery [we can't do that]
solution 2 is not ideal so we are only left with solution 1.
This is NOT an issue if you are running tempo on VMs, In most Linux distros ndots is not configured, and default value of 1 is used. if cat /etc/resolv.conf | grep ndots
shows nothing, that means it's set to default.
See more:
- https://pracucci.com/kubernetes-dns-resolution-ndots-options-and-why-it-may-affect-application-performances.html
- https://github.com/grafana/jsonnet-libs/pull/762
Solution:
We start using FQDN in storage config, and suggest our users to use FQDN when running tempo on kubernetes.
Azure Blob Storage: If we just add dot at the end of TLD, we get 400 (see more https://github.com/grafana/tempo/issues/1462#issuecomment-1224082681), that happens because Azure doesn't allow trailing dot in Host
header. We need to trim dot from Host header in Azure Client
Google Cloud Storage: with trailing dot in Host
header, we get 301 with location:
set without trailing dot in url.
AWS S3: trailing dot is not allowed in Host
header, we get 404 The specified bucket does not exist
error back from S3
Minio: not tested yet
Is the idea behind this to set all other configs aside from Azure to have a dot at the end of their base storage URLs? In other words, Azure will remain with a lot of DNS hits? Since looking at the Azure golang library one cannot set arbitrary request headers, and as such it doesn't seem possible using that library to get around the problem that they will not permit dots at the end of the container URL
I tested other providors, and looks like we will need to do what curl does with trailing dot. curl trims traling dot from Host header but keeps it in URL, see https://github.com/curl/curl/commit/3a6142865fec3c54cd7081ada86c93c135e4b32f for details.
We can put trailing dot in config url, but trim it from Host
header by using a custom transport in the client.
This issue has been automatically marked as stale because it has not had any activity in the past 60 days. The next time this stale check runs, the stale label will be removed if there is new activity. The issue will be closed after 15 days if there is no new activity. Please apply keepalive label to exempt this Issue.
This issue has been automatically marked as stale because it has not had any activity in the past 60 days. The next time this stale check runs, the stale label will be removed if there is new activity. The issue will be closed after 15 days if there is no new activity. Please apply keepalive label to exempt this Issue.