debuginfod is ignored if an agent uploads a stripped binary

Open bobrik opened this issue 11 months ago • 11 comments

Here's systemd on Debian Trixie showing some unresolved symbols:

[Image: systemd frames with unresolved symbols]

We can find the corresponding buildid:

$ find projects/parca/data | grep 42a2d21
projects/parca/data/debuginfo/42a2d21759b160fe6556b8c801294dcfd5fc6764
projects/parca/data/debuginfo/42a2d21759b160fe6556b8c801294dcfd5fc6764/metadata
projects/parca/data/debuginfo/42a2d21759b160fe6556b8c801294dcfd5fc6764/debuginfo

It is stripped, so it's not very useful:

$ file projects/parca/data/debuginfo/42a2d21759b160fe6556b8c801294dcfd5fc6764/debuginfo
projects/parca/data/debuginfo/42a2d21759b160fe6556b8c801294dcfd5fc6764/debuginfo: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), statically linked, BuildID[sha1]=42a2d21759b160fe6556b8c801294dcfd5fc6764, stripped

Metadata points at the file being uploaded by an agent:

$ cat projects/parca/data/debuginfo/42a2d21759b160fe6556b8c801294dcfd5fc6764/metadata
{
  "buildId": "42a2d21759b160fe6556b8c801294dcfd5fc6764",
  "source": "SOURCE_UPLOAD",
  "upload": {
    "id": "52a1a4bd-3717-45a0-bd6c-8a19af7aeb2e",
    "hash": "7ad293fe4e10d873162078e376ec14d3",
    "state": "STATE_UPLOADED",
    "startedAt": "2025-01-20T03:38:30.386740609Z",
    "finishedAt": "2025-01-20T03:38:30.391896602Z"
  },
  "quality": {
    "hasDynsym": true
  }
}

We can consult debuginfod and it will happily fetch us the proper debug info:

$ DEBUGINFOD_URLS=https://debuginfod.elfutils.org/ debuginfod-find debuginfo 42a2d21759b160fe6556b8c801294dcfd5fc6764
/home/ivan/.cache/debuginfod_client/42a2d21759b160fe6556b8c801294dcfd5fc6764/debuginfo

That is not stripped:

$ file /home/ivan/.cache/debuginfod_client/42a2d21759b160fe6556b8c801294dcfd5fc6764/debuginfo
/home/ivan/.cache/debuginfod_client/42a2d21759b160fe6556b8c801294dcfd5fc6764/debuginfo: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=42a2d21759b160fe6556b8c801294dcfd5fc6764, with debug_info, not stripped
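
The `debuginfod-find` lookup above boils down to a plain HTTP GET against the server's `/buildid/<id>/debuginfo` path. A minimal sketch of building that URL follows; the helper name `debuginfo_url` is made up for illustration and is not part of Parca or elfutils.

```python
from urllib.parse import quote


def debuginfo_url(server: str, build_id: str) -> str:
    """Build the debuginfod URL for the debuginfo artifact of a build ID.

    Hypothetical helper: only the /buildid/<id>/debuginfo path layout
    comes from the debuginfod web API, the rest is illustrative.
    """
    # Tolerate a trailing slash on the server URL, as in DEBUGINFOD_URLS.
    base = server.rstrip("/")
    # Percent-encode the ID so an unexpected character cannot alter the path.
    return f"{base}/buildid/{quote(build_id, safe='')}/debuginfo"
```

For example, `debuginfo_url("https://debuginfod.elfutils.org/", "42a2d21759b160fe6556b8c801294dcfd5fc6764")` yields the URL that the client fetches for the systemd binary above.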

It would be good for Parca to check debuginfod servers if debuginfo is present, but incomplete, like in this case.
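
One way to express "present, but incomplete" is to look at the `quality` record in the metadata shown earlier. The sketch below is not Parca's code: the function name is hypothetical, and `hasDwarf` and `hasSymtab` are assumed key names (only `hasDynsym` appears in the metadata above).

```python
def needs_debuginfod_lookup(metadata: dict) -> bool:
    """Decide whether uploaded debuginfo is thin enough to retry debuginfod.

    Hypothetical policy sketch, not Parca's actual logic.
    """
    # Only second-guess files that came from an agent upload.
    if metadata.get("source") != "SOURCE_UPLOAD":
        return False
    quality = metadata.get("quality", {})
    # ASSUMPTION: "hasDwarf" and "hasSymtab" are guessed key names for
    # richer symbol sources; only "hasDynsym" is visible in the thread.
    has_full_symbols = quality.get("hasDwarf", False) or quality.get("hasSymtab", False)
    # A file with nothing better than .dynsym counts as incomplete.
    return not has_full_symbols
```

With the metadata from this issue (`"quality": {"hasDynsym": true}`), the function returns `True`, so the server would still ask debuginfod for build ID `42a2d217...`.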

bobrik avatar Jan 26 '25 04:01 bobrik

Strange! Which version of Parca is this?

What Parca should do is reject the upload in the first place if it can find debuginfos in a debuginfod server.

brancz avatar Jan 28 '25 12:01 brancz

This was the latest main at the time of writing. If you could point me to the code that does the rejecting, I can do some debugging to see what's not clicking.

bobrik avatar Jan 29 '25 03:01 bobrik

I think it might be related to debuginfod not being available. Still, it doesn't seem very productive to upload stripped binaries.

bobrik avatar Feb 02 '25 05:02 bobrik

Specifically, debuginfod.elfutils.org does not like many requests at once:

checking debuginfod.elfutils.org 144aa6681b4d21fa0312fe4055b9c0ba1315254e: false request failed: Get "https://debuginfod.elfutils.org/buildid/144aa6681b4d21fa0312fe4055b9c0ba1315254e/debuginfo": read tcp [2601:644:4981:f2e8:eaff:1eff:fed5:f416]:34868->[2600:3c03::f03c:91ff:fe50:73f]:443: read: connection reset by peer

They seem to straight up ban the client IP:

ivan@cube:~$ curl -svo /dev/null https://debuginfod.elfutils.org/buildid/144aa6681b4d21fa0312fe4055b9c0ba1315254e/debuginfo
* Host debuginfod.elfutils.org:443 was resolved.
* IPv6: 2600:3c03::f03c:91ff:fe50:73f
* IPv4: 96.126.110.187
*   Trying [2600:3c03::f03c:91ff:fe50:73f]:443...
* connect to 2600:3c03::f03c:91ff:fe50:73f port 443 from 2601:644:4981:f2e8:eaff:1eff:fed5:f416 port 58934 failed: Connection refused
*   Trying 96.126.110.187:443...
* connect to 96.126.110.187 port 443 from 192.168.1.50 port 38460 failed: Connection refused
* Failed to connect to debuginfod.elfutils.org port 443 after 156 ms: Could not connect to server
* closing connection #0

My laptop is on the same /64 prefix and it can reach it just fine. The machine above also recovers after some time, but it gets banned very easily with 32 concurrent requests:

ivan@cube:~$ wrk -t 1 -c 32 -d 5s https://debuginfod.elfutils.org/buildid/144aa6681b4d21fa0312fe4055b9c0ba1315254e/debuginfo

@fche, is this sort of behavior expected?

bobrik avatar Feb 02 '25 06:02 bobrik

Several public debuginfod servers apply throttling as a self-defense measure against IP addresses that use them too heavily. Many concurrent connections from the same IP address are one such trigger. I'll nudge up the limits of this server. If you are a heavy user of these services, please consider installing a local caching proxy.
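
Since concurrent connections are what trips the throttling, a client can cap how many lookups are in flight at once. This is a minimal sketch under that assumption; `fetch_all`, its `fetch` callback, and the default limit of 4 are illustrative choices, not values taken from Parca or from the server's configuration.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(build_ids, fetch, max_in_flight=4):
    """Run fetch(build_id) for every ID with bounded concurrency.

    Hypothetical helper: `fetch` is whatever function performs a single
    debuginfod request; the limit of 4 is an arbitrary example value.
    """
    ids = list(build_ids)
    # The pool size is the concurrency cap: no more than max_in_flight
    # requests can be running at the same moment.
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        # pool.map preserves input order, so zip pairs IDs with results.
        return dict(zip(ids, pool.map(fetch, ids)))
```

A cap like this does not reduce the total number of requests, so it complements caching rather than replacing it.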

fche avatar Feb 02 '25 16:02 fche

For example, in the last 8 hours, this server has received about 7000 duplicate queries for nonexistent build-ids from the same IP address, just seconds apart. A properly functioning debuginfod client would cache the negative hits, but this one does not.

172.203.153.37 - - [02/Feb/2025:16:46:32 +0000] debuginfod.elfutils.org "GET /buildid/69fd2f79f443d687ee083f218f57a8947836b95d/debuginfo HTTP/1.1" 404 9 "-" "parca.dev/debuginfod-client/0.21.0" (-%) 875us -
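
Negative caching here means remembering a 404 for some time instead of asking again seconds later. A rough sketch of the idea, with hypothetical names (`NegativeCache`, `lookup`) and an arbitrary one-hour TTL; it is not the code of any real debuginfod client.

```python
import time


class NegativeCache:
    """Remember build IDs that a server reported as missing (illustrative)."""

    def __init__(self, ttl_seconds=3600.0, clock=time.monotonic):
        self._ttl = ttl_seconds      # how long a miss stays cached
        self._clock = clock          # injectable for testing
        self._misses = {}            # build ID -> expiry timestamp

    def mark_missing(self, build_id):
        self._misses[build_id] = self._clock() + self._ttl

    def is_known_missing(self, build_id):
        expiry = self._misses.get(build_id)
        if expiry is None:
            return False
        if self._clock() >= expiry:
            # The entry expired, so allow one fresh request.
            del self._misses[build_id]
            return False
        return True


def lookup(build_id, fetch, cache):
    """Return fetch(build_id), skipping the request for cached misses.

    ASSUMPTION: `fetch` returns None when the server answers 404.
    """
    if cache.is_known_missing(build_id):
        return None
    result = fetch(build_id)
    if result is None:
        cache.mark_missing(build_id)
    return result
```

With a cache like this, repeated queries for the same nonexistent build ID reach the server once per TTL window rather than once per attempt.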

fche avatar Feb 02 '25 16:02 fche

@fche a local caching proxy does not help with the initial deluge of requests when debuginfod is requested for everything installed.

Would you consider putting Cloudflare in front (via open-source sponsorships) to absorb the shocks and to do both positive and negative caching? With a proper tiered setup you shouldn't get more than ~one request for anything, no matter how well-behaved the clients are.

bobrik avatar Feb 02 '25 17:02 bobrik

How about this: If some other organization wishes to arrange for and oversee a public-interest CDN for debuginfod services, I'd be glad to add it to our published list of servers.

fche avatar Feb 02 '25 20:02 fche

@fche, I started an internal RFC. I'll let you know when it has some progress (over email).

bobrik avatar Feb 05 '25 03:02 bobrik

A couple of clarifying statements. Since v0.22.0, Parca has had:

  • Negative caching
  • Only requesting debuginfod servers if the build ID is determined to be a GNU build ID
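
The thread does not say how Parca decides that a build ID is a GNU build ID. Purely as an illustration, a format-based filter could look like the sketch below: GNU build IDs are rendered as hex strings (40 characters for the default SHA-1 style), whereas other ID styles such as Go build IDs contain non-hex characters. The function name and the heuristic itself are assumptions, not Parca's check.

```python
_HEX_DIGITS = frozenset("0123456789abcdef")


def looks_like_gnu_build_id(build_id: str) -> bool:
    """Heuristic: accept only non-empty, even-length hex strings.

    ASSUMPTION: this format check is illustrative. It can misclassify
    other hex-encoded identifiers, so a real client would likely rely on
    where the ID was read from rather than on its shape.
    """
    if not build_id or len(build_id) % 2 != 0:
        return False
    return all(ch in _HEX_DIGITS for ch in build_id.lower())
```

A filter like this would skip debuginfod requests for IDs that a server can never resolve, which is the kind of nonexistent-build-ID traffic described earlier in the thread.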

If Cloudflare could help with a cache, that would be super helpful either way, though, especially because there is little we can do about our users not upgrading!

brancz avatar Feb 05 '25 09:02 brancz

> Still, it doesn't seem very productive to upload stripped binaries.

The reason this happens at this point is that the agent tries to at least find some symbols to upload. At the point where it does that it has already exhausted all other possibilities and it's the only thing left that might offer non-zero chances at symbolizing.

brancz avatar Feb 05 '25 09:02 brancz