
Feature: Allow to customise the source code integration

Status: Open · simonswine opened this issue 4 months ago · 2 comments

Is your feature request related to a problem? Please describe.

Our current implementation of source integration only supports Go source code that is either directly in the Git repository or can be retrieved using go.mod mechanisms.

There is a huge matrix of ways in which source code gets assembled into, or installed on, the destination system, so it might become very tricky to fully support all of these modes across different languages, operating systems and build tools.

Describe the solution you'd like

It should be possible to configure a separate, well-defined external API that takes care of all of these intricacies and:

  • Exposes the right version of the source code for a particular binary,
  • Serves each file under exactly the same path as it appears in the profiling data.

Proposal A: Allow to configure a custom URL for vcs.v1.GetFile / vcs.v1.GetCommit(s)

  • This is the API we use today for retrieving file information (see the proto definition).
  • It works by passing service_repository, service_git_ref and, optionally, service_root_path from the labels on the profile.
Example usage
# Get commit hash
curl 'http://service/vcs.v1.VCSService/GetCommit' \
  -H 'content-type: application/json' \
  --data-raw '{"repositoryURL":"https://github.com/grafana/pyroscope","ref":"HEAD"}'

# Get source code file
curl 'http://service/vcs.v1.VCSService/GetFile' \
  -H 'content-type: application/json' \
  --data-raw '{"repositoryURL":"https://github.com/grafana/pyroscope","ref":"HEAD","localPath":"/home/user/pyroscope/cmd/pyroscope/main.go","rootPath":""}'
{
  "URL": "[URL to GitHub view]",
  "content": "[returns the full source file as base64]"
}
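Any custom service implementing this API has to return a body that the client can decode in the same way. A minimal Go sketch of that decoding step, assuming the response carries exactly the two fields shown above and that `content` is standard, padded base64 (what protobuf's JSON mapping produces if `content` is a `bytes` field). `decodeGetFile` is a hypothetical helper.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// getFileResponse mirrors the GetFile response: "URL" links to the file in
// the hosting UI, "content" holds the file contents as base64.
type getFileResponse struct {
	URL     string `json:"URL"`
	Content string `json:"content"`
}

// decodeGetFile parses a GetFile response body and returns the view URL and
// the decoded source file.
func decodeGetFile(body []byte) (string, []byte, error) {
	var resp getFileResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return "", nil, fmt.Errorf("decode response: %w", err)
	}
	src, err := base64.StdEncoding.DecodeString(resp.Content)
	if err != nil {
		return "", nil, fmt.Errorf("decode base64 content: %w", err)
	}
	return resp.URL, src, nil
}

func main() {
	// "cGFja2FnZSBtYWluCg==" is base64 for "package main\n".
	body := []byte(`{"URL":"https://example.com/main.go","content":"cGFja2FnZSBtYWluCg=="}`)
	url, src, err := decodeGetFile(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(url)
	fmt.Print(string(src))
}
```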

Proposal B: Use debuginfod API for fetching source code

  • This is a well-defined API, used by tools like gdb to fetch source code and debug info for symbolizing binaries.
  • More on this: https://wiki.archlinux.org/title/Debuginfod

We would need to make sure all profiles propagate their build IDs in the mapping object (e.g. Go doesn't do that in all cases). Theoretically, we could also encapsulate the current information (service_repository, service_git_ref, [service_root_path]) into a string and deliver it through the build ID.

The server needs to be able to match the source code using only the build ID. This is trickier than when the commit hash of the source code is known, because the build ID is only known at build time (e.g. in CI).
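One way to close that gap, sketched here under the assumption that CI can report each build ID together with its repository and commit right after linking, is an index keyed by build ID that the source service consults later. `buildIDIndex`, `Register` and `Lookup` are hypothetical names; a real service would persist this rather than keep it in memory.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// sourceRef records where the source for one build lives.
type sourceRef struct {
	Repository string
	Commit     string
	RootPath   string
}

// buildIDIndex maps build IDs to source references (in-memory sketch).
type buildIDIndex struct {
	mu   sync.RWMutex
	refs map[string]sourceRef
}

func newBuildIDIndex() *buildIDIndex {
	return &buildIDIndex{refs: make(map[string]sourceRef)}
}

// Register is meant to be called at build time, e.g. from a CI step.
// Build IDs are normalized to lowercase hex.
func (ix *buildIDIndex) Register(buildID string, ref sourceRef) {
	ix.mu.Lock()
	defer ix.mu.Unlock()
	ix.refs[strings.ToLower(buildID)] = ref
}

// Lookup is used by the server when a profile only carries a build ID.
func (ix *buildIDIndex) Lookup(buildID string) (sourceRef, error) {
	ix.mu.RLock()
	defer ix.mu.RUnlock()
	ref, ok := ix.refs[strings.ToLower(buildID)]
	if !ok {
		return sourceRef{}, fmt.Errorf("unknown build ID %q", buildID)
	}
	return ref, nil
}

func main() {
	ix := newBuildIDIndex()
	// Placeholder values, as CI might register them after a build.
	ix.Register("387F5435", sourceRef{Repository: "https://github.com/grafana/pyroscope", Commit: "abc123"})
	ref, err := ix.Lookup("387f5435")
	if err != nil {
		panic(err)
	}
	fmt.Println(ref.Repository, ref.Commit)
}
```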

Example usage
# Get the build ID of python3.13
$ file /usr/bin/python3.13
/usr/bin/python3.13: ELF 64-bit LSB pie executable, ARM aarch64, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-aarch64.so.1, BuildID[sha1]=387f543579fb53103028f77de068e1e025ec148d, for GNU/Linux 3.7.0, stripped

# Request a source file using its path and the build ID
$ curl https://debuginfod.fedoraproject.org/buildid/387f543579fb53103028f77de068e1e025ec148d/source/usr/src/debug/python3.13-3.13.5-1.fc42.aarch64/Programs/python.c
[returns the full file]

Additional context

Both of these APIs should be configurable in the Grafana instance (with support for basic and bearer token auth).

It might make sense to query multiple services per source code request, to support different forms of source code retrieval for different languages or areas of the business.

simonswine, Aug 01 '25 10:08

Does operating debuginfod require storing the binary and source (and all dependencies?) for each build ID?

korniltsev, Aug 04 '25 05:08

Hi! Just arrived here after trying to make Profiles Drilldown in Grafana work with Python on our env.

On Python, and I suspect with other interpreted languages too, the existing source code resolution almost works. The path to the source code is sent properly to Pyroscope by the eBPF profiler (I'm testing with the one bundled in the latest Alloy nightlies). Apart from the frontend only enabling the GitHub button for *.go files, the only thing missing seems to be the path prefix lookup (i.e. knowing that /app/src/app.py in the Docker container matches /src/app.py in the repo root), which is currently only done for Go code:

https://github.com/grafana/pyroscope/blob/9bc2847e6b2e24a09f85b3511023fc1e90269385/pkg/frontend/vcs/source/find_go.go#L102-L111

https://github.com/grafana/pyroscope/blob/9bc2847e6b2e24a09f85b3511023fc1e90269385/pkg/frontend/vcs/source/find.go#L50-L63

I wonder if a first step, without needing to implement a fully pluggable mechanism, could be to make this segment removal work regardless of the file extension? Or at least to add some other common interpreted-language extensions that may be resolved in a similar fashion (.py at least, maybe .js, .rb, .pl?). Happy to contribute a PR for it.

zenitraM, Sep 17 '25 13:09