[Draft] Resolution of symbols by the server
The main feature of this pull request is to make the server able to resolve symbols locally and serialize them in the trace file. To do this, the client sends its image data to the server.
Why this can be an improvement
- The client has less work to do.
- You can resolve symbols in the profiler even if the client machine doesn't have the files containing debug information
- Less bandwidth is used for communication between client and server over time (except at the beginning of the recording).
- It is a step toward unifying the symbol resolution code.
- Disabling automatic symbol resolution entirely can speed up captures and reduce their size.
Implementation details
- In the global settings you can specify the symbol resolution behaviour (attempt resolution by the profiler, prevent resolution by the application).
- Tracy_callstack is now used by the Tracy server (in the common folder).
- Tracy_callstack can use debug GUIDs sent by the client to locate and load symbol information.
- Only works on Windows for now.
- Designed to work on Linux/macOS later on.
- On Linux, we already store the GNU build ID, so this will be forward compatible.
- The client caches its image information and sends it to the server after the handshake. If new images are loaded later on, a symbol query is issued (as before), which leads to discovery of the module, and its debug ID is then sent to the server.
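To make the handshake step more concrete, here is a minimal sketch of what the cached image information and a server-side address lookup could look like. It is not Tracy's code: ImageInfo, ImageCache, Add and Find are hypothetical names, and the sketch assumes each image is described by its base address, size, client-side path and a debug ID string (for example a PDB GUID+age on Windows or a GNU build ID on Linux).

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record for one loaded image (module), as the client might
// cache it and send it to the server after the handshake.
struct ImageInfo
{
    uint64_t base;        // load address of the image on the client
    uint64_t size;        // size of the mapped image in bytes
    std::string path;     // image path on the client machine
    std::string debugId;  // e.g. PDB GUID+age, or GNU build ID
};

// Hypothetical server-side cache of the client's images, kept sorted by base.
class ImageCache
{
public:
    // Called for the initial batch after the handshake, and again whenever a
    // symbol query reveals a module that was loaded later.
    void Add( ImageInfo info )
    {
        auto it = std::lower_bound( m_images.begin(), m_images.end(), info.base,
            []( const ImageInfo& img, uint64_t base ) { return img.base < base; } );
        m_images.insert( it, std::move( info ) );
    }

    // Returns the image containing addr, or nullptr if the address is unknown
    // (the server would then fall back to querying the client).
    const ImageInfo* Find( uint64_t addr ) const
    {
        // First image whose base is strictly greater than addr...
        auto it = std::upper_bound( m_images.begin(), m_images.end(), addr,
            []( uint64_t a, const ImageInfo& img ) { return a < img.base; } );
        if( it == m_images.begin() ) return nullptr;
        --it; // ...so the previous image is the only possible match.
        return addr - it->base < it->size ? &*it : nullptr;
    }

private:
    std::vector<ImageInfo> m_images;
};
```

Once Find returns an image, its debugId is what the server would use to locate a matching symbol file, rather than trusting the path alone.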
Areas of improvement/TODO list
- Make the resolution of symbols on the server asynchronous (similarly to the symbol worker of the client).
- Attempt to load debug information using a user-provided path.
- Provide a way for the user to specify the symbol location from the UI (this would make the server fully independent of the client for image caching).
- ELF/Mach-O resolution by modifying libbacktrace to support resolving symbols of another process (apparently not many modifications are required).
- Rework the content of Tracy_callstack.cpp to provide an API without global state (i.e. some "context"/"state" that could be stored in the Worker instead of relying on a global cache).
- Offline symbol resolution (the update program) should use the same mechanism.
- For now it only uses the image path, never the GUID. This means that you could in theory end up using incorrect symbols (if the program was recompiled, OS kernel modules were updated, etc.).
- It uses addr2line, which is slow and should only serve as a fallback mechanism, used after confirming that the path to the executable matches the one from the capture.
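The risk described in the offline resolution items (a path that still exists but points to a rebuilt binary) suggests checking the debug ID first and treating a path-only match as unverified. The following is a minimal sketch of that check under those assumptions; Match, ImageRef and CheckSymbolFile are hypothetical names, and an empty debugId stands for "no ID recorded".

```cpp
#include <string>

// Hypothetical outcome of checking a candidate symbol file against the
// image recorded in the capture.
enum class Match { Verified, PathOnly, Mismatch };

// Hypothetical description of an image: its path and its debug ID
// (empty if no debug ID is known, e.g. older captures).
struct ImageRef
{
    std::string path;
    std::string debugId;
};

// Prefer the debug ID: it typically changes when the binary is rebuilt, so it
// catches recompiled programs and updated kernel modules that keep the same
// path. Only when either side lacks a debug ID do we fall back to comparing
// paths, and that result is reported as PathOnly (unverified).
Match CheckSymbolFile( const ImageRef& captured, const ImageRef& candidate )
{
    if( !captured.debugId.empty() && !candidate.debugId.empty() )
    {
        return captured.debugId == candidate.debugId ? Match::Verified : Match::Mismatch;
    }
    return captured.path == candidate.path ? Match::PathOnly : Match::Mismatch;
}
```

A slow fallback such as addr2line would then only run on a PathOnly result, and only when no Verified candidate was found.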
The main idea for this feature is to make Tracy more flexible, especially when you can't, or don't want to, have the application do the symbol resolution.
How does the server-side symbol resolution work? Let's say I'm running the server on Linux and the client is a Windows application.
For now it will simply fall back to querying the client for the symbols. We rely on the debug GUID (its type is known and sent to the server) to ensure we are using the correct symbol file (and possibly also to locate it, using symbol servers).
The idea for the first step is to only allow the server to resolve symbols when the client uses a similar executable format (ELF, Mach-O, PE with PDB). This means that, mostly, Windows could resolve Windows, Linux could resolve Linux, Android, etc., and macOS could resolve macOS/iOS.
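The first-step rule (resolve on the server only when the client's format matches what the server handles natively, otherwise fall back to the client) can be sketched as a small decision function. This is an illustration, not Tracy's code: ImageFormat, SymbolSettings, Resolver and ChooseResolver are hypothetical, the two flags only loosely mirror the global settings from the implementation details, and returning None when the application is not allowed to resolve is an assumption (symbols left for later offline resolution).

```cpp
#include <cstdint>

// Hypothetical executable/debug-info format families.
enum class ImageFormat : uint8_t { PePdb, Elf, MachO };

// Hypothetical mirror of the two global settings.
struct SymbolSettings
{
    bool attemptProfilerResolution;    // the server tries to resolve locally
    bool preventApplicationResolution; // the client must not resolve
};

enum class Resolver : uint8_t { Server, Client, None };

// serverNative is the format family the server can read on its own platform
// (PE/PDB via DbgHelp on Windows, ELF on Linux, Mach-O on macOS).
Resolver ChooseResolver( ImageFormat client, ImageFormat serverNative, const SymbolSettings& s )
{
    if( s.attemptProfilerResolution && client == serverNative ) return Resolver::Server;
    if( !s.preventApplicationResolution ) return Resolver::Client; // current behavior
    return Resolver::None; // assumption: leave unresolved for offline processing
}
```

For the Linux server and Windows client asked about earlier, ChooseResolver( ImageFormat::PePdb, ImageFormat::Elf, ... ) yields Client, which matches the "fall back to querying the client" answer.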
In the future, however, I'd like to make it possible to resolve GNU binaries (first ELF and Mach-O, perhaps later PE files built with MinGW) using a Windows server too. (The other way around is harder, since it would require something to replace DbgHelp.) This seems feasible by modifying libbacktrace (or by using another library, if you're up for it).
I also just saw that the CI build is failing, seemingly due to the enum using an underlying type. (Weirdly, we didn't get this on our private CI; we probably missed building some executable that does not enable C++11.)
I'll fix this tomorrow at work, but I'd like to check with you which minimum C++ version we should support (i.e. should we avoid such enums, or should we fix the projects to target C++11?)
Edit: actually, after reading the error log again, it could just be that we're missing the <stdint.h> include.
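For reference, an enum with a fixed underlying type only compiles if that type is already declared, so a missing <stdint.h> would produce an error at the enum declaration itself. A minimal sketch; the enum name and values are hypothetical, chosen only for illustration.

```cpp
#include <stdint.h> // declares uint8_t in the global namespace

// Hypothetical settings enum. Fixing the underlying type (": uint8_t") is a
// C++11 feature, and uint8_t must be declared before this point; without the
// include above (or a header that pulls it in transitively) this fails to compile.
enum class SymbolResolutionMode : uint8_t
{
    Client = 0,  // the application resolves its own symbols
    Server = 1,  // the profiler resolves symbols locally
    Disabled = 2 // no automatic resolution during capture
};

// The fixed underlying type guarantees a 1-byte representation, which is
// convenient when the value is sent over the network or stored in a trace.
static_assert( sizeof( SymbolResolutionMode ) == 1, "expected a 1-byte enum" );
```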
Is there a plan to support caching of source files from symbols that contain source-control information?
In the latest version of Tracy, I see source files cached from symbols that I've generated locally, but not from those that come from a symbol store. So, I have to rely on source location substitutions for these files, which may not always be an accurate representation of the source control version.
P.S. - Thank you for your amazing work on Tracy! I can't wait to try the changes from this PR!
Yes, the plan is to offload as much work as possible to offline resolution (we would still keep the current behavior for those who need it, but also allow further offline processing, and full offline resolution for those who want it). But I'd like to finish the symbols part before tackling source files and binaries.
Once we have the debug symbol information, we can query not only symbols but also source files, or even the binaries themselves. This would also allow for optimisations: look for files locally on the server (profiler GUI) side (and decide whether they are the correct version), then fall back to symbol servers (with their cache).
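The lookup order described here (local files first, then symbol servers and their cache) amounts to a chain of locators tried in sequence. A minimal sketch under the assumption that each source can be wrapped as a function from a debug ID to a path; Locator and LocateFile are hypothetical names, and an empty string means "not found".

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical locator: returns the path of a file matching debugId,
// or an empty string if this source cannot provide one.
using Locator = std::function<std::string( const std::string& debugId )>;

// Tries each locator in order and returns the first hit. The intended order
// would be: local files on the profiler GUI side (already checked to be the
// right version), then the symbol server cache, then the symbol servers.
std::string LocateFile( const std::vector<Locator>& locators, const std::string& debugId )
{
    for( const auto& locate : locators )
    {
        std::string path = locate( debugId );
        if( !path.empty() ) return path; // first hit wins; later sources are skipped
    }
    return std::string(); // not found anywhere
}
```

Because later locators are never called once an earlier one succeeds, a file found locally avoids any network access to a symbol server.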
@wolfpld would you like us to split this PR into smaller chunks? It could make reviewing easier.
It would probably be helpful.
I'm turning this PR into a draft, as I will be submitting smaller PRs that split up the review work.