
Better support for cluster environments

Open alexanderbock opened this issue 1 year ago • 15 comments

Totally understand if this is out of scope and a pretty niche use case.

At our institution we have a planetarium that runs six instances of the same application in a networked environment. In the past I have used Tracy in this environment by starting the GUI seven times and connecting remotely to all instances manually. It would be really neat to be able to connect to all of the clients from a single GUI, and possibly also to align the timelines from all of the instances and show the places where one of the instances takes longer to execute a function, for example.

Just to be clear: this would be N instances of the same executable, which should always go through the same function calls; where they disagree is where the interesting stuff happens.

alexanderbock avatar Sep 15 '22 08:09 alexanderbock

I'm in a slightly different situation. I'm running several game services in a cluster, and each service has different functionality. It would be great if all services (clients) could connect to a single GUI with their timelines aligned.

GCCFeli avatar Nov 08 '22 12:11 GCCFeli

Cluster tooling is an entirely different can of worms, but FWIW we'd also be very interested in (even just basic) support for this use case. (Where "basic support" would probably mean ingesting data from several processes and aligning the times)

PeterTh avatar Nov 15 '22 13:11 PeterTh

> Cluster tooling is an entirely different can of worms, but FWIW we'd also be very interested in (even just basic) support for this use case. (Where "basic support" would probably mean ingesting data from several processes and aligning the times)

Ingesting data from several processes and aligning the times is enough for my case. For now I'm hacking around this with a simple proxy that the cluster processes connect to and that acts as the only client to Tracy.

GCCFeli avatar Nov 17 '22 05:11 GCCFeli

Making a proxy that would mux multiple clients would be a preferred solution here. To properly handle thread identifiers, which may be duplicated across different processes, you may use the already existing encoding:

https://github.com/wolfpld/tracy/blob/e1395f5a5370eff8fbf24bf450b6a934e6597197/import-chrome/src/import-chrome.cpp#L143-L183

You can see how this works in https://github.com/wolfpld/tracy/issues/213#issuecomment-841641885.

In 0.9 there were many changes in how the timeline items are handled, which is not really visible to users right now. Each track displayed on the timeline is now an instance of https://github.com/wolfpld/tracy/blob/master/server/TracyTimelineItem.hpp and the management of these items is now well defined in https://github.com/wolfpld/tracy/blob/master/server/TracyTimelineController.hpp, instead of the mess it was before. The takeaway here is that it should be now relatively easy to rearrange the threads, so that threads originating from the same process are next to each other, or to add different colorings to thread backgrounds, etc.

wolfpld avatar Nov 17 '22 14:11 wolfpld

Just wanted to add that at our company we are looking into integrating Tracy into our development environment, and we also need to merge Tracy data from multiple sources: at least two, but perhaps more, either on the same machine or distributed across multiple machines.

Sounds like we have a very similar problem to everyone else in this thread. A mux is a good idea, especially if we can also use that mux to record a trace for later analysis.

Great work on Tracy!

jamesfmilne avatar Feb 02 '23 13:02 jamesfmilne

I tried my hand at writing a mux, and I thought I might add some color to this conversation based on my experience over the last few days. My application involves remote introspection of a target system comprising many Tracy-instrumented processes running at the same time. It would be much more convenient to run a mux/proxy on the target side, aggregating the streams from all processes into one point and shoving them all onto one unified timeline. The idea is that the mux would then present the aggregated data stream on tcp/w.x.y.z:8085, which would be easy to open up on a firewall and push over the internet to the profiling user interface running on some remote host (i.e., with ./Tracy-release -a w.x.y.z -p 8085). As a side note, I had hoped to make it even easier and avoid a remote-side binary by instead offering a web-based (wasm) profiler, with the web server running on the target (I'm OK with it stealing resources). However, I haven't managed to get Emscripten to compile libcapstone into a sysroot where it will successfully link against the wasm code (a guide on that would be greatly appreciated to help with development). So I'm sticking with the legacy/X11-based Unix version of the profiler, because the Wayland version doesn't work on Ubuntu 22.04 with NVIDIA 525.85.12.

Towards writing this mux, I was able to fairly easily scan for the UDP broadcast packets sent out on port 8086 by Tracy clients. Decoding them was fairly straightforward, and I was able to extract the TCP listen port, which all Tracy-instrumented processes negotiate to be unique on start-up (it looks like the first one gets 8086, the next one gets 8087, and so on, up to a hard-coded maximum of 20). This is where things fell apart. I had intended to spin up a worker thread per client to bind to each TCP stream, then collect and forward. However, I can't seem to work out how the handshake / LZ4 encoding works for the TCP stream, or how the on-demand and regular implementations of the TCP protocol differ from each other! I can probably work it out by following the code (which I think is all captured in the TracyWorker.{hpp,cpp} source files), I just need time :)

Here are some hacky implementations of UDP listeners (the first version using the network protocol API in tracy, and another version using Boost.asio) for anybody who wants a starting point.

Here's a CMakeLists.txt to build the UI and muxers all at once. I do this all in a Docker context, but the basic Ubuntu 22.04 prerequisites are apt install libboost-all-dev libdbus-1-dev libcapstone-dev libglfw3-dev libfreetype-dev before trying anything below.

cmake_minimum_required(VERSION 3.5)
project(tracy_mux)
add_definitions(-DTRACY_ENABLE)

## TRACY CODE ###########################################

# Fetch the core interface library and make available to the next steps
include(FetchContent)
FetchContent_Declare(
  tracy
  GIT_REPOSITORY https://github.com/wolfpld/tracy.git
  GIT_TAG master
  GIT_SHALLOW TRUE
  GIT_PROGRESS TRUE)
FetchContent_MakeAvailable(tracy)
FetchContent_GetProperties(tracy)
message(STATUS "tracy: ${tracy_SOURCE_DIR} ${tracy_BINARY_DIR}")

## TRACY PROFILER UI ####################################### 

# Build the tracy profiler (server and UI)
include(ExternalProject)
ExternalProject_Add(tracy_profiler
  SOURCE_DIR ${tracy_SOURCE_DIR}/profiler/build/unix
  CONFIGURE_COMMAND ""
  BUILD_COMMAND ${CMAKE_COMMAND} -E env LEGACY=1 make -j all
  INSTALL_COMMAND cp ${tracy_SOURCE_DIR}/profiler/build/unix/Tracy-release ${CMAKE_CURRENT_BINARY_DIR}/tracy
  BUILD_IN_SOURCE TRUE)

## TRACY MUXER ########################################### 

find_package(Boost REQUIRED COMPONENTS thread)

add_executable(tracy_muxer_native tracy_muxer_native.cpp)
target_link_libraries(tracy_muxer_native TracyClient)

add_executable(tracy_muxer_boost tracy_muxer_boost.cpp)
target_link_libraries(tracy_muxer_boost TracyClient ${Boost_LIBRARIES})

One of the strange things about the native version of the UDP listener is that it finds itself! In other words, when you run it, you see something like this:

ubuntu@mars:~/ros2_ws/src/libtracy_ros2/src/build$ ./tracy_muxer_native 
Starting listener...
Adding client with procName tracy_muxer_native  # <--- weird!

Also, don't be a numpty like me and forget to sudo ufw allow 8086/udp before trying anything above.

asymingt avatar Feb 06 '23 04:02 asymingt

My company is also using Tracy, great work!

We could also benefit a lot from the requested enhancement to support merging the traces of multiple clients in one GUI, especially to profile network latencies.

john-plate avatar Feb 15 '23 15:02 john-plate

Such a feature would also be very useful for, e.g., profiling applications that spawn child processes.

Build systems are one example where it'd be quite nice to have an end-to-end view of the performance timeline across all processes.

topolarity avatar Feb 27 '23 17:02 topolarity

Hi all! I am joining the team of people who would be interested in a way to collect multiple process traces into the same GUI window. Ideally, it would be even better to have all processes' data in the same capture file, the proxy being an acceptable solution for that.

My company is willing to let me do some work on open-source projects of importance to us, and I'd be happy to contribute here. If you feel like you'd accept a contribution on that topic, I could help. (To be honest, I will surely need some help/guidance on this part to make it happen.)

Arpafaucon avatar Feb 14 '24 11:02 Arpafaucon

> If you feel like you'd accept a contribution on that topic, I could help.

Sure.

wolfpld avatar Feb 14 '24 11:02 wolfpld

Nice! Can I suggest the following plan?

  • I take some time to read more about the code, better understand what such a change would impact, and assess whether I understand enough to do it cleanly. I should be good next week.
  • Then, would you be OK with taking some time to help me figure out a good way to carry out the change? (We can definitely do that through this issue, or a dedicated one.)
  • On my side, I'll have to check with management that they're OK.
  • And then coding time for me ^^

Arpafaucon avatar Feb 14 '24 18:02 Arpafaucon

I just wanted to warn you that I am almost done with an initial multiplexer prototype, so that you don't end up doing duplicate work.

I have a few bugs to iron out, but I am at a stage where broadcasting clients are automatically adopted, all client events are woven into a single event stream by splitting at ThreadContext boundaries, server queries are broadcast to all clients, and the single most appropriate response is picked.

Edit:

My current progress on the prototype can be found here: https://github.com/cipharius/tracy/blob/feature/multiplex/multiplex/src/multiplex.cpp

And a little preview of how it's looking right now: screenshot

I have conveniently hidden the Tracy thread zones in that screenshot, because those currently get messed up when new clients connect; I still need to figure that out. On Linux I'm not seeing any thread ID conflicts, so I didn't bother creating pseudo IDs yet.

cipharius avatar Mar 05 '24 20:03 cipharius

> I just wanted to warn you that I am almost done with an initial multiplexer prototype, so that you don't end up doing duplicate work.

Very kind of you to warn :) I had started digging into the existing code to get a sense of how things worked, but that's not lost time at all anyway.

I can confirm my company is giving me time to work on this (roughly half a day per week). @cipharius, would you accept help on your branch to make this happen? The minor caveat is that I am on holidays from mid-April to early May, so if you go too fast you might well be finished before I get back and try to help ^^

Arpafaucon avatar Mar 29 '24 08:03 Arpafaucon

> I just wanted to warn you that I am almost done with an initial multiplexer prototype, so that you don't end up doing duplicate work.

> Very kind of you to warn :) I had started digging into the existing code to get a sense of how things worked, but that's not lost time at all anyway.

> I can confirm my company is giving me time to work on this (roughly half a day per week). @cipharius, would you accept help on your branch to make this happen? The minor caveat is that I am on holidays from mid-April to early May, so if you go too fast you might well be finished before I get back and try to help ^^

Sure, we can try that; I'll have to update the branch with my local changes first.

Though with the code being very prototypical and changing a lot, it might be tough to collaborate on it effectively.

The most helpful feedback right now would be testing it out. Currently I'm trying to figure out the last crucial bit: normalising the time between clients so that the timeline is displayed correctly. You could try figuring out how time is represented in Tracy, though by then I might have figured out what's going wrong with my current attempts.

The most independent way to help would be improving and testing the build scripts, since I have only tested on Linux and didn't pay much attention to customising the build scripts. It would be good to see if it builds on Windows, for example.

cipharius avatar Mar 29 '24 09:03 cipharius

Anyone interested in this feature should have a look at #766.

wolfpld avatar Apr 13 '24 09:04 wolfpld