
Possibility to improve macOS support?

Open Alzathar opened this issue 4 years ago • 10 comments

First of all, thank you for your amazing work! I have always wanted to profile my code but never found a Swiss Army knife for it (CPU / GPU / memory / performance per line of code).

I read the manual and looked at the code of the master branch to understand the limitations/restrictions on macOS.

I would like your opinion on the following points, to know whether they can be improved/fixed:

  • TRACY_NO_EXIT/ Profiling is interrupted when the application exits : Is this a limitation in the macOS architecture or do you think it might have a workaround/improvement to fix this?
  • Crash handler not available: Is it because this is not a priority for you (due to the limitation below plus the problem with OpenGL which can be profiled), or is it for any other reason?

Alzathar avatar Apr 21 '20 13:04 Alzathar

TRACY_NO_EXIT/ Profiling is interrupted when the application exits : Is this a limitation in the macOS architecture or do you think it might have a workaround/improvement to fix this?

See 3.3 in http://team.pld-linux.org/~wolf/techdoc.pdf. tl;dr: not possible.

Crash handler not available: Is it because this is not a priority for you (due to the limitation below plus the problem with OpenGL which can be profiled), or is it for any other reason?

The current OSX support is mainly a side effect of iOS implementation. It makes little sense to catch crashes on iOS, where there is no local symbol information to retrieve, in order to build a readable call stack.

wolfpld avatar Apr 21 '20 13:04 wolfpld

Thanks for the answer.

I am not an expert with the __attribute__ keyword and I had never used the init_priority attribute prior to this issue. However, from my tests under macOS 10.15.4 with Xcode 11.4.1, it seems it is now supported and works correctly.

The following code compiles and I get the expected results.

#define CATCH_CONFIG_MAIN
#include <catch.hpp>

struct Foo {
  Foo() : A(S++) {}
  static int S;
  int A;
};

int Foo::S = 0;
Foo a __attribute__ ((init_priority(2000)));
Foo b __attribute__ ((init_priority(1000)));

TEST_CASE("__attribute__((init_priority(X)))") {
  CHECK( Foo::S == 2 );
  CHECK( a.A == 1 );
  CHECK( b.A == 0 );
}

Moreover, if I change the code in Tracy (see TracyProfiler.cpp:73) to the code below, I am able to compile TracyClient.cpp.

#ifdef __APPLE__
#  if __clang_major__ >= 11
#    define init_order( val ) __attribute__ ((init_priority(val)))
#  else
#    define TRACY_DELAYED_INIT
#  endif
#else
#  ifdef __GNUC__
#    define init_order( val ) __attribute__ ((init_priority(val)))
#  else
#    define init_order(x)
#  endif
#endif

With this code, the init_order macro is defined when Clang 11 or later is used on macOS. However, I have no example to test the modification with. Do you have an example that uses the TRACY_NO_EXIT macro?

The output below shows the version of Apple Clang installed on my computer:

% clang --version
Apple clang version 11.0.3 (clang-1103.0.32.29)
Target: x86_64-apple-darwin19.4.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

Alzathar avatar Apr 21 '20 20:04 Alzathar
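
For reference, here is a minimal standalone sketch of how an init_order-style macro like the one proposed above can be applied to namespace-scope objects, without the Catch dependency. The names Tagged, g_log, g_count, early and late are hypothetical and not taken from Tracy. It assumes a GCC-compatible compiler that honors init_priority, and it only exercises ordering inside a single translation unit, so it says nothing about ordering across object files.

```cpp
// Fall back to a no-op on compilers without GNU attributes (assumption).
#if defined(__GNUC__)
#  define init_order(val) __attribute__((init_priority(val)))
#else
#  define init_order(val)
#endif

// Zero-initialized before any dynamic initialization runs, so the
// constructors below can safely write to them.
static int g_log[4];
static int g_count;

// Hypothetical type that records the order in which its instances are built.
struct Tagged {
  explicit Tagged(int id) { g_log[g_count++] = id; }
};

// "late" is declared first, but the object with the lower priority value
// is constructed first when init_priority is honored.
Tagged init_order(2000) late{2};
Tagged init_order(1000) early{1};
```

With the attribute honored, g_log records early before late. With the no-op fallback, construction follows declaration order instead.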

Yeah, that doesn't really mean anything. The code may appear to be working, but it may just be an artifact of the specific link order. The functionality would break should the objects be passed to the linker in a different order (yay Apple!). You would be able to notice the breakage only if specific conditions are met.

This can be acted upon only if there is concrete documentation of proper support on the Apple side. Sorry, simple test cases won't be enough here.

wolfpld avatar Apr 21 '20 20:04 wolfpld
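
As a point of comparison for the link-order concern above, the generic C++ "construct on first use" idiom avoids depending on the order in which globals are initialized. The sketch below is illustrative only. Registry, GetRegistry, Client and g_client are hypothetical names, and this is not Tracy's actual delayed-initialization code.

```cpp
#include <string>
#include <vector>

// Hypothetical shared state that other globals need during their own
// construction.
struct Registry {
  std::vector<std::string> entries;
};

// The function-local static is constructed the first time this function is
// called, regardless of the order in which object files reach the linker.
Registry& GetRegistry() {
  static Registry registry;
  return registry;
}

// Hypothetical global that uses the registry from its constructor. This is
// safe even if g_client is initialized before every other global.
struct Client {
  explicit Client(const char* name) { GetRegistry().entries.push_back(name); }
};

Client g_client{"early client"};
```

The trade-off is that every access goes through a function call and a "has this been initialized" check, whereas init_priority, where it is reliably supported, avoids that check.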

I understand your points. On the other hand, if TracyClient is compiled directly into the profiled executable, without any shared library, this should never break, should it? I understand this kind of scenario is not interesting (and rarely used in real projects), but this is only to validate my understanding of the issue with the TRACY_NO_EXIT define on macOS.

I continued to evaluate the possibility of using TRACY_NO_EXIT on macOS. Instead of a unit test, I created an executable to profile with Tracy. This executable needs the TRACY_NO_EXIT define because it only runs for a few hundred milliseconds before exiting. The source code is below.

#include <chrono>
#include <string>
#include <thread>

#include <Tracy.hpp>

int main(int argc, const char **argv) {
  const std::string app_info{"quick example for Tracy"};
  TracyAppInfo(app_info.data(), app_info.size());
  TracyMessageL("Starting the application");
  TracyMessageL("Set jobs");
  // First job: two consecutive 100 ms zones.
  auto j1 = std::thread([](){
    tracy::SetThreadName("First job");
    {
      ZoneScoped;
      std::this_thread::sleep_for(std::chrono::milliseconds(100));
    }
    {
      ZoneScoped;
      std::this_thread::sleep_for(std::chrono::milliseconds(100));
    }
  });
  // Second job: a single 400 ms zone.
  auto j2 = std::thread([](){
    tracy::SetThreadName("Second job");
    ZoneScoped;
    std::this_thread::sleep_for(std::chrono::milliseconds(400));
  });
  TracyMessageL("Sleep a little bit");
  std::this_thread::sleep_for(std::chrono::milliseconds(300));
  TracyMessageL("Wait for the jobs to finish");
  j1.join();
  j2.join();
  TracyMessageL("Stopping the application");
  return 0;
}

With the original Tracy source code, compiled with TRACY_ENABLE and TRACY_NO_EXIT, the executable crashes with a segmentation fault (null heap in tracy::_memory_allocate_small when calling tracy::Profiler::MessageAppInfo).

However, when I use the modification proposed in my previous post, this works like a charm! The executable waits. I am able to use capture to dump the trace and I can visualize it in profiler. The screenshots below show the results.

[Screenshots: Screen Shot 2020-04-22 at 12 21 19 AM, Screen Shot 2020-04-22 at 12 20 49 AM]

I also attached the trace file of the example in a ZIP, as it may give you more information: example_no_exit_macos.zip

The trace file was generated using Tracy 0.6.11 (commit 865593146ab71d7fb9da30a01c6482fa6ad10b18). Is anything missing from it?

Alzathar avatar Apr 22 '20 05:04 Alzathar

On the other hand, if TracyClient is compiled directly into the profiled executable, without any shared library, this should never break, should it? I understand this kind of scenario is not interesting (and rarely used in real projects)

This is the exact case I am talking about.

wolfpld avatar Apr 22 '20 09:04 wolfpld

Crash handler not available: Is it because this is not a priority for you (due to the limitation below plus the problem with OpenGL which can be profiled), or is it for any other reason?

The current blocker for this is the lack of functionality for listing the threads in a process on OSX.

wolfpld avatar Jun 23 '20 19:06 wolfpld

Using the proposed modification of the init_order macro and the code in this gist, I created three examples on macOS using the latest commit published on the master branch (7cf3b0b0044bc32517b0b402f2f97613b8f526af):

  • binary only
  • binary linked to a static library
  • binary linked to a shared library

For each of them, I launched the example binary in one terminal and the capture binary in another. In all cases, the example binary waits until the capture binary is able to get the data. No crash happened. I got the same results as the ones posted previously, both in the output of capture and in the visualization of the traces in profiler.

Would that mean the initialization of Tracy can be done correctly on macOS with clang version >= 11, and that it is no longer necessary to force the use of the TRACY_DELAYED_INIT macro? Should I run other tests?

Alzathar avatar Aug 20 '20 15:08 Alzathar

@wolfpld would it be possible to have a function in the client header, such as TracyFlush(), which would simply wait until all pending data has been sent to the server? The user could then choose to call it at the end of main() (or from some global destructor). This would at least alleviate the issue on macOS and result in losing only the very last frames generated after main(). At the moment I'm seeing a situation where frames stop being transferred halfway through the program because the connection is torn down. The workaround discussed here works for me, but the ability to flush could be a better workaround, without the need to modify the Tracy source code locally.

cdragan avatar Apr 25 '21 15:04 cdragan

This may do what you want:

GetProfiler().RequestShutdown();
while( !GetProfiler().HasShutdownFinished() ) { std::this_thread::sleep_for( std::chrono::milliseconds( 10 ) ); };

wolfpld avatar Apr 25 '21 16:04 wolfpld
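
The polling loop above has no upper bound on how long it waits. A minimal sketch of a reusable helper that adds a timeout could look like the following. ShutdownAndWait and its parameters are hypothetical names, not part of Tracy, and the helper takes the two operations as callables so that it does not depend on the Tracy headers.

```cpp
#include <chrono>
#include <thread>

// Calls request(), then polls finished() every 10 ms until it returns true
// or the timeout expires. Returns true if finished() reported completion,
// false if the timeout was reached first.
template <typename Request, typename Finished>
bool ShutdownAndWait(Request request, Finished finished,
                     std::chrono::milliseconds timeout) {
  request();
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  while (!finished()) {
    if (std::chrono::steady_clock::now() >= deadline) return false;
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
  }
  return true;
}
```

At the end of main() this might be called as `ShutdownAndWait([] { tracy::GetProfiler().RequestShutdown(); }, [] { return tracy::GetProfiler().HasShutdownFinished(); }, std::chrono::seconds(5));`, where the tracy:: qualification and the five-second timeout are assumptions rather than something stated in this thread.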

Thanks a lot, I'll give it a shot.

cdragan avatar Apr 26 '21 10:04 cdragan