
Call stack sampling doesn't seem to work correctly on ryzen 5950x

Open u3shit opened this issue 1 year ago • 4 comments

I have two computers, one with an Intel i7-4720HQ and one with a Ryzen 5950X, both running Gentoo Linux (the configuration should be pretty similar, though there might be some differences in the kernel config). With current tracy master (225d6c1edec0bf615d1c141e9c0bed0db3712b97), I only get call stack sampling info when running on Intel. Here are two screenshots from tracy_test running as root, first on Intel:

(screenshot: intel)

then on AMD:

(screenshot: amd)

On AMD I have zero ghost zones, the dots indicating the samples are missing from the graph, and the CPU graph is not very useful. However, if I go to the beginning of the graph, I can find call stack samples at the beginning of the tracy init section, but they disappear if I try to zoom in.

(screenshot: e)

The CPU context switching graph also displays bullshit values:

(screenshot: f)

Any idea what could be wrong? Here's the output of tracy_test on AMD; it doesn't seem to report any errors: tracy-amd.txt

Here's the output from the Intel box: tracy-intel.txt

(I don't know why I don't have context switch info here. I got "interrupt took too long" errors in dmesg, and overriding kernel.perf_event_max_sample_rate and kernel.perf_cpu_time_max_percent made the dmesg errors go away, but there is still no context switch capture. That's not relevant here, though.)

u3shit avatar Aug 04 '22 21:08 u3shit

This is reproducible on WSL.

wolfpld avatar Aug 04 '22 21:08 wolfpld

The direct cause is that the cap_user_time_zero value is zero, i.e. the kernel does not give us the information needed to convert the timestamp to a usable form.

       cap_user_time_zero (since Linux 3.12)
              Indicates the presence of time_zero which allows mapping timestamp values to the hardware clock.

https://github.com/wolfpld/tracy/blob/4607dca13b306eb72662d8356ae7ae80b47b30c2/public/client/TracyRingBuffer.hpp#L105-L112
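For readers who do not want to follow the link, here is a minimal sketch of this kind of conversion, based on the formula given in the perf_event_open(2) man page rather than on the linked Tracy source. `PerfTimeParams` and `TimeToTsc` are hypothetical names; the struct is only a stand-in for the `cap_user_time_zero`, `time_zero`, `time_mult` and `time_shift` fields of `perf_event_mmap_page`. Returning 0 when the capability is missing is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the perf_event_mmap_page fields used below.
struct PerfTimeParams
{
    bool cap_user_time_zero;  // kernel says time_zero/time_mult/time_shift are valid
    uint64_t time_zero;
    uint32_t time_mult;
    uint16_t time_shift;
};

// Converts a perf timestamp back to hardware clock cycles, following the
// formula in perf_event_open(2). The capability flag is consulted on every
// call; without it there is nothing to convert with, so 0 is returned
// (an assumption of this sketch).
uint64_t TimeToTsc(const PerfTimeParams& p, uint64_t timestamp)
{
    if (!p.cap_user_time_zero) return 0;
    assert(p.time_mult != 0);

    const uint64_t time = timestamp - p.time_zero;
    const uint64_t quot = time / p.time_mult;
    const uint64_t rem = time % p.time_mult;
    return (quot << p.time_shift) + (rem << p.time_shift) / p.time_mult;
}
```

With the flag cleared, every sample converts to the same value, so the samples carry no usable position on the timeline.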

Originally I made this check at buffer creation time, because that seemed the sensible thing to do. It turned out to be wrong, as the capability may become momentarily unavailable, so now it is checked at every timestamp access. See cfb6d0d2ae8. I do not understand why it has to be this way.

The issue you are reporting is reproducible on WSL with kernel 5.10.102.1-microsoft-standard-WSL2, which was released 23 Feb 2022.

At the same time, I happen to have a hopelessly obsolete system that runs on kernel 5.12.13-zen1-2-zen, which was released 23 Jun 2021. There everything works as expected, as you can see below.

(screenshot)

Something changed between these two kernels. I do not have the resources to track down what it was. It would also be good to check if the zen patchset (unrelated to the zen microarchitecture) changes something here.
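When comparing kernels or machines, a small standalone probe can report whether the flag is set at all. The following is a rough sketch, not taken from Tracy: `ProbeCapUserTimeZero`, `TimeZeroState` and `Describe` are hypothetical names, and the choice of a software cpu-clock event on the calling thread is an assumption made to keep the example short. It only reports the state at the moment of the call.

```cpp
#include <linux/perf_event.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#include <cstring>

// Outcome of one probe of the perf metadata page.
enum class TimeZeroState { Unavailable, Missing, Present };

const char* Describe(TimeZeroState s)
{
    switch (s)
    {
    case TimeZeroState::Present: return "cap_user_time_zero is set";
    case TimeZeroState::Missing: return "cap_user_time_zero is not set";
    default: return "perf event could not be opened or mapped";
    }
}

// Opens a software cpu-clock event on the calling thread, maps only the
// metadata page, and reads the capability bit once.
TimeZeroState ProbeCapUserTimeZero()
{
    perf_event_attr attr;
    std::memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_SOFTWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_SW_CPU_CLOCK;
    attr.exclude_kernel = 1;
    attr.exclude_hv = 1;

    // pid = 0, cpu = -1: this thread, on whichever CPU it runs.
    const int fd = static_cast<int>(syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0UL));
    if (fd < 0) return TimeZeroState::Unavailable;

    const long pageSize = sysconf(_SC_PAGESIZE);
    void* page = mmap(nullptr, pageSize, PROT_READ, MAP_SHARED, fd, 0);
    if (page == MAP_FAILED)
    {
        close(fd);
        return TimeZeroState::Unavailable;
    }

    const auto* meta = static_cast<const volatile perf_event_mmap_page*>(page);
    const bool present = meta->cap_user_time_zero != 0;

    munmap(page, pageSize);
    close(fd);
    return present ? TimeZeroState::Present : TimeZeroState::Missing;
}
```

Printing `Describe(ProbeCapUserTimeZero())` on each kernel under test gives a quick comparison. An "Unavailable" result usually points at permissions (for example the kernel.perf_event_paranoid setting or a sandbox) rather than at the missing capability discussed here.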

wolfpld avatar Aug 04 '22 22:08 wolfpld

I've checked with a vanilla 5.12 kernel, and call stack sampling works there (so do context switches), so it's definitely something with newer kernels and unrelated to the zen patchset.

u3shit avatar Aug 05 '22 09:08 u3shit

Scratch that, it looks like something else is the problem. After playing around with the old kernel versions, I went back to 5.18.11, which didn't work previously, and now it works with that kernel too.

u3shit avatar Aug 05 '22 10:08 u3shit

I may have forgotten that I already knew why this didn't work on WSL.

https://twitter.com/wolfpld/status/1466214828742262795

wolfpld avatar Aug 08 '22 15:08 wolfpld

Oh. Anyway, after messing around with the kernels I can no longer reproduce this issue, so maybe it can be closed.

u3shit avatar Aug 08 '22 20:08 u3shit

I seem to have the same issue on an AMD Ryzen 7 PRO 3700U. Running sudo tracy_test shows the same trace, without stack trace samples or context switches. I'm on Arch Linux with a 6.1.15-1-lts kernel. I came to this issue because I couldn't get context switches to display with a real program of mine either, so it might be the CPU/kernel combination?

c-cube avatar Jun 01 '23 04:06 c-cube