hotspot icon indicating copy to clipboard operation
hotspot copied to clipboard

failure to parse COMPRESSED2 record event data

Open abbycin opened this issue 4 months ago • 27 comments

Describe the bug

after click ViewResult it show nothing but a warning message: Sample contained no call frame

To Reproduce Steps to reproduce the behavior:

  1. record data
  2. view result

the command is

/usr/bin/perf record -o /home/abby/perf.data --call-graph dwarf,8192 --event l1i-load-misses,l1d-load-misses,branches,branch-misses,stalled-cycles-frontend -z 

Expected behavior A clear and concise description of what you expected to happen.

Screenshots Image

Version Info (please complete the following information):

  • Linux Kernel version: 6.16.1
  • perf version: 6.16.1
  • hotspot version (appimage? selfcompiled?): v1.5.1 appimage
  • if self-compiled hotspot, what version of elfutils:

Additional context perf report can parse the perf.data that hotspot record

abbycin avatar Aug 28 '25 00:08 abbycin

The summary shows that no samples have been recorded at all. That's because you didn't ask perf the right question :-)

Add cycles to the --event option.

henklaak avatar Sep 01 '25 07:09 henklaak

The summary shows that no samples have been recorded at all. That's because you didn't ask perf the right question :-)

Add cycles to the --event option.

@henklaak it doesn't work

Image

and as I mentioned earlier, perf report can parse the data recorded by hostspot

abbycin avatar Sep 01 '25 23:09 abbycin

set Compression to Disabled, it works now

abbycin avatar Sep 02 '25 08:09 abbycin

Indeed, this is enough to trigger the issue:

perf record --call-graph dwarf --event cycles,instructions -z ls

without -z or when only sampling a single metric, the issue does not occur.

milianw avatar Sep 03 '25 10:09 milianw

https://lore.kernel.org/all/[email protected]/

new compression type that's unhandled as of yet by us. will fix

milianw avatar Sep 07 '25 12:09 milianw

It doesn't seem to work for me:

kf.coreaddons: "Could not find plugin kf6/parts/konsolepart"
feature not properly read PerfHeader::CPU_TOPOLOGY 516 348
unhandled feature PerfHeader::CACHE 2868
unhandled feature PerfHeader::MEM_TOPOLOGY 64
unhandled feature PerfHeader::BPF_PROG_INFO 4
unhandled feature PerfHeader::BPF_BTF 4
unhandled feature PerfHeader::CPU_PMU_CAPS 140
unhandled feature PerfHeader::SAMPLE_TIME 16
Linux version "6.16-1" detected. Switching to automatic buffering.
unhandled event type 79   PERF_RECORD_TIME_CONV
unhandled event type 69   PERF_RECORD_ID_INDEX
unhandled event type 78   PERF_RECORD_EVENT_UPDATE
unhandled event type 73   PERF_RECORD_THREAD_MAP
unhandled event type 74   PERF_RECORD_CPU_MAP
unhandled event type 82   PERF_RECORD_FINISHED_INIT
bad PERF_RECORD_COMPRESSED2 size information 1767480404549108119 818
"The hotspot-perfparser binary exited with code 5 (invalid perf data file)."

I'm using hotspot from newest continuous appimage, perf 6.16-1, linux 6.16.4.zen1-1.

Desour avatar Sep 08 '25 18:09 Desour

@Desour can you provide me with such a perf file, or steps required to reproduce it?

bad PERF_RECORD_COMPRESSED2 size information 1767480404549108119 818

This sounds like the file contains bogus data, or that we are reading completely bogus values somehow.

milianw avatar Sep 09 '25 10:09 milianw

can you provide me with such a perf file, or steps required to reproduce it?

Do perf files contain personal information?

perf record -o perf.data --call-graph dwarf,8192 --aio -z /usr/bin/neofetch

Works fine without -z or --aio. (Idk why I was using aio, maybe I enabled it in the hotspot record gui by accident, or it was on by default.)

(Note: I was using hotspot v1.5.1 before, haven't tested if there's already issues before the last commit.)

Desour avatar Sep 09 '25 11:09 Desour

Do perf files contain personal information?

Possibly file paths and info on your system, and if you use dwarf for unwinding, there are also snippets the stack associated with every sample - so depending on the contents of that and the app you profiled, there may be more data therein.

You can inspect that with strings or perf script (see also perf script --help to enable output of more data).

What distro are you on? And does it also happen with something more readily available than neofetch, like some common tool like find or the like?

milianw avatar Sep 09 '25 12:09 milianw

FTR: On my distro (arch) neofetch is not directly packaged. I used this instead and it works as expected for me:

$ perf record -o perf.data --call-graph dwarf,8192 --aio -z /usr/bin/fastfetch
$ hotspot
     0.001 debug: unknown[unknown:0]: unhandled feature PerfHeader::BPF_PROG_INFO 4
     0.001 debug: unknown[unknown:0]: unhandled feature PerfHeader::CACHE 8808
     0.001 debug: unknown[unknown:0]: unhandled feature PerfHeader::CPU_PMU_CAPS 140
     0.001 debug: unknown[unknown:0]: unhandled feature PerfHeader::SAMPLE_TIME 16
     0.001 debug: unknown[unknown:0]: unhandled feature PerfHeader::BPF_BTF 4
     0.001 debug: unknown[unknown:0]: unhandled feature PerfHeader::MEM_TOPOLOGY 88
     0.001 warning: unknown[unknown:0]: feature not properly read PerfHeader::CPU_TOPOLOGY 1252 892
     0.002 debug: unknown[unknown:0]: Linux version "6.16-1" detected. Switching to automatic buffering.
     0.002 debug: unknown[unknown:0]: unhandled event type 79   PERF_RECORD_TIME_CONV
     0.002 debug: unknown[unknown:0]: unhandled event type 69   PERF_RECORD_ID_INDEX
     0.002 debug: unknown[unknown:0]: unhandled event type 78   PERF_RECORD_EVENT_UPDATE
     0.002 debug: unknown[unknown:0]: unhandled event type 73   PERF_RECORD_THREAD_MAP
     0.002 debug: unknown[unknown:0]: unhandled event type 74   PERF_RECORD_CPU_MAP
     0.002 debug: unknown[unknown:0]: unhandled event type 82   PERF_RECORD_FINISHED_INIT

milianw avatar Sep 09 '25 12:09 milianw

I'm also on arch (last updated 2025-09-05). (I just used neofetch because I wanted to take some arbitrary program. (Looks like I still have it installed from an official arch package that doesn't exist anymore.) /usr/bin/neofetch is just a bash script, fyi.)

For fastfetch, echo, ls, sleep 5, grep bla <an empty file>, and other, I get the

Samples contained no call stack frames. Consider passing --call-graph dwarf to perf record.

error, and following output:

$ perf record -o perf.data --call-graph dwarf,8192 --aio -z -- /usr/bin/fastfetch
[...]
$ hotspot 
kf.coreaddons: "Could not find plugin kf6/parts/konsolepart"
unhandled feature PerfHeader::CPU_PMU_CAPS 140
unhandled feature PerfHeader::BPF_PROG_INFO 4
unhandled feature PerfHeader::MEM_TOPOLOGY 64
unhandled feature PerfHeader::CACHE 2868
unhandled feature PerfHeader::BPF_BTF 4
feature not properly read PerfHeader::CPU_TOPOLOGY 516 348
unhandled feature PerfHeader::SAMPLE_TIME 16
Linux version "6.16-1" detected. Switching to automatic buffering.
unhandled event type 79   PERF_RECORD_TIME_CONV
unhandled event type 69   PERF_RECORD_ID_INDEX
unhandled event type 78   PERF_RECORD_EVENT_UPDATE
unhandled event type 73   PERF_RECORD_THREAD_MAP
unhandled event type 74   PERF_RECORD_CPU_MAP
unhandled event type 82   PERF_RECORD_FINISHED_INIT

$ perf record -o perf.data --call-graph dwarf,8192 --aio -z -- /usr/bin/echo foo
foo
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0,005 MB perf.data, compressed (original 0,124 MB, ratio is 29,614) ]
$ hotspot 
kf.coreaddons: "Could not find plugin kf6/parts/konsolepart"
unhandled feature PerfHeader::MEM_TOPOLOGY 64
unhandled feature PerfHeader::CPU_PMU_CAPS 140
feature not properly read PerfHeader::CPU_TOPOLOGY 516 348
unhandled feature PerfHeader::SAMPLE_TIME 16
unhandled feature PerfHeader::BPF_PROG_INFO 4
unhandled feature PerfHeader::BPF_BTF 4
unhandled feature PerfHeader::CACHE 2868
Linux version "6.16-1" detected. Switching to automatic buffering.
unhandled event type 79   PERF_RECORD_TIME_CONV
unhandled event type 69   PERF_RECORD_ID_INDEX
unhandled event type 78   PERF_RECORD_EVENT_UPDATE
unhandled event type 73   PERF_RECORD_THREAD_MAP
unhandled event type 74   PERF_RECORD_CPU_MAP
unhandled event type 82   PERF_RECORD_FINISHED_INIT

I can reproduce the error from before for example with neofetch, leafpad, leafpad --help, pingus, or grep bla /dev/zero.

Maybe it depends on the cpu time used.

Desour avatar Sep 09 '25 14:09 Desour

If you don't want to share your files, can you at least show the (possibly redacted) output of perf script -I --header for a short record of e.g. perf record -z --call-graph dwarf ls?

Oh, but actually I seem to be able to reproduce it now - can you see what happens when you remove the explicit --aio? This used to be implicitly enabled by -z and thus I never paid any attention to it. But have a look at this:

$ perf record --call-graph dwarf -z -- ls
...
[ perf record: Captured and wrote 0.006 MB perf.data, compressed (original 0.093 MB, ratio is 19.866) ]
$ perf script | head
ls    5087   930.512558:          1 cycles:Pu: 
        ffffffffb2a448d7 [unknown] ([unknown])
        ffffffffb2a45197 [unknown] ([unknown])
        ffffffffb272c95a [unknown] ([unknown])
        ffffffffb374829e [unknown] ([unknown])
        ffffffffb24012a6 [unknown] ([unknown])
            7fb6039dfb40 [unknown] (/usr/lib/ld-linux-x86-64.so.2)

ls    5087   930.512571:          1 cycles:Pu: 
        ffffffffb2a55e3f [unknown] ([unknown])

Versus this:

$ perf record --call-graph dwarf -z --aio -- ls
...
[ perf record: Captured and wrote 0.005 MB perf.data, compressed (original 0.101 MB, ratio is 28.747) ]
$ perf script | head
Couldn't decompress data
0x598 [0xe53]: failed to process type: 83 [Operation not permitted]

This now looks like a bug in the kernel / perf to me... Can you check and see if things are working without --aio for you?

milianw avatar Sep 10 '25 07:09 milianw

As said in https://github.com/KDAB/hotspot/issues/736#issuecomment-3270330769, it works without the explicit --aio. Sorry for the miscommunication if I was unclear there!

Works fine without -z or --aio.


I can confirm to get the same outputs as you when using perf script after --aio / not --aio.

Edit: Or almost the same. It says 0x318, not 0x598 on my machine.

Couldn't decompress data
0x318 [0x1428]: failed to process type: 83 [Operation not permitted]

Desour avatar Sep 10 '25 08:09 Desour

I have now reported this to the kernel and linux perf folks. But I fear there's nothing we can do on the hotspot level.

milianw avatar Sep 10 '25 08:09 milianw

There's another way to trigger corruption on the perf record level: https://invent.kde.org/-/snippets/3549

Basically, sometimes using -z -m 16M can also cause this. Sadly, so far I haven't gotten any feedback from the kernel/perf folks about this issue and I don't have the time to dig into this upstream myself right now :(

milianw avatar Sep 12 '25 12:09 milianw

Using the Fedora system version I got a nice recording, but an empty result. Using current appImage I get

kf.coreaddons: "Could not find plugin kf6/parts/konsolepart"
No decoration plugins available. Running with no decorations.
unhandled feature PerfHeader::CACHE 5508
unhandled feature PerfHeader::BPF_PROG_INFO 4
unhandled feature PerfHeader::CPU_PMU_CAPS 140
unhandled feature PerfHeader::SAMPLE_TIME 16
unhandled feature PerfHeader::MEM_TOPOLOGY 72
unhandled feature PerfHeader::BPF_BTF 4
feature not properly read PerfHeader::CPU_TOPOLOGY 884 620
Linux version "6.16.7-200.fc42.x86_64" detected. Switching to automatic buffering.
unhandled event type 79   PERF_RECORD_TIME_CONV
unhandled event type 69   PERF_RECORD_ID_INDEX
unhandled event type 78   PERF_RECORD_EVENT_UPDATE
unhandled event type 73   PERF_RECORD_THREAD_MAP
unhandled event type 74   PERF_RECORD_CPU_MAP
unhandled event type 82   PERF_RECORD_FINISHED_INIT
bad PERF_RECORD_COMPRESSED2 size information 10110251967403258962 331
"The hotspot-perfparser binary exited with code 5 (invalid perf data file)."

Note: the recording ended with

[ perf record: Woken up 7856 times to write data ]
[ perf record: Captured and wrote 235.515 MB ../files/perf.genbig_003500000.data, compressed (original 15723.419 MB, ratio is 66.779) ]

recorded with perf record -o ../files/perf.genbig_$numrek.data --call-graph dwarf,8192 --aio -z --sample-cpu --mmap-pages 4M

Using a smaller recording got simitar results

kf.coreaddons: "Could not find plugin kf6/parts/konsolepart"
unhandled feature PerfHeader::BPF_PROG_INFO 4
feature not properly read PerfHeader::CPU_TOPOLOGY 884 620
unhandled feature PerfHeader::MEM_TOPOLOGY 72
unhandled feature PerfHeader::SAMPLE_TIME 16
unhandled feature PerfHeader::CPU_PMU_CAPS 140
No decoration plugins available. Running with no decorations.
unhandled feature PerfHeader::CACHE 5508
unhandled feature PerfHeader::BPF_BTF 4
Linux version "6.16.7-200.fc42.x86_64" detected. Switching to automatic buffering.
unhandled event type 79   PERF_RECORD_TIME_CONV
unhandled event type 69   PERF_RECORD_ID_INDEX
unhandled event type 78   PERF_RECORD_EVENT_UPDATE
unhandled event type 73   PERF_RECORD_THREAD_MAP
unhandled event type 74   PERF_RECORD_CPU_MAP
unhandled event type 82   PERF_RECORD_FINISHED_INIT
bad PERF_RECORD_COMPRESSED2 size information 12462319115391467767 31
"The hotspot-perfparser binary exited with code 5 (invalid perf data file)."

result file attached

perf.genbig_000010000.data.zip

... and, indeed, dropping -aio leads to a working perfparser.

GitMensch avatar Sep 16 '25 15:09 GitMensch

Try to run perf script on your data file above, doesn't that already complain about it being corrupted too? Meaning, this is the same issue as above and the whole problem resides upstream, outside of our realm?

milianw avatar Sep 17 '25 06:09 milianw

which script do you suggest?

GitMensch avatar Sep 17 '25 07:09 GitMensch

I mean the command perf script literally. https://www.man7.org/linux/man-pages/man1/perf-script.1.html

milianw avatar Sep 17 '25 09:09 milianw

-i was the missing piece...

perf script -i perf.data.no_aio | head
perf-exec  487174 [002] 59149.575324:          1 cycles:P: 
        ffffffff812e11e0 perf_event_addr_filters_exec+0x0 ([kernel.kallsyms])
        ffffffff812f1028 perf_event_exec+0x2d8 ([kernel.kallsyms])
        ffffffff813f21ba begin_new_exec+0x63a ([kernel.kallsyms])
        ffffffff81462e64 load_elf_binary+0x2d4 ([kernel.kallsyms])
        ffffffff813f019a bprm_execve+0x28a ([kernel.kallsyms])
        ffffffff813f1820 do_execveat_common.isra.0+0x1a0 ([kernel.kallsyms])
        ffffffff813f27db __x64_sys_execve+0x3b ([kernel.kallsyms])
        ffffffff81003841 x64_sys_call+0x351 ([kernel.kallsyms])
        ffffffff81f13059 do_syscall_64+0x59 ([kernel.kallsyms])

perf script -i perf.data.aio
Couldn't decompress data
0x4b8 [0x7d14]: failed to process type: 83 [Operation not permitted]

GitMensch avatar Sep 17 '25 09:09 GitMensch

ack, so this is the same as I said above then and nothing we can workaround on the hotspot/perfparser level - the data file is simply corrupted.

milianw avatar Sep 17 '25 13:09 milianw

I'd suggest to pin this issue then, possibly after adjusting its title.

Note: we also do have a workaround for new environments: not use --aio together with -z (which commonly will be the more important one).

Also: would it be reasonable to add something on this to the current checks for finding the best recording options within hotspot? --aio is only used if available (dropped on old systems), maybe that can handle "newest systems" similar?

GitMensch avatar Sep 17 '25 17:09 GitMensch

well, I'd say create a separate bug if you think it helps to have that pinned. I don't think so, as this is an upstream issue that needs to be handled in the kernel after all. I also as such don't know what kernel versions are affected. I wouldn't mind to remove --aio when -z is added for now as a workaround, until we know more. Patches welcome I guess?

milianw avatar Sep 18 '25 07:09 milianw

I can confirm this issue but it's most definitely not a bug in hotspot. When perf record prints the following:

failed to write perf data, error: Bad address
failed to write perf data, error: Bad address
failed to write perf data, error: Bad address

then the output file will be broken with these symptoms:

% perf script -i perf.data >/dev/null
addr2line <path to binary>: could not read first record
0x14cba0 [0]: failed to process type: 83 [Invalid argument]

Edit: By the way I'm not using --aio. Just -z --call-graph dwarf. It doesn't happen everytime, however.

sfan5 avatar Oct 09 '25 09:10 sfan5

I have now reported this to the kernel and linux perf folks.

Link to the report: https://lore.kernel.org/linux-perf-users/4137699.i63VBsrd71@milian-workstation/T/

Desour avatar Oct 09 '25 13:10 Desour

Maybe related to this fix?

The original PERF_RECORD_COMPRESS is not 8-byte aligned, which can cause asan runtime error: ... Since there is no way to align compressed data in zstd compression, ...

jgb-mega-io avatar Nov 10 '25 10:11 jgb-mega-io

Yes, that fix for ASAN added the PERF_RECORD_COMPRESSED2 type where we see that error - and sadly there is no option to record with PERF_RECORD_COMPRESSED as before.

Ideally someone would test if reverting the change of tools/perf/builtin-record.c alone would fix the issue...

GitMensch avatar Nov 10 '25 15:11 GitMensch