pprof
pprof copied to clipboard
Strange mapping at 0 on Mach-O
This is with master as of now on OSX. If you create a test as described here. The code will go to symbolizer.go newMapping(). It will first call obj.Open("myexe", 0x100000000, ...) which is correct because the __TEXT segment is mapped there. However it then also tries to call obj.Open("myexe", 0, ... which fails because the code assumes you are trying to symbolise things in the __TEXT segment. As far as I can see the only thing mapped to 0 is the __PAGEZERO segment.
I'm not really sure what is going on here, but something isn't right. It also doesn't seem to cause any obvious issues but maybe I am overlooking something.
Maybe that call is from https://github.com/google/pprof/blob/cf6d271c80826ef4d30d55512b5a77b115bd4b9c/internal/driver/cli.go#L98 where it sniffs the first argument to decide whether it's a binary or a profile?
If so, perhaps it might make sense to have a separate method to call like o.Obj.Sniff that would cheaply return a boolean whether the file looks like a supported binary instead of actually trying to open it.
It definitely isn't that - it comes from newMapping(). The error is:
Local symbolization failed for resnet: error reading Mach-O file /path/to/./exe: could not identify base for /path/to/./exe: __TEXT segment address (0x100000000) > mapping start address (0x0)
That message is only generated there. Also I added extra printf debugging and it is definitely an entry in prof.Mapping.
I guess the cli call also generates this error but it doesn't print it, so I think that's fine. And yeah I agree it should use magic numbers rather than trying to completely open the file.
My other guess would be that it could be from https://github.com/google/pprof/blob/cf6d271c80826ef4d30d55512b5a77b115bd4b9c/internal/driver/fetch.go#L437.
I added a printf there to check and it doesn't seem to be. Makes sense because there is another mapping in that file. I guess this could be a problem with gperftools itself?
Could be. The profiles generated by gperftools should be in text format, can you take a look at what the mapping record section looks like in the input file?
My profile is in binary format. :-/
What do the mappings look like in pprof -raw <path-to-profile-file>?
Aha!
Mappings
1: 0x7fffa519d000/0x7fffac3ac000/0x200b9000 /usr/lib/libc++abi.dylib
2: 0x104855000/0x1048e2000/0x0 /path/to/a.dylib [FN][FL][IN]
3: 0x0/0xffffffffffffffff/0x0
4: 0x10496a000/0x104972000/0x0 /path/to/a.dylib [FN][FL][IN]
5: 0x10001e000/0x100091000/0x0 /path/to/a.dylib [FN][FL][IN]
6: 0x100000000/0x10000d000/0x0 /path/to/myprogram [FN][FL][IN]
7: 0x7fff97c66000/0x7fff97c6c000/0x0 /usr/lib/system/libdyld.dylib
8: 0x0/0x100000000/0x0 /path/to/myprogram
9: 0x100120000/0x102c12000/0x0 /path/to/a.dylib [FN][FL][IN]
10: 0x7fff97e78000/0x7fff97e81000/0x0 /usr/lib/system/libsystem_platform.dylib
11: 0x104a39000/0x104a49000/0x0 /path/to/a.dylib [FN][FL][IN]
12: 0x7fff97deb000/0x7fff97e0a000/0x0 /usr/lib/system/libsystem_malloc.dylib
13: 0x7fff96853000/0x7fff9687d000/0x0 /usr/lib/libc++abi.dylib
14: 0x1047c9000/0x104800000/0x0 /path/to/a.dylib [FN][FL][IN]
15: 0x10497a000/0x1049e4000/0x0 /path/to/a.dylib [FN][FL][IN]
16: 0x10482a000/0x10483b000/0x0 /path/to/a.dylib [FN][FL][IN]
17: 0x104a5f000/0x104a83000/0x0 /path/to/a.dylib [FN][FL][IN]
18: 0x7fff97e81000/0x7fff97e8c000/0x0 /usr/lib/system/libsystem_pthread.dylib
19: 0x7fff97d80000/0x7fff97da3000/0x0 /usr/lib/system/libsystem_kernel.dylib
20: 0x104acb000/0x104af0000/0x12000 /path/to/a.dylib [FN][FL][IN]
21: 0x104801000/0x10482a000/0x38000 /path/to/a.dylib [FL][IN]
22: 0x100113000/0x100118000/0x8000 /usr/local/lib/libprofiler.0.dylib [FL][IN]
23: 0x7fff97ca0000/0x7fff97d2e000/0x0 /usr/lib/system/libsystem_c.dylib
24: 0x7fff967fc000/0x7fff96853000/0x0 /usr/lib/libc++.1.dylib
25: 0x10010b000/0x100113000/0x0 /usr/local/lib/libprofiler.0.dylib [FN][FL][IN]
26: 0x104ab9000/0x104acb000/0x0 /path/to/a.dylib [FN][FL][IN]
27: 0x7fffa0b7c000/0x7fffa0b7e1ce/0x1fa98000 /usr/lib/system/libsystem_malloc.dylib
28: 0x1048e5000/0x10496a000/0x90000 /path/to/a.dylib [FN][FL][IN]
29: 0x100096000/0x10010b000/0x78000 /path/to/a.dylib [FN][FL][IN]
30: 0x102f51000/0x1047c9000/0x2d57000 /path/to/a.dylib [FN][FL][IN]
31: 0x104b1c000/0x104b21000/0x16000 /path/to/a.1.dylib [FN][FL][IN]
32: 0x7fffa0b78000/0x7fffa0b7a950/0x1fa94000 /usr/lib/system/libsystem_kernel.dylib
33: 0x102c12000/0x102f51000/0x2af2000 /path/to/a.dylib [FN][FL][IN]
(Had to redact the names sorry; all the a.dylibs are different internal libraries that myprogram links to.)
Anyway the problem must be number 8. What do those [FN][FL][IN] things mean? Perhaps the fact that 8 doesn't have them means it shouldn't be loaded or something? 3 also looks suspicious.
Ah apparently [FN] means hasFunctions, [FL] is hasFilenames and [IN] is hasInlineFrames.
@Timmmm Which tool did you use to collect this profile? Is it from gperftools?
Yeah gperftools.
I am still getting this same "error" using a relatively new version of pprof (commit b9804c9) on macOS Big Sur (darwin 11.4). Edit: Same result on most recent commit as well. I am also collecting profiling data from gperftools (version 2.8) and symbolizing using llvm-symbolizer (llvm11). Testing on the file @Timmmm provided above, and also on other c/cpp files. Full output:
$ clang++ -Wall -g -O0 -fno-pie -c main.cpp -o main.o
$ clang++ -o main main.o -lprofiler -fno-pie
$ CPUPROFILE=main.prof ./main
k = 1: 2 3 5 7 11 13 17 19 23 29
k = 2: 4 6 9 10 14 15 21 22 25 26
k = 3: 8 12 18 20 27 28 30 42 44 45
k = 4: 16 24 36 40 54 56 60 81 84 88
k = 5: 32 48 72 80 108 112 120 162 168 176
k = 6: 64 96 144 160 216 224 240 324 336 352
k = 7: 128 192 288 320 432 448 480 648 672 704
k = 8: 256 384 576 640 864 896 960129613441408
k = 9: 512 76811521280172817921920259226882816
k = 10:1024153623042560345635843840518453765632
k = 11:2048307246085120691271687680103681075211264
k = 12:40966144921610240138241433615360207362150422528
k = 13:8192122881843220480276482867230720414724300845056
k = 14:16384245763686440960552965734461440829448601690112
k = 15:32768491527372881920110592114688122880165888172032180224
k = 16:6553698304147456163840221184229376245760331776344064360448
k = 17:131072196608294912327680442368458752491520663552688128720896
k = 18:262144393216589824655360884736917504983040132710413762561441792
k = 19:52428878643211796481310720176947218350081966080265420827525122883584
k = 20:1048576157286423592962621440353894436700163932160530841655050245767168
PROFILE: interrupts/evictions/bytes = 812/334/22864
$ PPROF_BINARY_PATH=./main
$ pprof -text -trim=false main.prof
Local symbolization failed for libdyld.dylib: stat /usr/lib/system/libdyld.dylib: no such file or directory
Local symbolization failed for main: error reading Mach-O file /<path/to/root/folder>/./main: could not identify base for /<path/to/root/folder>/./main: __TEXT segment address (0x100000000) > mapping start address (0x0)
Some binary filenames not available. Symbolization may be incomplete.
Try setting PPROF_BINARY_PATH to the search path for local binaries.
File: libc++.1.dylib
Type: cpu
Showing nodes accounting for 8.12s, 100% of 8.12s total
flat flat% sum% cum cum%
7.99s 98.40% 98.40% 7.99s 98.40% k_prime
0.03s 0.37% 98.77% 7.14s 87.93% primes
0.03s 0.37% 99.14% 0.03s 0.37% std::__1::__compressed_pair_elem::__get
0.02s 0.25% 99.38% 0.02s 0.25% [libsystem_kernel.dylib]
0.02s 0.25% 99.63% 0.05s 0.62% std::__1::__list_imp::__sz
0.02s 0.25% 99.88% 0.05s 0.62% std::__1::list::size
0.01s 0.12% 100% 0.01s 0.12% std::__1::locale::use_facet
0 0% 100% 7.15s 88.05% <unknown>
0 0% 100% 0.03s 0.37% ProfileHandler::SignalHandler
0 0% 100% 7.15s 88.05% [libdyld.dylib]
0 0% 100% 5.47s 67.36% [libsystem_platform.dylib]
0 0% 100% 0.22s 2.71% [main]
0 0% 100% 7.15s 88.05% main
0 0% 100% 0.03s 0.37% std::__1::__compressed_pair::first
0 0% 100% 0.01s 0.12% std::__1::basic_ostream::operator<<
We can ignore the warning for setting PPROF_BINARY_PATH (note this was already done for the local executable). This warning is just reiterating that we do not have symbol information for some binaries (e.g libdyld.dylib). This occurs in Big Sur because Apple changed the filesystem such that *.dylib files can no longer be searched as files within the system. Regardless, symbolizing those libraries won't really matter as we will likely not be optimizing them anyways.
@Timmmm @aalexand Have you guys figured out if the problem is, in fact, with gperftools, and not pprof? If so, are there other profiling tools that work with pprof that do not result in weird symbolization errors. If not, then I might try attempting to see if this persists using other profiling tools.
Ok, by compiling with PIE and ASLR enabled (as suggested in https://github.com/google/pprof/issues/311#issuecomment-363398950), I no longer get the warning that symbolization failed on the local binary ./main. This work on both latest HEAD or commit b9804c9 of pprof. However, I am still seeing <unknown>'s in the pprof output:
Local symbolization failed for libdyld.dylib: stat /usr/lib/system/libdyld.dylib: no such file or directory
Local symbolization failed for libsystem_c.dylib: stat /usr/lib/system/libsystem_c.dylib: no such file or directory
Some binary filenames not available. Symbolization may be incomplete.
Try setting PPROF_BINARY_PATH to the search path for local binaries.
File: libc++.1.dylib
Type: cpu
Showing nodes accounting for 9.13s, 100% of 9.13s total
flat flat% sum% cum cum%
9.01s 98.69% 98.69% 9.01s 98.69% k_prime
0.04s 0.44% 99.12% 7.47s 81.82% primes
0.03s 0.33% 99.45% 0.03s 0.33% std::__1::locale::use_facet
0.02s 0.22% 99.67% 0.04s 0.44% std::__1::list::size
0.01s 0.11% 99.78% 0.01s 0.11% [libsystem_kernel.dylib]
0.01s 0.11% 99.89% 0.01s 0.11% std::__1::__compressed_pair::first
0.01s 0.11% 100% 0.01s 0.11% std::__1::__compressed_pair_elem::__get
0 0% 100% 7.50s 82.15% <unknown>
0 0% 100% 7.50s 82.15% [libdyld.dylib]
0 0% 100% 0.01s 0.11% [libsystem_c.dylib]
0 0% 100% 3.65s 39.98% [libsystem_platform.dylib]
0 0% 100% 7.50s 82.15% main
0 0% 100% 0.02s 0.22% std::__1::__list_imp::__sz
0 0% 100% 0.01s 0.11% std::__1::__stdoutbuf::overflow
0 0% 100% 0.04s 0.44% std::__1::basic_ostream::operator<<
0 0% 100% 0.01s 0.11% std::__1::basic_ostream::put
0 0% 100% 0.01s 0.11% std::__1::endl
Are these <unknown> part of the *.dylib's or something else? How can one check? Since these shared libraries are no longer present as files in MacOS, could it be that llvm-symbolizer/addr2line cannot recognize them and therefore returns <unknown>.