Segmentation fault in report_display_cpu_cstates for Ryzen CPUs

Open Mershl opened this issue 3 years ago • 27 comments

PowerTOP version: 2.13
Kernel: 5.7.9 (Fedora 32)
Hardware: Lenovo T495 (AMD Ryzen 3500U)

Trying to create an HTML report with sudo powertop --html results in the following output:

modprobe cpufreq_stats failedLoaded 421 prior measurements
RAPL device for cpu 0
RAPL device for cpu 0
Devfreq not enabled
glob returned GLOB_ABORTED
Preparing to take measurements
[1]    15086 segmentation fault  sudo powertop --html

The output is identical for a run with --debug --html.

Mershl, Jul 24 '20 09:07

Works fine for me:

powertop --html
modprobe cpufreq_stats failedLoaded 0 prior measurements
RAPL device for cpu 0
RAPL Using PowerCap Sysfs : Domain Mask f
RAPL device for cpu 0
RAPL Using PowerCap Sysfs : Domain Mask f
Devfreq not enabled
glob returned GLOB_ABORTED
Preparing to take measurements
Taking 1 measurement(s) for a duration of 20 second(s) each.
 the port is sda
PowerTOP outputting using base filename powertop.html

I guess this depends on your hardware. Can you run powertop in gdb or upload a coredump?

jubalh, Jul 24 '20 10:07

I reckon it's a hardware issue. I tested it on a Lenovo Y50 without problems. My T495 shows the following stack trace (bt full):

EDIT: backtrace now built with -Og instead of -O2.

#0  std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_set_length (__n=0, 
    this=0x4ef818)
    at /usr/src/debug/gcc-10.1.1-1.fc32.x86_64/obj-x86_64-redhat-linux/x86_64-redhat-linux/libstdc++-v3/include/bits/basic_string.h:217
No locals.
#1  std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_replace (
    this=this@entry=0x4ef818, __pos=__pos@entry=0, __len1=<optimized out>, __s=__s@entry=0x4587b1 "", 
    __len2=__len2@entry=0)
    at /usr/src/debug/gcc-10.1.1-1.fc32.x86_64/obj-x86_64-redhat-linux/x86_64-redhat-linux/libstdc++-v3/include/bits/basic_string.tcc:470
        __old_size = <optimized out>
        __new_size = 0
#2  0x0000000000415f10 in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::assign (
    __s=0x4587b1 "", this=0x4ef818) at /usr/include/c++/10/bits/basic_string.h:901
No locals.
#3  std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::operator= (__s=0x4587b1 "", 
    this=0x4ef818) at /usr/include/c++/10/bits/basic_string.h:676
No locals.
#4  report_display_cpu_cstates () at cpu/cpu.cpp:564
        first_cpu = true
        first_core = false
        num_cpus = <optimized out>
        num_cores = <optimized out>
        pkg_data = 0x8437c8
        core_data = <optimized out>
        cpu_data = <optimized out>
        buffer = "\000 14.1 ms\000\000\000\000\000\000\000   ", '\000' <repeats 93 times>, "\240\001I\000\000\000\000\000\307ӝ\367\377\177\000\000\377\377\377\377\377\377\377\377\000\000\000\000\000\000\000\000\300\300\377\377\377\177\000\000\000\252\266\367\377\177\000\000\377\377\377\377\000\000\000\000pĖ\367\377\177\000\000LC_MESSA\000\270\373\340O5\327\323\000\000\000\000\000\000\000\000\361\305"...
        buffer2 = "\000\071\064.1%\000\000\230\t\211\000\000\000\000\000\060\303\377\377\377\177\000\000\330\020\211\000\000\000\000\000<\000\000\000\000\000\000\000\v\000\000\000\000\000\000\000\200\302\377\377\377\177\000\000\265\306C\000\000\000\000\000hkG\000\000\000\000\000\001\000\000\000\000\000\000\000\220\t\211\000\000\000\000\000\005\000\000\000\000\000\000\000\300\301\377\377\377\177\000\000I\215@\000\000\000\000\000\220\301\377\377\377\177\000\000\310\067\204\000\000\000\000\000\300\067\204\000\000\000\000\000\000\000\000\000\n\000\000\000<\000\000\000\000\000\000\000\230\t\211\000\000\000\000\000\\\207E\000\000\000\000\000\b\306E\000\000\000\000\000\061\316E\000\000\000\000\000\261\207E\000\000\000\000\000?\316E\000\000\000\000\000"...
        tmp_num = "5\000re 2\000\000\000\000\000\000\000\000\000\000LC_MESSAGES/powe", '\000' <repeats 17 times>, "\\"
        package = 0
        core = 3
        cpu = <optimized out>
        line = -2
        cstates_num = 2
        title = 9
        core_num = 3
        _package = 0x4928a0
        _core = 0x4955b0
        _cpu = <optimized out>
        core_type = <optimized out>
        div_attr = {css_class = 0x45875c "clear_block", css_id = 0x458754 "cpuidle"}
        std_table_css = {table_class = 0x7fffffffbcd0 "\340\274\377\377\377\177", td_class = 0x7ffff7fdc3b1 <_dl_lookup_symbol_x+289> "H\203\304\060\205\300t\267I\213M", tr_class = 0x7 <error: Cannot access memory at address 0x7>, th_class = 0x7ffff799d808 "\023\064@", pos_table_title = L, title_mod = 0, rows = 0, cols = 0}
        pkg_tbl_size = <optimized out>
        core_tbl_size = <optimized out>
        cpu_tbl_size = <optimized out>
        title_attr = {css_class = 0x45ce31 "content_title", css_id = 0x4587b1 ""}
        idx1 = 8
        idx2 = 24
        idx3 = 61
        tmp_str = "  14.1 ms"
#5  0x000000000040ebcf in one_measurement (seconds=-4, seconds@entry=1, sample_interval=<optimized out>, sample_interval@entry=5, workload=workload@entry=0x0) at main.cpp:265
No locals.
#6  0x000000000040ec39 in make_report (time=20, workload=workload@entry=0x7fffffffd3e0 "", iterations=iterations@entry=1, sample_interval=sample_interval@entry=5, file=file@entry=0x7fffffffc3e0 "powertop.html") at main.cpp:300
No locals.
#7  0x000000000040f6ca in main (argc=2, argv=0x7fffffffe518) at main.cpp:521
        option_index = 5
        c = <optimized out>
        filename = "powertop.html", '\000' <repeats 2683 times>...
        workload = '\000' <repeats 4095 times>
        iterations = 1
        auto_tune = 0
        sample_interval = 5

Mershl, Jul 24 '20 10:07

Can you try whether the following patch makes a difference for you?

diff --git a/src/cpu/cpu.cpp b/src/cpu/cpu.cpp
index a92f111..dc383d9 100644
--- a/src/cpu/cpu.cpp
+++ b/src/cpu/cpu.cpp
@@ -561,14 +561,14 @@ void report_display_cpu_cstates(void)
 						core_type = _core->get_type();
 						if (core_type != NULL) {
 							if (strcmp(core_type, "Core") == 0 ) {
-								core_data[idx2]="";
+								core_data[idx2]=string("");
 								idx2+=1;
 								snprintf(tmp_num, sizeof(tmp_num), __("Core %d"), _core->get_number());
 								core_data[idx2]=string(tmp_num);
 								idx2+=1;
 								core_num+=1;
 							} else {
-								core_data[idx2]="";
+								core_data[idx2]=string("");
 								idx2+=1;
 								snprintf(tmp_num, sizeof(tmp_num), __("GPU %d"), _core->get_number());
 								core_data[idx2]=string(tmp_num);
@@ -770,7 +770,7 @@ void report_display_cpu_pstates(void)
 					buffer[0] = 0;
 					buffer2[0] = 0;
 					if (line == LEVEL_HEADER) {
-						core_data[idx2]="";
+						core_data[idx2]=string("");
 						idx2+=1;
 						snprintf(tmp_num, sizeof(tmp_num), __("Core %d"), _core->get_number());
 						core_data[idx2]=string(tmp_num);

Save this as my.patch and run patch -p1 < my.patch in the top-level source directory. Then rebuild and test.

I'm running it on a T450 without problems.

jubalh, Jul 24 '20 10:07

The error changed.

Full stack trace attached. Before: 1.txt After: 2.txt

Edit: Built with -O0: 2_O0.txt

For the T495: core_tbl_size = {rows = 12, cols = 2}, cpu_tbl_size = {rows = 24, cols = 5}, and core_data has length 24. The segfault happens in line 564 when accessing core_data[idx2] with idx2 = 24, although core_data only has indices 0..23.

num_cpus = 8 // is this intended? The Ryzen 3500U has 1 CPU with 4 cores and 8 threads. num_cores = 4
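
The arithmetic above can be reproduced in isolation. This is a minimal sketch, not PowerTOP code: TableSize, flat_size, try_write and t495_write_in_bounds are hypothetical names, and a std::vector with at() stands in for the real array so that the out-of-range write is reported instead of crashing.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the table dimensions shown in the backtrace.
struct TableSize {
    std::size_t rows;
    std::size_t cols;
};

// Number of cells when the table is stored as one flat array.
std::size_t flat_size(const TableSize &t) {
    return t.rows * t.cols;
}

// Bounds-checked write: returns false instead of touching memory past the end.
bool try_write(std::vector<std::string> &cells, std::size_t idx,
               const std::string &value) {
    try {
        cells.at(idx) = value;  // at() throws std::out_of_range for idx >= size()
        return true;
    } catch (const std::out_of_range &) {
        return false;
    }
}

// Replays the values reported for the T495: 12 rows x 2 cols, idx2 = 24.
bool t495_write_in_bounds() {
    const TableSize core_tbl_size{12, 2};
    std::vector<std::string> core_data(flat_size(core_tbl_size));
    assert(core_data.size() == 24);  // valid indices are 0..23
    const std::size_t idx2 = 24;     // index seen at cpu.cpp:564
    return try_write(core_data, idx2, "");
}
```

Calling t495_write_in_bounds() returns false: index 24 is one past the last valid cell, which matches the crash site in the backtrace.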

Mershl, Jul 24 '20 11:07

core_tbl_size.rows=(cstates_num *_package->children.size()) + _package->children.size();
core_tbl_size.rows+=_package->children.size(); // adding another package sizes core_tbl exactly right on the test systems.

The segmentation fault is fixed by adding one more row per child of the package to core_tbl_size.rows. I was able to reproduce this issue on a Ryzen 1600X and a Ryzen 3500U. Could the CPU topology be misinterpreted on Ryzen CPUs?
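
To make the effect of the extra line concrete, here is a small sketch that compares the two row formulas quoted above. The function names are hypothetical. The inputs used in the note below (cstates_num = 2 from the backtrace, 4 children per package, 2 columns) are assumptions chosen to match the reported core_tbl_size of 12 rows by 2 columns.

```cpp
#include <cassert>
#include <cstddef>

// Row count from the first quoted line: one block of C-state rows per child,
// plus one extra row per child.
std::size_t core_rows_original(std::size_t cstates_num, std::size_t children) {
    return cstates_num * children + children;
}

// Row count after the workaround: one further row per child.
std::size_t core_rows_workaround(std::size_t cstates_num, std::size_t children) {
    return core_rows_original(cstates_num, children) + children;
}

// Cells available in the flat array backing a table with `cols` columns.
std::size_t cells(std::size_t rows, std::size_t cols) {
    assert(cols > 0);
    return rows * cols;
}
```

With those assumed inputs the original formula yields 12 rows (24 cells) and the workaround yields 16 rows (32 cells), so an access at index 24 lands inside the array. It also shows why the workaround over-allocates on systems where 24 cells were already enough.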

Mershl, Jul 26 '20 12:07

With this change, do you still need my patch from above, or does it work without it?

jubalh, Jul 27 '20 07:07

It works without the string init patches. Nonetheless, the patch improves readability IMO.
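
As a sketch of why the string init patch does not affect the crash: for a valid index, both assignment forms leave the element as an empty std::string. The helper names below are hypothetical, and the assert on the index is an addition for illustration that the PowerTOP code does not have.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Assign an empty C string literal, as in the code before the patch.
void clear_with_literal(std::vector<std::string> &cells, std::size_t idx) {
    assert(idx < cells.size());  // illustrative guard, not in the original code
    cells[idx] = "";
}

// Assign an explicitly constructed std::string, as in the patch.
void clear_with_string(std::vector<std::string> &cells, std::size_t idx) {
    assert(idx < cells.size());  // illustrative guard, not in the original code
    cells[idx] = std::string("");
}
```

Either form is undefined behaviour when the index is past the end of the array, so only the size of the array decides whether the write is safe.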

The culprits are the parameters used for the size in https://github.com/fenrus75/powertop/blob/master/src/cpu/cpu.cpp#L494 on a Ryzen CPU. My "hack" does not fix the underlying issue and will make the array (and later the table's row count) too big on CPUs that previously worked.

Mershl, Jul 27 '20 09:07

Thank you @Mershl and @jubalh for finding and root-causing the issue. We will work on a fix. Thanks!

gkammela, Jul 27 '20 18:07

Hello, I have the same error.

PowerTOP version: 2.14-pre
Kernel: 5.7.10-zen1-1-zen (Arch Linux)
Hardware: Asus VivoBook 15 X505ZA (AMD Ryzen 2500U)

Build parameters: CXXFLAGS='-g -Og' CFLAGS='-g -Og -pthread'
Run parameters: '-r --debug'

LLDB backtrace: powertop_sigsegv_lldb.txt
GDB backtrace: powertop_sigsegv_gdb.txt

AirDeeEx, Jul 30 '20 21:07

I'm having the same issue.

powertop 2.13
System76 Serval WS with AMD Ryzen 5 3600 6-Core Processor
Manjaro kernel: 5.6.19-2-MANJARO

evanstucker-hates-2fa, Aug 06 '20 00:08

Hi, I have the exact same issue on a Lenovo Thinkpad T495s (Ryzen 7 3700U)

DawidLoubser, Mar 24 '21 18:03

And another one on a Thinkpad T14 Ryzen 4750U Pro

imp1sh, Aug 14 '21 06:08

another one there, HP Pavilion Aero on Ryzen 5 5600U with the same issue 🙋🏻‍♀️

xhoneybear, Sep 05 '21 22:09

I'm having the same issue on Acer Aspire Ryzen 5.

MathewRomy, Sep 13 '21 06:09

I discovered the same problem on a Gigabyte B550 AORUS ELITE V2 with a Ryzen 5 5600X.

Powertop: 2.14
OS: openSUSE Tumbleweed
Kernel: 5.14.x

With "Global C-state Control" disabled in the motherboard's firmware, the segmentation fault does not occur (which was probably to be expected).

gvolt, Sep 14 '21 09:09

And another one on a Thinkpad T14 Ryzen 4750U Pro

same issue with thinkpad P14s ryzen 7 4750u

drishal, Dec 06 '21 08:12

same issue with ideapad 5 15are05 ryzen 7 4800u

koriwi, Jan 21 '22 21:01

Any chance of a new release of powertop as the two commits have been merged into git master by now?

dirkmueller, Apr 14 '22 10:04

Looking through the issues, I believe this issue is the same as https://github.com/fenrus75/powertop/issues/34.

I have this issue on an AWS EC2 m6a.xlarge (AMD).

MatthewStadter, Apr 25 '22 19:04

Same issue on a Thinkpad T14 (gen 1) AMD Ryzen 7 4750U Pro

moalshak, Jul 12 '22 18:07

I think I'm having the same issue on my Ryzen 5 5600X desktop.

modprobe cpufreq_stats failedCannot load from file /var/cache/powertop/saved_results.powertop
Cannot load from file /var/cache/powertop/saved_parameters.powertop
File will be loaded after taking minimum number of measurement(s) with battery only 
RAPL device for cpu 0
RAPL Using PowerCap Sysfs : Domain Mask 5
RAPL device for cpu 0
RAPL Using PowerCap Sysfs : Domain Mask 5
Devfreq not enabled
glob returned GLOB_ABORTED
Cannot load from file /var/cache/powertop/saved_parameters.powertop
File will be loaded after taking minimum number of measurement(s) with battery only 
Preparing to take measurements
Segmentation fault

DarwinSurvivor, Jul 25 '22 20:07

Hi, I have the segfault issue on servers:

Packages: 0: Intel Xeon Platinum 8259CL
Cores: 0: 2 processors (0-1), Intel Sky Lake

eckelj, Aug 29 '22 08:08