Segmentation fault with cpu hotplug

Open purpleidea opened this issue 4 months ago • 5 comments

Take a VM (libvirt + qemu), e.g. Fedora 42 on x86_64, and install htop. In this case it's:

[root@test0 ~]# htop --version
htop 3.4.1

Now hotplug some CPUs. htop segfaults.

FATAL PROGRAM ERROR DETECTED
============================
Please check at https://htop.dev/issues whether this issue has already been reported.
If no similar issue has been reported before, please create a new issue with the following information:
  - Your htop version: '3.4.1'
  - Your OS and kernel version (uname -a)
  - Your distribution and release (lsb_release -a)
  - Likely steps to reproduce (How did it happen?)
  - Backtrace of the issue (see below)

Error information:
------------------
A signal 11 (Segmentation fault) was received.

Setting information:
--------------------
htop_version=3.4.1;config_reader_min_version=3;fields=0 48 17 18 38 39 40 2 46 47 49 1;hide_kernel_threads=1;hide_userland_threads=0;hide_running_in_container=0;shadow_other_users=0;show_thread_names=0;show_program_path=1;highlight_base_name=0;highlight_deleted_exe=1;shadow_distribution_path_prefix=0;highlight_megabytes=1;highlight_threads=1;highlight_changes=0;highlight_changes_delay_secs=5;find_comm_in_cmdline=1;strip_exe_from_cmdline=1;show_merged_command=0;header_margin=1;screen_tabs=1;detailed_cpu_time=0;cpu_count_from_one=0;show_cpu_usage=1;show_cpu_frequency=0;show_cpu_temperature=0;degree_fahrenheit=0;show_cached_memory=1;update_process_names=0;account_guest_in_cpu_meter=0;color_scheme=0;enable_mouse=1;delay=15;hide_function_bar=0;topology_affinity=0;header_layout=two_50_50;column_meters_0=LeftCPUs Memory Swap;column_meter_modes_0=1 1 1;column_meters_1=RightCPUs Tasks LoadAverage Uptime;column_meter_modes_1=1 2 2 2;tree_view=0;sort_key=46;tree_sort_key=0;sort_direction=-1;tree_sort_direction=1;tree_view_always_by_pid=0;all_branches_collapsed=0;screen:Main=PID USER PRIORITY NICE M_VIRT M_RESIDENT M_SHARE STATE PERCENT_CPU PERCENT_MEM TIME Command;.sort_key=PERCENT_CPU;.tree_sort_key=PID;.tree_view_always_by_pid=0;.tree_view=0;.sort_direction=-1;.tree_sort_direction=1;.all_branches_collapsed=0;screen:I/O=PID USER IO_PRIORITY IO_RATE IO_READ_RATE IO_WRITE_RATE PERCENT_SWAP_DELAY PERCENT_IO_DELAY Command;.sort_key=IO_RATE;.tree_sort_key=PID;.tree_view_always_by_pid=0;.tree_view=0;.sort_direction=-1;.tree_sort_direction=1;.all_branches_collapsed=0;


Backtrace information:
----------------------
htop(CRT_handleSIGSEGV+0x131) [0x55a114c6a531]
/lib64/libc.so.6(+0x1a070) [0x7f1f6e3cf070]
htop(+0x2607) [0x55a114c5e607]
htop(Header_updateData+0x71) [0x55a114c6bbd1]
htop(ScreenManager_run+0x674) [0x55a114c827e4]
htop(CommandLine_run+0x8b8) [0x55a114c68fb8]
/lib64/libc.so.6(+0x3575) [0x7f1f6e3b8575]
/lib64/libc.so.6(__libc_start_main+0x88) [0x7f1f6e3b8628]
htop(_start+0x25) [0x55a114c5ddc5]

To make the above information more practical to work with, please also provide a disassembly of your htop binary. This can usually be done by running the following command:

   objdump -d -S -w `which htop` > ~/htop.objdump

Please include the generated file in your report.
Running this program with debug symbols or inside a debugger may provide further insights.

Thank you for helping to improve htop!

Segmentation fault (core dumped)

It's 100% reproducible.

Note that hot-unplugging causes no issue: you see those CPUs go "offline". That label isn't technically accurate, though. Offline vs. online is a different distinction than present vs. missing.
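For reference, on Linux the two distinctions show up as separate sysfs files: `/sys/devices/system/cpu/present` lists the CPUs that currently exist in the machine, and `/sys/devices/system/cpu/online` lists the ones that are currently enabled. Both hold a CPU list such as `0-3,5`. Below is a minimal sketch for counting the entries in such a list. It is not htop's code, and the function names `count_cpu_list` and `count_cpus_in_file` are made up for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Count the CPUs named in a kernel CPU-list string such as "0-3,5".
 * Returns -1 on malformed input; an empty list counts as 0.
 * (Hypothetical helper, not part of htop.) */
int count_cpu_list(const char* list) {
   int total = 0;
   const char* p = list;

   while (*p && *p != '\n') {
      char* end;
      long first = strtol(p, &end, 10);
      if (end == p || first < 0)
         return -1;

      long last = first;
      if (*end == '-') {
         /* A range "first-last" */
         p = end + 1;
         last = strtol(p, &end, 10);
         if (end == p || last < first)
            return -1;
      }

      total += (int)(last - first + 1);

      if (*end == ',')
         end++;
      else if (*end && *end != '\n')
         return -1;
      p = end;
   }

   return total;
}

/* Read one sysfs CPU list (e.g. ".../cpu/present") and count its entries.
 * Returns -1 if the file cannot be read. */
int count_cpus_in_file(const char* path) {
   char buf[4096];
   FILE* f = fopen(path, "r");
   if (!f)
      return -1;
   char* ok = fgets(buf, sizeof(buf), f);
   fclose(f);
   return ok ? count_cpu_list(buf) : -1;
}
```

Comparing `count_cpus_in_file("/sys/devices/system/cpu/present")` before and after a hotplug should show the count growing, while offlining a CPU should only change the `online` list.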

HTH Thanks!

purpleidea avatar Sep 15 '25 01:09 purpleidea

Mind to include the objdump as noted in the crash message? TIA.

BenBE avatar Sep 15 '25 08:09 BenBE

> Mind to include the objdump as noted in the crash message? TIA.

Apologies, I didn't get to that. I (possibly incorrectly) assumed this would be an easy reproducer and you'd have a better situation on your own machine. If that's not the case, lmk, and I'll try to get a dump shortly.

purpleidea avatar Sep 15 '25 09:09 purpleidea

While this should be fairly simple to reproduce, it's sometimes better to trace the issue directly from the binary, as issues with seemingly identical symptoms may have different causes. Having the objdump for the backtrace also makes it easier to group similar reports. Another point in favour of objdumps is that they let us see the exact code path that was taken even without getting all the details of the reproduction right (e.g. removing a live CPU in one virtualization environment may behave slightly differently from another). And don't worry, we have already tracked down bugs from x64, arm, and mips assembly alone. Also, given that most builds aren't debug builds, the backtrace alone omits some essential information that you can reconstruct from the objdump (that's why you sometimes see an offset/alignment¹ posted in bug reports).

So yes, please post the objdump for the exact binary that triggered the backtrace/crash. TIA.

¹Basically the module load offset for the code segment, which you need to subtract from the backtrace addresses in order to map them to the objdump. From there, mapping back to functions is mostly about knowing the rough code structure and how the code is usually laid out by the optimization passes of modern compilers.
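As a rough illustration of that subtraction, here is a small sketch using addresses from the backtrace in this report. The frame `htop(+0x2607) [0x55a114c5e607]` gives both the module-relative offset and the absolute address, so the load base can be derived from it. The helper names are hypothetical, and the sketch assumes the objdump addresses start at 0, as is typical for a position-independent executable.

```c
#include <stdint.h>

/* Load base of a module, derived from a frame that shows both the
 * module-relative offset and the absolute address, e.g.
 * "htop(+0x2607) [0x55a114c5e607]". (Hypothetical helper.) */
uint64_t load_base(uint64_t absolute, uint64_t relative) {
   return absolute - relative;
}

/* Address to look up in the objdump for an absolute backtrace address.
 * Assumes the objdump addresses are relative to a base of 0. */
uint64_t objdump_address(uint64_t absolute, uint64_t base) {
   return absolute - base;
}
```

With these, `load_base(0x55a114c5e607, 0x2607)` gives `0x55a114c5c000`, and the `Header_updateData+0x71` frame at `0x55a114c6bbd1` maps to `0xfbd1` in the objdump.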

BenBE avatar Sep 15 '25 21:09 BenBE

[root@test0 ~]# objdump -d -S -w `which htop` > ~/htop.objdump
objdump: Warning: source file /usr/include/bits/unistd.h is more recent than object file
objdump: Warning: source file /usr/include/bits/stdio2.h is more recent than object file
objdump: Warning: source file /usr/include/bits/string_fortified.h is more recent than object file
objdump: Warning: source file /usr/include/stdlib.h is more recent than object file
objdump: Warning: source file /usr/include/bits/fcntl2.h is more recent than object file
objdump: Warning: source file /usr/include/wchar.h is more recent than object file
objdump: Warning: source file /usr/include/bits/stdlib.h is more recent than object file
objdump: Warning: source file /usr/include/sys/sysmacros.h is more recent than object file

GITHUB:

File type .objdump not supported. See the documentation for supported file types.

gzipped...

htop.objdump.gz

purpleidea avatar Sep 15 '25 22:09 purpleidea

Offset: 0x55a114c5c000

There is an assert(existing == currExisting); in LinuxMachine_updateCPUcount that would normally trigger in debug builds.
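To illustrate why only debug builds would notice: when `NDEBUG` is defined (as release builds commonly do), the standard `assert` macro expands to nothing, so the changed count is accepted silently and the rest of the program carries on with whatever it sized earlier. A minimal sketch, with `FakeMachine` and `update_cpu_count` as made-up stand-ins rather than htop's actual code:

```c
#define NDEBUG /* stands in for a release build compiled with -DNDEBUG */
#include <assert.h>

/* Hypothetical stand-in for the machine state; not htop's type. */
typedef struct {
   unsigned int existingCPUs;
} FakeMachine;

/* Stores and returns the new count. With NDEBUG defined the assert
 * below compiles to nothing, so a changed count passes unnoticed. */
unsigned int update_cpu_count(FakeMachine* m, unsigned int currExisting) {
   assert(m->existingCPUs == currExisting);
   m->existingCPUs = currExisting;
   return m->existingCPUs;
}
```

Without the `#define NDEBUG` line, calling `update_cpu_count` with a different count would abort at the assert instead.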

A quick and dirty hack could be done like this:

diff --git a/CPUMeter.c b/CPUMeter.c
index 69da88db..b32f281b 100644
--- a/CPUMeter.c
+++ b/CPUMeter.c
@@ -226,11 +226,18 @@ static void AllCPUsMeter_getRange(const Meter* this, int* start, int* count) {
    }
 }
 
+static void AllCPUsMeter_done(Meter* this);
+static void CPUMeterCommonInit(Meter* this);
+
 static void AllCPUsMeter_updateValues(Meter* this) {
    CPUMeterData* data = this->meterData;
    Meter** meters = data->meters;
    int start, count;
    AllCPUsMeter_getRange(this, &start, &count);
+   if (data->cpus != (size_t)count) {
+      AllCPUsMeter_done(this);
+      CPUMeterCommonInit(this);
+   }
    for (int i = 0; i < count; i++)
       Meter_updateValues(meters[i]);
 }
@@ -276,9 +283,7 @@ static void CPUMeterCommonUpdateMode(Meter* this, MeterModeId mode, int ncol) {
 static void AllCPUsMeter_done(Meter* this) {
    CPUMeterData* data = this->meterData;
    Meter** meters = data->meters;
-   int start, count;
-   AllCPUsMeter_getRange(this, &start, &count);
-   for (int i = 0; i < count; i++)
+   for (size_t i = 0; i < data->cpus; i++)
       Meter_delete((Object*)meters[i]);
    free(data->meters);
    free(data);
diff --git a/linux/LinuxMachine.c b/linux/LinuxMachine.c
index 188358ef..ff768aaa 100644
--- a/linux/LinuxMachine.c
+++ b/linux/LinuxMachine.c
@@ -123,7 +123,7 @@ static void LinuxMachine_updateCPUcount(LinuxMachine* this) {
 #endif
 
    super->activeCPUs = active;
-   assert(existing == currExisting);
    super->existingCPUs = currExisting;
 }
 

Not sure about possible side effects. That patch is absolutely not tested and I'm not yet sure if it hits all the spots required, because there are potentially some more places that aren't fully aware of CPU hotplugging.
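The general shape of "tear down and rebuild when the CPU count changed" can be sketched in isolation like this. It is a standalone illustration under simplifying assumptions, not htop's code: `PerCpuData`, `Owner` and the `owner_*` functions are hypothetical, and a plain `int` per CPU stands in for the per-CPU meters. One thing any such rebuild has to respect is that pointers read before the teardown are stale afterwards, so the sketch only fetches them after the rebuild.

```c
#include <stdlib.h>

/* Hypothetical stand-ins; htop's real meter types differ. */
typedef struct {
   size_t cpus;  /* number of slots in values */
   int* values;  /* one slot per CPU */
} PerCpuData;

typedef struct {
   PerCpuData* data;
} Owner;

/* Free the per-CPU state and clear the pointer so it cannot dangle. */
void owner_done(Owner* o) {
   if (o->data) {
      free(o->data->values);
      free(o->data);
      o->data = NULL;
   }
}

/* Allocate zeroed per-CPU state for the given count; 0 on success. */
int owner_init(Owner* o, size_t cpus) {
   PerCpuData* d = malloc(sizeof(*d));
   if (!d)
      return -1;
   d->cpus = cpus;
   d->values = cpus ? calloc(cpus, sizeof(int)) : NULL;
   if (cpus && !d->values) {
      free(d);
      return -1;
   }
   o->data = d;
   return 0;
}

/* Update every slot, rebuilding first if the CPU count changed. */
int owner_update(Owner* o, size_t current_cpus) {
   if (!o->data || o->data->cpus != current_cpus) {
      owner_done(o);
      if (owner_init(o, current_cpus) != 0)
         return -1;
   }

   /* Read the pointers only now, after any rebuild. */
   PerCpuData* d = o->data;
   for (size_t i = 0; i < d->cpus; i++)
      d->values[i]++;
   return 0;
}
```

Rebuilding discards the accumulated per-CPU state, which is acceptable for a sketch but is one of the side effects a real fix would need to consider.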

BenBE avatar Sep 19 '25 14:09 BenBE