htop icon indicating copy to clipboard operation
htop copied to clipboard

Process tree loop is possible with some kernels

Open Low-power opened this issue 3 years ago • 14 comments

Some ptrace(2) implementations have a behavior of 're-parenting' the traced process to the tracer, which may causing problem with htop 'tree view'.

For example, to create such a loop on FreeBSD:

Use gdb(1) to attach the current shell (bash(1)) in a terminal (pts/46):

$ tty
/dev/pts/46
$ echo $$
3248
$ gdb -p $$
GNU gdb (GDB) 7.11.1 [GDB 7.11.1 for FreeBSD]
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
...
Attaching to process 3248
Reading symbols from /usr/local/bin/bash...(no debugging symbols found)...done.
Reading symbols from /lib/libncurses.so.8...(no debugging symbols found)...done.
Reading symbols from /usr/local/lib/libintl.so.8...(no debugging symbols found)...done.
Reading symbols from /lib/libc.so.7...(no debugging symbols found)...done.
Reading symbols from /libexec/ld-elf.so.1...(no debugging symbols found)...done.
[Switching to LWP 104322 of process 3248]
0x0000000800e409f8 in _wait4 () from /lib/libc.so.7
(gdb) 

Inspect processes associated with the previous terminal using ps(1), in another terminal session:

$ tty
/dev/pts/42
$ ps -t pts/46 -o pid,ppid,comm
 PID PPID COMMAND
3248 6999 bash
6999 3248 gdb-7.11.1

Note the 2 processes has formed a loop.

htop crashing messages:

>>>>>>>>>> stderr output >>>>>>>>>>
Assertion failed: (Vector_size(this->displayList) == vsize), function ProcessList_buildTree, file ProcessList.c, line 316.

<<<<<<<<<< stderr output <<<<<<<<<<


FATAL PROGRAM ERROR DETECTED
============================
Please check at https://htop.dev/issues whether this issue has already been reported.
If no similar issue has been reported before, please create a new issue with the following information:
  - Your htop version: '3.2.0'
  - Your OS and kernel version (uname -a)
  - Your distribution and release (lsb_release -a)
  - Likely steps to reproduce (How did it happen?)
  - Backtrace of the issue (see below)

Error information:
------------------
A signal 6 (Abort trap) was received.

Setting information:
--------------------
htop_version=3.2.0;config_reader_min_version=3;fields=0 48 17 18 38 39 2 46 47 49 1;hide_kernel_threads=1;hide_userland_threads=0;shadow_other_users=0;show_thread_names=1;show_program_path=1;highlight_base_name=0;highlight_deleted_exe=1;highlight_megabytes=1;highlight_threads=1;highlight_changes=0;highlight_changes_delay_secs=5;find_comm_in_cmdline=1;strip_exe_from_cmdline=1;show_merged_command=0;header_margin=1;screen_tabs=0;detailed_cpu_time=0;cpu_count_from_one=1;show_cpu_usage=1;show_cpu_frequency=0;show_cpu_temperature=0;degree_fahrenheit=0;update_process_names=0;account_guest_in_cpu_meter=0;color_scheme=0;enable_mouse=1;delay=15;hide_function_bar=0;header_layout=two_50_50;column_meters_0=AllCPUs;column_meter_modes_0=1;column_meters_1=Tasks LoadAverage Uptime Blank Memory Swap;column_meter_modes_1=2 2 2 2 1 1;tree_view=1;sort_key=47;tree_sort_key=0;sort_direction=1;tree_sort_direction=1;tree_view_always_by_pid=0;all_branches_collapsed=0;screen:Main=PID USER PRIORITY NICE M_VIRT M_RESIDENT STATE PERCENT_CPU PERCENT_MEM TIME Command;.sort_key=PERCENT_MEM;.tree_sort_key=PID;.tree_view=1;.tree_view_always_by_pid=0;.sort_direction=1;.tree_sort_direction=1;.all_branches_collapsed=0;

Backtrace information:
----------------------
0x407bd0 <print_backtrace+0x14> at /usr/home/WHR/src/htop-dev-htop/htop
0x407f60 <CRT_handleSIGSEGV+0xaf> at /usr/home/WHR/src/htop-dev-htop/htop

To make the above information more practical to work with, please also provide a disassembly of your htop binary. This can usually be done by running the following command:

   objdump -d -S -w `which htop` > ~/htop.objdump

Please include the generated file in your report.
Running this program with debug symbols or inside a debugger may provide further insights.

Thank you for helping to improve htop!

This commit adds a rescan logic, which will rescan for looped processes once there appears to be one or more loops.

Low-power avatar May 17 '22 08:05 Low-power

Note this ptrace(2) behavior of FreeBSD kernel is documented in the man page:

For the duration of the tracing session, the traced process will be ``re-parented'', with its parent process ID (and resulting behavior) changed to the tracing process.

Low-power avatar May 17 '22 09:05 Low-power

I'd suggest marking all processes as unseen in the root finding phase before sort by setting tree_depth to a guard value like UINT_MAX and checking processes for that guard value instead.

Good idea; however I think setting tree_depth to 0 is enough, because for a process having both !process->isRoot && process->tree_depth == 0 is enough to determine that it is out of the newly built tree(s).

You may also want to chase to loop in case the process you encounter first is not one of the loop processes, but a descendant of one of them.

I currently have no idea how to handle it efficiently.

Low-power avatar May 17 '22 12:05 Low-power

You may also want to chase to loop in case the process you encounter first is not one of the loop processes, but a descendant of one of them.

I currently have no idea how to handle it efficiently.

Start from the first missing process you see. As long as the parent of the current process has not been marked as visited, you mark the current process as visited and switch to the parent. When you see that the parent has already been marked as seen, both the current process and the parent are known to belong to the loop body, so you can start building the tree from either. Continue the scan for missing processes in case there are more missing processes.

This discovery step is still O(n) for all the processes because every process can be marked as visited at most once (and thus dominated by O(n * log(n)) for building the trees).

tanriol avatar May 17 '22 12:05 tanriol

I think this should use Process_getParentPid for consistency with the tree sort code.

Yep, I missed this.

The outer for loop iterates for each looped process tree, while the inner do loop iterates for each process in a tree to findout where to cut the looped tree.

j counts the remaining lost nodes; when it becomes 0, there shouldn't be any lost nodes, so break out the outer loop to prevent wasting time to find more lost nodes.

j-- is inside the inner loop bacause all processes iterated here are expected to be added to displayList later by ProcessList_buildTreeBranch. Yes, this isn't perfect, as it is also possible to have an deeper descendant didn't count to j but picked up by ProcessList_buildTreeBranch, but it was used in hope to speed up things a litttle bit.

Other checks you mentioned are sanity checks.

Low-power avatar May 17 '22 15:05 Low-power

j counts the remaining lost nodes; when it becomes 0, there shouldn't be any lost nodes, so break out the outer loop to prevent wasting time to find more lost nodes.

How about something like if(Vector_size(this->displayList) == vsize) break; after building the branch for the current loop instead?

tanriol avatar May 17 '22 16:05 tanriol

How about something like if(Vector_size(this->displayList) == vsize) break; after building the branch for the current loop instead?

Makes sence, Vector_size is cheap.

Low-power avatar May 17 '22 16:05 Low-power

Added assert(3)s.

Low-power avatar May 17 '22 16:05 Low-power

Looks good to me from the content POV. Sorry, I'm not familiar enough with the styleguide to review code style, so I'll leave that to @BenBE or someone else from the main team.

tanriol avatar May 17 '22 16:05 tanriol

Another performance consideration affecting this code affects the process parent lookup around line 328: Have you checked if looking up that process object using the process hashtable is viable? If so, search might be considerably faster, as no linear search has to be performed comparing process IDs.

ProcessList_findProcess already uses hashtable, so this one should be fine.

tanriol avatar May 17 '22 17:05 tanriol

Trying this PR locally and there seems to be some problem I don't fully understand yet.

tanriol avatar May 17 '22 18:05 tanriol

@Low-power Sorry, I missed that you're missing the other part - ProcessList_buildTreeBranch does not have a built-in check for "this process hasn't been added yet", so it would go into infinite recursion as we don't re-sort the process list. As you mark these new roots as roots, skipping any process with isRoot should do the trick.

tanriol avatar May 17 '22 18:05 tanriol

Sorry, I missed that you're missing the other part - ProcessList_buildTreeBranch does not have a built-in check for "this process hasn't been added yet", so it would go into infinite recursion as we don't re-sort the process list. As you mark these new roots as roots, skipping any process with isRoot should do the trick.

I don't think I understand this, but shouldn't marking a node as root (isRoot = true) be enough to prevent a infinite recursion?

Low-power avatar May 18 '22 04:05 Low-power

Sorry, I missed that you're missing the other part - ProcessList_buildTreeBranch does not have a built-in check for "this process hasn't been added yet", so it would go into infinite recursion as we don't re-sort the process list. As you mark these new roots as roots, skipping any process with isRoot should do the trick.

I don't think I understand this, but shouldn't marking a node as root (isRoot = true) be enough to prevent a infinite recursion?

Yeah, that's the check I'm suggesting to add. Before this PR that was not needed because of a special case in compareProcessByKnownParentThenNatural sorting all roots as if their parent PID was 0 instead.

tanriol avatar May 18 '22 05:05 tanriol

@Low-power Could you please update your PR to fix the issues that people reported? TIA.

BenBE avatar Jun 01 '22 16:06 BenBE