
list_for_each_entry trapped in an endless loop in gf_print_trace

Open chen1195585098 opened this issue 1 year ago • 1 comments

Description of problem: gf_print_trace repeatedly writes "frame: type(0) op(0)" to the log file, which eventually grew to 8.7T. While debugging with gdb, I found that a stack item obtained from ctx->pool->all_frames is linked to itself. The corrupted stack item looks like this: stack->all_frames->next == stack->all_frames->prev == stack->all_frames

I guess there is a race against the stack between STACK_DESTROY and gf_print_trace. However, it seems unreasonable to directly hold a lock during gf_print_trace. Is there any method to avoid this case?
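To illustrate the suspected race, here is a minimal sketch in plain C. It does not use the glusterfs list implementation; `list_node`, `walk_until_head` and the `demo_*` functions are hypothetical stand-ins. It assumes the entry is removed with a `list_del_init`-style operation, which leaves the removed node pointing at itself, while a reader is still positioned on that node. The step cap exists only so the demo terminates; an unbounded traversal would spin forever.

```c
#include <stddef.h>

/* Minimal stand-in for a circular doubly linked list node (hypothetical). */
struct list_node {
    struct list_node *next;
    struct list_node *prev;
};

void list_init(struct list_node *n)
{
    n->next = n;
    n->prev = n;
}

void list_add_tail(struct list_node *n, struct list_node *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Unlink n and leave it pointing at itself (list_del_init-style). */
void list_del_init(struct list_node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->next = n;
    n->prev = n;
}

/* Continue a traversal from pos until it reaches head.
 * Returns the number of steps taken, or -1 if head was not
 * reached within max_steps (i.e. the walk would not terminate). */
long walk_until_head(const struct list_node *pos,
                     const struct list_node *head, long max_steps)
{
    long steps = 0;
    while (pos != head) {
        if (steps == max_steps)
            return -1;
        pos = pos->next;
        steps++;
    }
    return steps;
}

/* Intact list with two entries: the walk ends after two steps. */
long demo_intact_walk(void)
{
    struct list_node head, a, b;
    list_init(&head);
    list_add_tail(&a, &head);
    list_add_tail(&b, &head);
    return walk_until_head(head.next, &head, 1000);
}

/* The reader is positioned on a, then a is removed concurrently
 * (simulated sequentially here). a.next == &a, so head is never reached. */
long demo_stale_walk(void)
{
    struct list_node head, a, b;
    list_init(&head);
    list_add_tail(&a, &head);
    list_add_tail(&b, &head);
    const struct list_node *pos = head.next; /* reader holds a */
    list_del_init(&a);                       /* writer removes a */
    return walk_until_head(pos, &head, 1000);
}
```

In this sketch `demo_intact_walk()` returns 2, while `demo_stale_walk()` hits the step cap and returns -1, which matches the observed state where next and prev both point back at the entry itself.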

The exact command to reproduce the issue: Not yet reproduced.

The full output of the command that failed:

Expected results:

Mandatory info: - The output of the gluster volume info command:

- The output of the gluster volume status command:

- The output of the gluster volume heal command:

- Provide logs present on following locations of client and server nodes - /var/log/glusterfs/

- Is there any crash ? Provide the backtrace and coredump

Additional info:

- The operating system / glusterfs version: CentOS, latest version of glusterfs.

Note: Please hide any confidential data which you don't want to share in public like IP address, file name, hostname or any other configuration

chen1195585098, Sep 18 '23 01:09

One quick possible solution: limit the stack dump call to at most 100 stacks. That way, we would log only the latest 100 (or the oldest 100?).
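A rough sketch of what such a cap could look like, in plain C rather than the actual glusterfs code. `demo_stack`, `dump_stacks_limited` and `MAX_DUMPED_STACKS` are hypothetical names, and the log line format is copied from the report above. The cap bounds the output even when an entry is linked to itself; it does not address the underlying race.

```c
#include <stddef.h>
#include <stdio.h>

#define MAX_DUMPED_STACKS 100 /* the limit suggested above */

struct list_node {
    struct list_node *next;
    struct list_node *prev;
};

/* Hypothetical stand-in for a call stack linked into a pool-wide list. */
struct demo_stack {
    int type;
    int op;
    struct list_node all_frames;
};

/* Recover the enclosing struct from a pointer to its list member. */
#define demo_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Dump at most `limit` entries starting from head->next.
 * Returns the number of entries actually dumped. */
int dump_stacks_limited(struct list_node *head, int limit, FILE *out)
{
    int dumped = 0;
    for (struct list_node *pos = head->next; pos != head; pos = pos->next) {
        if (dumped >= limit) {
            fprintf(out, "stack dump truncated after %d entries\n", dumped);
            break;
        }
        struct demo_stack *s =
            demo_container_of(pos, struct demo_stack, all_frames);
        fprintf(out, "frame: type(%d) op(%d)\n", s->type, s->op);
        dumped++;
    }
    return dumped;
}

/* Reproduce the corrupted state from the report: one entry reachable
 * from head whose next and prev both point back at itself. A small
 * limit keeps the demo output short. */
int demo_dump_self_linked(void)
{
    struct list_node head;
    struct demo_stack s = { 0, 0, { NULL, NULL } };
    s.all_frames.next = &s.all_frames;
    s.all_frames.prev = &s.all_frames;
    head.next = &s.all_frames;
    head.prev = &s.all_frames;
    return dump_stacks_limited(&head, 5, stdout);
}
```

With the limit set to 5, `demo_dump_self_linked()` prints five frame lines plus one truncation notice and returns 5 instead of looping. Whether the dumped entries are the oldest or the latest depends on which end of the list new stacks are inserted at, which this sketch leaves open.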

amarts avatar Sep 22 '23 08:09 amarts