list_for_each_entry trapped in an endless loop in gf_print_trace
Description of problem: gf_print_trace repeatedly writes "frame: type(0) op(0)" to the log file, which eventually grew to 8.7 TB. Debugging with gdb, I found that a stack item obtained from ctx->pool->all_frames is self-ringed, i.e. stack->all_frames.next == stack->all_frames.prev == &stack->all_frames, so the list traversal never terminates.
I suspect there is a race on the stack between STACK_DESTROY and gf_print_trace. However, it seems unreasonable to hold a lock for the duration of gf_print_trace. Is there any way to avoid this case?
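For context, the observed state matches what a kernel-style list_del_init leaves behind: the unlinked node points back at itself. If gf_print_trace walks the list while (or after) STACK_DESTROY unlinks an entry, the termination test of list_for_each_entry can never become false, which would explain the endless "frame: type(0) op(0)" lines. Below is a minimal, self-contained sketch of that failure mode; the list macros are simplified stand-ins for glusterfs's list.h, and call_stack here is a toy stand-in for the real call_stack_t, not the actual types:

```c
#include <stddef.h>
#include <stdio.h>

/* Simplified kernel-style intrusive list, standing in for glusterfs's
 * list.h (illustrative only, not the actual headers). */
struct list_head { struct list_head *next, *prev; };

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

#define list_for_each_entry(pos, head, member)                          \
    for (pos = container_of((head)->next, __typeof__(*pos), member);    \
         &pos->member != (head);                                        \
         pos = container_of(pos->member.next, __typeof__(*pos), member))

/* Toy stand-in for call_stack_t. */
struct call_stack {
    int unique;
    struct list_head all_frames;
};

int main(void)
{
    struct list_head all_stacks = { &all_stacks, &all_stacks };
    struct call_stack s = { 42, { NULL, NULL } };

    /* Reproduce the corrupt state from the report: the node is
     * self-ringed, exactly as list_del_init would leave it. */
    s.all_frames.next = s.all_frames.prev = &s.all_frames;
    all_stacks.next = &s.all_frames; /* traversal reaches the node once */

    struct call_stack *pos;
    unsigned long iters = 0;
    list_for_each_entry(pos, &all_stacks, all_frames) {
        /* &pos->all_frames never equals &all_stacks again, so without
         * this guard the loop would spin forever, logging endlessly. */
        if (++iters > 5) {
            printf("still looping after %lu iterations\n", iters);
            break;
        }
    }
    return 0;
}
```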
The exact command to reproduce the issue: Not yet reproduced.
The full output of the command that failed:
Expected results:
Mandatory info:
- The output of the `gluster volume info` command:
- The output of the `gluster volume status` command:
- The output of the `gluster volume heal` command:
- Provide logs present on the following locations of client and server nodes: /var/log/glusterfs/
- Is there any crash? Provide the backtrace and coredump
Additional info:
- The operating system / glusterfs version: CentOS, latest version of glusterfs.
Note: Please hide any confidential data which you don't want to share in public like IP address, file name, hostname or any other configuration
One quick possible solution: cap the dump at, say, a maximum of 100 stacks per stack-dump call. That way we keep the latest 100 (or the oldest 100?).
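A minimal sketch of that cap, reusing the toy list types and macros from the sketch above; GF_DUMP_MAX_STACKS and dump_one_stack() are hypothetical names, not existing glusterfs identifiers. Bounding the traversal does not fix the underlying race, but it keeps a corrupted (self-ringed) list from spinning gf_print_trace forever:

```c
/* Reuses struct list_head, struct call_stack, list_for_each_entry and
 * <stdio.h> from the previous sketch. */
#define GF_DUMP_MAX_STACKS 100 /* hypothetical cap, per the suggestion */

static void
dump_one_stack(struct call_stack *stack)
{
    /* Hypothetical per-stack dump helper; the real code formats each
     * frame into the log file. */
    fprintf(stderr, "stack unique=%d\n", stack->unique);
}

static void
dump_pending_frames_bounded(struct list_head *all_stacks)
{
    struct call_stack *stack;
    int dumped = 0;

    list_for_each_entry(stack, all_stacks, all_frames) {
        if (dumped++ >= GF_DUMP_MAX_STACKS) {
            /* Either the pool genuinely holds more than 100 pending
             * stacks, or the list is corrupted; stop either way. */
            fprintf(stderr, "stack dump truncated at %d entries\n",
                    GF_DUMP_MAX_STACKS);
            return;
        }
        dump_one_stack(stack);
    }
}
```

Whether this keeps the latest or the oldest 100 depends on whether new stacks are inserted at the head or the tail of the pool list; either way, the bound guarantees the dump terminates.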