
performance.nl-cache=on results in error messages: __nlc_inode_clear_entries Assertion failed

Open pbiering opened this issue 1 year ago • 0 comments

Description of problem:

With the Negative Lookup cache enabled, the error messages shown below were logged.

The exact command to reproduce the issue:

Unclear; the related active options are (a sketch of how to set them follows the list):

performance.nl-cache-positive-entry: off
performance.nl-cache: on
performance.nl-cache-timeout: 60
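
For reference, a minimal sketch of how these options can be set (standard gluster volume set syntax; volume name taken from this report):

# enable the negative-lookup cache with the settings from this report
gluster volume set gfs-MOUNT performance.nl-cache on
gluster volume set gfs-MOUNT performance.nl-cache-timeout 60
gluster volume set gfs-MOUNT performance.nl-cache-positive-entry off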

The full output of the command that failed:

The trigger is unclear; the error message looks like:

[2024-07-02 20:23:31.323754 +0000] E [nl-cache-helper.c:229:__nlc_inode_clear_entries] (-->/usr/lib64/glusterfs/10.5/xlator/performance/nl-cache.so(+0x7470) [0x7ff0881b7470] -->/usr/lib64/glusterfs/10.5/xlator/performance/nl-cache.so(+0x73b5) [0x7ff0881b73b5] -->/usr/lib64/glusterfs/10.5/xlator/performance/nl-cache.so(+0x6cca) [0x7ff0881b6cca] ) 0-: Assertion failed: To attach gdb and coredump, Run the script under "glusterfs/extras/debug/gfcore.py
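
The assertion is raised by the nl-cache client-side xlator, so it should appear in the mount logs. A minimal sketch for locating such messages, assuming the default GlusterFS log directory:

# search all GlusterFS logs for the failing assertion
grep -r '__nlc_inode_clear_entries' /var/log/glusterfs/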

Expected results:

No such messages.

Mandatory info:

- The output of the gluster volume info command:

Volume Name: gfs-MOUNT
Type: Replicate
Volume ID: bb***
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: SERVER1:/gfs/brick-MOUNT
Brick2: SERVER2:/gfs/brick-MOUNT
Options Reconfigured:
features.scrub-freq: weekly
features.scrub-throttle: lazy
features.scrub: Active
features.bitrot: on
cluster.nufa: on
network.compression.compression-level: 5
network.compression.min-size: 1024
network.compression: off
auth.allow: 127.0.0.1,SERVER1-IP,SERVER2-IP
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
storage.fips-mode-rchecksum: on
cluster.granular-entry-heal: on
client.ssl: on
server.ssl: on
ssl.cipher-list: ECDHE-ECDSA-AES128-SHA256
performance.readdir-ahead: on
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.stat-prefetch: on
performance.cache-invalidation: on
cluster.rsync-hash-regex: on
performance.md-cache-timeout: 60
network.inode-lru-limit: 200000
performance.qr-cache-timeout: 60
performance.cache-size: 256MB
server.keepalive-count: 5
server.keepalive-interval: 2
server.keepalive-time: 10
performance.nl-cache-positive-entry: off
performance.nl-cache: on
performance.nl-cache-timeout: 60
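
As a sketch, the effective value of any of these options can be cross-checked with gluster volume get (standard CLI; volume name from this report):

# confirm the nl-cache settings currently active on the volume
gluster volume get gfs-MOUNT performance.nl-cache
gluster volume get gfs-MOUNT performance.nl-cache-timeout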

- The output of the gluster volume status command:

(from SERVER2)

Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick SERVER1:/gfs/brick-MOUNT              55858     0          Y       940  
Brick SERVER2:/gfs/brick-MOUNT              59797     0          Y       889  
Self-heal Daemon on localhost               N/A       N/A        Y       1616 
Bitrot Daemon on localhost                  N/A       N/A        Y       3914 
Scrubber Daemon on localhost                N/A       N/A        Y       4028 
Self-heal Daemon on SERVER1                 N/A       N/A        Y       2711 
Bitrot Daemon on SERVER1                    N/A       N/A        Y       2572 
Scrubber Daemon on SERVER1                  N/A       N/A        Y       2629 
 
Task Status of Volume gfs-MOUNT
------------------------------------------------------------------------------
There are no active volume tasks

- The output of the gluster volume heal command:

Launching heal operation to perform index self heal on volume gfs-MOUNT has been successful 
Use heal info commands to check status.
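
For reference, the output above presumably comes from the standard heal commands (a sketch; volume name from this report):

# trigger an index self-heal, then check its status
gluster volume heal gfs-MOUNT
gluster volume heal gfs-MOUNT info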

- Provide logs present on the following locations of client and server nodes:

Available on request; the error log message is already shown above.

- Is there any crash? Provide the backtrace and coredump: Not seen.

Additional info:

The messages disappeared once performance.nl-cache was set to off (see the sketch below).

Setting performance.nl-cache-positive-entry to on or off has no influence.
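
A sketch of the workaround described above (standard gluster volume set syntax; the observation is from this report):

# disabling the negative-lookup cache made the assertion messages stop
gluster volume set gfs-MOUNT performance.nl-cache off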

- The operating system / glusterfs version:

system: EL 9.4
glusterfs: 10.5

pbiering · Jul 04 '24 05:07