ibm-spectrum-scale-install-infra

Add check for memory requirements (Daemon failed to initialize fast condvar rc -1)

Open troppens opened this issue 5 years ago • 0 comments

Both the Ansible tasks and a manual start of mmfsd fail if a node has insufficient memory. The Spectrum Scale FAQ says that at least 2 GB of memory is required. In virtual environments, however, the default memory size of VMs is quite often smaller, particularly in proof-of-concept and demo environments. This can be especially difficult for a new user who tries the Developer Edition on a laptop with limited resources. I therefore recommend adding a check to the Ansible roles and ideally also improving the output of mmstartup.

Having done many proofs of concept in virtual environments, I have come across many variations of low-memory symptoms. Here is an example on a VM with only 256 MB:

TASK [spectrum_scale_core/cluster : cluster | Start daemons] ******************************************************************************************************************************************************
changed: [spectrumscale]

RUNNING HANDLER [spectrum_scale_core/cluster : wait-daemon-active] ************************************************************************************************************************************************
FAILED - RETRYING: wait-daemon-active (10 retries left).
FAILED - RETRYING: wait-daemon-active (9 retries left).
FAILED - RETRYING: wait-daemon-active (8 retries left).
FAILED - RETRYING: wait-daemon-active (7 retries left).
FAILED - RETRYING: wait-daemon-active (6 retries left).
FAILED - RETRYING: wait-daemon-active (5 retries left).
FAILED - RETRYING: wait-daemon-active (4 retries left).
FAILED - RETRYING: wait-daemon-active (3 retries left).
FAILED - RETRYING: wait-daemon-active (2 retries left).
FAILED - RETRYING: wait-daemon-active (1 retries left).
fatal: [spectrumscale]: FAILED! => {"attempts": 10, "changed": false, "cmd": "/usr/lpp/mmfs/bin/mmgetstate -N localhost -Y | grep -v HEADER | cut -d ':' -f 9", "delta": "0:00:01.551573", "end": "2020-04-17 22:35:05.195691", "rc": 0, "start": "2020-04-17 22:35:03.644118", "stderr": "", "stderr_lines": [], "stdout": "down", "stdout_lines": ["down"]}

NO MORE HOSTS LEFT ************************************************************************************************************************************************************************************************

PLAY RECAP ********************************************************************************************************************************************************************************************************
spectrumscale              : ok=81   changed=22   unreachable=0    failed=1    skipped=45   rescued=0    ignored=0

[root@origin ansible]# 

So, let's check on the target node:

[root@origin ansible]# ssh spectrumscale
Last login: Fri Apr 17 22:35:03 2020 from 10.1.1.10

[root@spectrumscale ~]# mmgetstate -a

 Node number  Node name        GPFS state
-------------------------------------------
       1      spectrumscale    down

[root@spectrumscale ~]# mmstartup
Fri Apr 17 22:35:38 CEST 2020: mmstartup: Starting GPFS ...

[root@spectrumscale ~]# mmgetstate -a

 Node number  Node name        GPFS state
-------------------------------------------
       1      spectrumscale    down

[root@spectrumscale ~]#

There are hints in mmfs.log:

2020-04-17_22:29:24.629+0200: [E] Daemon failed to initialize fast condvar rc -1

And in /var/log/messages:

Apr 17 20:35:49 spectrumscale kernel: [E] GPFS cxiInitFastCondvar: kmalloc failed allocating MAX_GPFS_THREADS 16384 * sizeof(FastCondvarThread_t) 64 = 1048576 bytes for FastCondvarThread_t
Apr 17 20:35:49 spectrumscale mmfs[15350]: [E] Daemon failed to initialize fast condvar rc -1
Apr 17 20:35:50 spectrumscale mmremote[15384]: Shutting down!
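Until mmstartup reports this itself, the log signature can be searched for directly. The following is a minimal sketch, not part of Spectrum Scale or the Ansible roles: the function name `has_condvar_failure` is hypothetical, and the caller has to supply the log file path.

```shell
#!/bin/sh
# Sketch only: has_condvar_failure is a hypothetical helper name.
#
# has_condvar_failure <logfile>
# Returns 0 and prints a hint to stderr if the log contains the
# "fast condvar" failure message quoted in this issue,
# returns 1 if the message is absent,
# returns 2 if the log file cannot be read.
has_condvar_failure() {
    logfile="$1"

    # Refuse to guess: an unreadable log is reported separately.
    [ -r "$logfile" ] || return 2

    # Match the exact error text from the daemon log.
    if grep -q 'Daemon failed to initialize fast condvar' "$logfile"; then
        echo "HINT: $logfile: fast condvar initialization failed; check whether the node has enough memory" >&2
        return 0
    fi
    return 1
}
```

For example, `has_condvar_failure /var/log/messages` could be run on the target node after the daemon has failed to become active.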

For quick evaluations you may want to work with VMs that have less than 2 GB of memory. It would therefore be good to add a warning (not an error) to the Ansible role if the memory is below 2 GB, and to enhance mmstartup to issue an error if the allocation of memory fails.

Nodes may need more than 2 GB, depending for instance on the pagepool configuration, so 2 GB is not sufficient for many environments. However, a check for 2 GB is better than no check at all. The formula for the check can be improved later.
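As a rough illustration of such a check, here is a minimal shell sketch that only warns and never aborts. The function name `check_min_memory`, the 2048 MB default, and the optional meminfo path argument are assumptions made for this example; they are not taken from the existing roles. Note that MemTotal excludes memory reserved by the kernel, so a VM configured with exactly 2 GB may report slightly less and the threshold may need some slack.

```shell
#!/bin/sh
# Sketch only: check_min_memory and its defaults are illustrative assumptions.
#
# check_min_memory [min_mb] [meminfo_file]
# Returns 0 if MemTotal is at least min_mb,
# returns 1 and prints a warning to stderr if it is below min_mb,
# returns 2 if MemTotal cannot be determined.
check_min_memory() {
    min_mb="${1:-2048}"
    meminfo="${2:-/proc/meminfo}"

    # MemTotal is reported in kB, e.g. "MemTotal:  2046836 kB".
    total_kb=$(awk '/^MemTotal:/ { print $2; exit }' "$meminfo" 2>/dev/null)
    if [ -z "$total_kb" ]; then
        echo "WARNING: could not read MemTotal from $meminfo" >&2
        return 2
    fi

    # Integer division is precise enough for a coarse threshold check.
    total_mb=$((total_kb / 1024))
    if [ "$total_mb" -lt "$min_mb" ]; then
        echo "WARNING: node has ${total_mb} MB of memory, below the recommended ${min_mb} MB; mmfsd may fail to start" >&2
        return 1
    fi
    return 0
}
```

Calling `check_min_memory` without arguments checks the local node against 2048 MB. Passing a different first argument would allow the threshold to be refined later, for example once pagepool is taken into account.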

troppens, Apr 17 '20 21:04