cloudera.cluster
Hitting limit of number of ACLs with XFS file system on CentOS 7
Hello,
this issue happens with the v2 branch and CentOS 7 (7.9 to be specific, fully up to date).
While running task Add ACLs to keystore in file roles/security/tls_generate_csr/tasks/acls.yml, started from playbook prepare_tls.yml, which itself is started from site.yml, we are getting an error when adding an ACL to file /opt/cloudera/security/pki/HOSTNAME_REMOVED.jks.
The error is easily reproducible without Ansible:
# /bin/setfacl -m group:zookeeper:r /opt/cloudera/security/pki/HOSTNAME_REMOVED.jks
setfacl: /opt/cloudera/security/pki/HOSTNAME_REMOVED.jks: Argument list too long
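The reason seems to be that the underlying file system, which is XFS, does not accept more than 21 named ACL entries on a file (25 entries in total, counting the base and mask entries shown by getfacl below). This is easily tested as follows.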
># uname -r
3.10.0-1160.25.1.el7.x86_64
># pwd
/root
># mount|grep root
/dev/mapper/centos_root on / type xfs (rw,relatime,seclabel,attr2,inode64,noquota)
># touch file
># for g in $(cut -d ':' -f 1 /etc/group) ; do echo "Adding group: ${g}" ; setfacl -m group:${g}:r file ; done
Adding group: root
Adding group: bin
Adding group: daemon
Adding group: sys
Adding group: adm
Adding group: tty
Adding group: disk
Adding group: lp
Adding group: mem
Adding group: kmem
Adding group: wheel
Adding group: cdrom
Adding group: mail
Adding group: man
Adding group: dialout
Adding group: floppy
Adding group: games
Adding group: tape
Adding group: video
Adding group: ftp
Adding group: lock
Adding group: audio
setfacl: file: Argument list too long
Adding group: nobody
setfacl: file: Argument list too long
Adding group: users
setfacl: file: Argument list too long
Adding group: utmp
setfacl: file: Argument list too long
Adding group: utempter
setfacl: file: Argument list too long
Adding group: ssh_keys
setfacl: file: Argument list too long
Adding group: avahi-autoipd
setfacl: file: Argument list too long
Adding group: input
setfacl: file: Argument list too long
Adding group: systemd-journal
setfacl: file: Argument list too long
Adding group: systemd-bus-proxy
setfacl: file: Argument list too long
Adding group: systemd-network
setfacl: file: Argument list too long
Adding group: dbus
setfacl: file: Argument list too long
Adding group: polkitd
setfacl: file: Argument list too long
Adding group: dip
setfacl: file: Argument list too long
Adding group: tss
setfacl: file: Argument list too long
Adding group: postdrop
setfacl: file: Argument list too long
Adding group: postfix
setfacl: file: Argument list too long
Adding group: chrony
setfacl: file: Argument list too long
Adding group: sshd
setfacl: file: Argument list too long
># getfacl file
# file: file
# owner: root
# group: root
user::rw-
group::r--
group:root:r--
group:bin:r--
group:daemon:r--
group:sys:r--
group:adm:r--
group:tty:r--
group:disk:r--
group:lp:r--
group:mem:r--
group:kmem:r--
group:wheel:r--
group:cdrom:r--
group:mail:r--
group:man:r--
group:dialout:r--
group:floppy:r--
group:games:r--
group:tape:r--
group:video:r--
group:ftp:r--
group:lock:r--
mask::r--
other::r--
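To see how close a file is to the limit, the entries in the getfacl output can be counted. This is a minimal sketch: the helper name count_acl_entries is made up for illustration, and it reads getfacl output from stdin so it can be tried without touching a real file. For the listing above it reports 25 (the four base and mask entries plus 21 named groups).

```shell
# Hypothetical helper: count ACL entries in getfacl output read from stdin.
# Header comments (lines starting with '#') and blank lines are not counted.
count_acl_entries() {
    grep -c -v -E '^(#|[[:space:]]*$)'
}

# Usage on a real file:
#   getfacl file | count_acl_entries
```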
In summary, adding too many ACLs to the certificate fails. As far as I know, XFS is the default file system on CentOS 7, so I wonder why this problem has not been reported more often. Maybe more conditions have to be met in order to hit it.
A possible solution would be to create a group, add all those users like zookeeper to that group, and give read permissions to this group to file /opt/cloudera/security/pki/HOSTNAME_REMOVED.jks.
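A rough sketch of that workaround follows. The group name tls-readers is a placeholder, and only zookeeper is taken from this report; the remaining service users would have to be added to the list. The function only prints the commands so they can be reviewed first.

```shell
# Sketch of the shared-group workaround.
# Assumptions: "tls-readers" is a made-up group name; extend service_users
# with the other service accounts that need to read the keystore.
shared_group="tls-readers"
service_users="zookeeper"
keystore="/opt/cloudera/security/pki/HOSTNAME_REMOVED.jks"

# Print the commands instead of running them.
plan_shared_group() {
    # -f: succeed even if the group already exists
    echo "groupadd -f ${shared_group}"
    for u in ${service_users}; do
        # -a -G: append the user to the supplementary group
        echo "usermod -a -G ${shared_group} ${u}"
    done
    # One ACL entry for the shared group replaces one entry per service
    echo "setfacl -m group:${shared_group}:r ${keystore}"
}

plan_shared_group
```

Once the printed commands look right, they can be run as root, for example by piping the output to sh.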
I don't think this counts as a playbook issue; it's an XFSv4 issue. The appropriate fix is to use XFSv5 instead, which has been supported since 2013.
https://access.redhat.com/solutions/2197311
Of course, I don't mean that this is an issue with the playbooks per se. It is just that the README tells us that RHEL 7 is supported, and it is not working in the default configuration. As your link reads:
The RHEL6 and the default RHEL7 XFS superblock is the version number 4.
A lot of checks are being done in verify_inventory_and_definition.yml. Maybe add a simple check if the XFS superblock is version 4?
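Such a check could be as small as the sketch below. It assumes that crc=1 in the xfs_info output indicates a v5 superblock and that anything else means v4; the helper name xfs_superblock_version is made up, and it reads the xfs_info output from stdin.

```shell
# Hypothetical helper: read xfs_info output on stdin and print the superblock
# version it implies: 5 if metadata CRCs are enabled (crc=1), otherwise 4.
xfs_superblock_version() {
    if grep -qw 'crc=1'; then
        echo 5
    else
        echo 4
    fi
}

# Usage (xfs_info takes the mount point, here the rootfs that holds /opt):
#   xfs_info / | xfs_superblock_version
```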
If one deploys using some kind of virtualization, it is not that easy to change the creation of the file system, especially when /opt is located on the rootfs.
So should we check if it's at least v5 and fail, or issue a warning?
Yep, good points - we should probably be checking for v4 - perhaps always issue a warning if v4 is found, but only fail if that particular play (creating home dirs & managing ACLs) is in scope?
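As a sketch of that policy, independent of how the version is detected: the function name and both arguments are hypothetical, and the caller passes in the superblock version and whether the ACL-managing play is in scope.

```shell
# Hypothetical policy helper.
#   $1: XFS superblock version (4 or 5)
#   $2: "yes" if the play that manages ACLs is in scope, anything else if not
# Always warns on v4; returns 1 (fail) only for v4 with the ACL play in scope.
check_xfs_for_acls() {
    version="$1"
    acl_play_in_scope="$2"
    if [ "$version" -ge 5 ]; then
        return 0
    fi
    echo "WARNING: XFS superblock v${version} found; the number of ACL entries per file is limited" >&2
    if [ "$acl_play_in_scope" = "yes" ]; then
        echo "ERROR: the ACL tasks are in scope and may hit this limit" >&2
        return 1
    fi
    return 0
}
```

For example, check_xfs_for_acls 4 no only warns, while check_xfs_for_acls 4 yes warns and returns a non-zero status.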
@sdairs have you had any time to implement this improvement?
@Chaffelson Negative
This sort of check should be external to cloudera.cluster, e.g. as an optional role available in cloudera.exe. It does not align with the upcoming roadmap and scope of the collection, as the check interacts with non-Cloudera assets.