cryostat-legacy
[Task] Startup-blocking exceptions should log detailed messages
We have some exceptions that are intentionally emitted when certain preconditions, like filesystem permission checks, fail. These are generally, but not always, implemented as a RuntimeException thrown by a @Provides method in various DI module classes.
When these are thrown, application startup fails and the exception's stack trace is printed. It would be helpful for users and developers if these logs also contained a descriptive message about which check was being performed, why, what the expected state was vs. what was observed, and possible reasons for the failure.
One example of such an exception currently:
INFO: Selected OpenShift Platform Strategy
Exception in thread "main" java.nio.file.AccessDeniedException: /opt/cryostat.d/conf.d/credentials
at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:90)
at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116)
at java.base/sun.nio.fs.UnixFileSystemProvider.newDirectoryStream(UnixFileSystemProvider.java:432)
at java.base/java.nio.file.Files.newDirectoryStream(Files.java:472)
at java.base/java.nio.file.Files.list(Files.java:3699)
at io.cryostat.core.sys.FileSystem.listDirectoryChildren(FileSystem.java:98)
at io.cryostat.configuration.CredentialsManager.load(CredentialsManager.java:91)
at io.cryostat.Cryostat.main(Cryostat.java:79)
> I think I've figured it out. I wouldn't really consider it a bug in this case
> My QuickLab cluster has 4 NFS PVs, those had all been bound by Cryostat installs at one point, so I released them by deleting the claimRef field on the PV. This didn't actually delete any data on the PVs, so there were still cryostat conf.d directories in each of them.
> When these got reused by another Cryostat process, the UID is different than the owner of the credentials directory, which caused the permission error
> Users should clean up the PVs before making them available for use again
In this case, a helpful message would describe: 1) that Cryostat needs rwx permissions on the specified directory so that it can list, read, and write files within it to persist JMX credentials; 2) the actual permissions and UID/GID of the directory; 3) Cryostat's UID/GID; and 4) possible reasons for the failure, such as the permissions having been changed by another process or by the cluster, or the Cryostat instance having been redeployed with a UID/GID that differs from a previous instance that used the same mounted storage volume.
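As a rough illustration of what such a check might look like, here is a minimal sketch in plain Java. The class `StartupChecks` and its methods are hypothetical names, not existing Cryostat code. Reading the `unix:uid`/`unix:gid` attributes of `/proc/self` to approximate the process's UID/GID is an assumption that only holds on Linux; anything that cannot be read is reported as "unavailable" rather than failing the check itself.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermissions;

// Hypothetical helper, not part of Cryostat: verifies directory access at
// startup and fails with a message describing the check and what was observed.
final class StartupChecks {

    private StartupChecks() {}

    /** Throws IllegalStateException with a descriptive message unless dir is an rwx-accessible directory. */
    static void requireRwx(Path dir, String purpose) {
        boolean ok = Files.isDirectory(dir)
                && Files.isReadable(dir)
                && Files.isWritable(dir)
                && Files.isExecutable(dir);
        if (!ok) {
            throw new IllegalStateException(describeFailure(dir, purpose));
        }
    }

    /** Builds the message: what was checked, why, expected vs. observed state, and likely causes. */
    static String describeFailure(Path dir, String purpose) {
        // Assumption: on Linux, /proc/self is owned by the current process's UID/GID.
        Path self = Path.of("/proc/self");
        return String.join(System.lineSeparator(),
                "Startup check failed for directory " + dir,
                "Check: the process needs rwx permissions on this directory to " + purpose + ".",
                "Expected: an existing directory that is readable, writable, and executable.",
                "Observed: exists=" + Files.exists(dir)
                        + ", permissions=" + permissions(dir)
                        + ", owner UID=" + attribute(dir, "unix:uid")
                        + ", owner GID=" + attribute(dir, "unix:gid"),
                "Process: UID=" + attribute(self, "unix:uid")
                        + ", GID=" + attribute(self, "unix:gid"),
                "Possible reasons: the permissions were changed by another process or by the"
                        + " cluster, or this instance was redeployed with a different UID/GID"
                        + " than a previous instance that used the same storage volume.");
    }

    // Reads a single file attribute, falling back to a placeholder when it
    // is missing, unreadable, or unsupported on this platform.
    private static String attribute(Path path, String name) {
        try {
            return String.valueOf(Files.getAttribute(path, name));
        } catch (IOException | RuntimeException e) {
            return "unavailable (" + e.getClass().getSimpleName() + ")";
        }
    }

    // Renders POSIX permissions such as "rwxr-x---", with the same fallback.
    private static String permissions(Path path) {
        try {
            return PosixFilePermissions.toString(Files.getPosixFilePermissions(path));
        } catch (IOException | RuntimeException e) {
            return "unavailable (" + e.getClass().getSimpleName() + ")";
        }
    }
}
```

A call such as `StartupChecks.requireRwx(credentialsDir, "list, read, and write persisted JMX credentials")` could run before the credentials are loaded, so that the log carries this message instead of only the AccessDeniedException stack trace. Where exactly such a call belongs (for example, inside a @Provides method) is left open here.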