Data Node: Improve error visibility
The data node setup should help users identify errors and give them a chance to fix them.
What?
While setting up a data node, I ran into a common problem: another process was already running on port 9200.
The data node logs show: BindHttpException[Failed to bind to 0.0.0.0:9200]; nested: BindException[Address already in use]
However, the UI gives no hint of what is actually happening or how to recover from the error.
It would be great if we could show the error logs in the UI, or at least tell users how they can retry the deployment.
Your Environment
- Graylog Version: 5.2.0-beta.3
Currently, the only errors shown are those from the provisioning process. OpenSearch starts as a completely independent process, and OpenSearch startup errors can also occur during regular (later) starts of the DataNode, in which case they should not be written into the preflight data structures. Moreover, not every exception in the OpenSearch logs is an error that actually prevents OpenSearch from starting. So finding a solution that goes beyond showing "In case of errors, check the DataNode logs." looks to be somewhat more complicated.
Error visibility has been improved in other PRs, and we now show the logs as well.