containers
Support for Ganglia UI
Hi, just opening an issue about the lack of Ganglia support, despite it being mentioned in the docs:
Of course, the minimal requirements listed above do not include Python, R, Ganglia, and many other features that you typically expect in Databricks clusters. To get these features, build off the appropriate base image (that is, databricksruntime/rbase for R), or reference the Dockerfiles in GitHub to determine how to build in support for the specific features you want.
On the other hand, the docs for the standard container are at least explicit about the lack of Ganglia support.
When can we expect Ganglia to be available in Container Services?
We do not plan on supporting Ganglia, since a ~~new cloud-native monitoring solution (CloudWatch in AWS, Azure log monitor in Azure) is being developed~~. I have seen a CloudWatch DCS image that also worked in the short term.
Is there an update on this (particularly for Azure Databricks)? We're using Databricks with Docker containers in production, and it's really difficult to monitor the cluster, since neither Ganglia nor the web terminal works when you use a Docker container.
What we're trying now is psutil in a different notebook, but that's not really a good option.
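For anyone trying the same thing, the psutil approach can be sketched roughly as below. This is only a minimal sketch: `sample_metrics` and `poll` are hypothetical helper names, and it assumes `psutil` is installed in the container image. Note that code run from a notebook executes on the driver, so this reports driver-node usage only and says nothing about the workers, which is part of why it is not a great option.

```python
import time


def sample_metrics():
    """Take one snapshot of CPU, memory, and disk usage on the local node."""
    # psutil is a third-party package; it is imported here so the rest of
    # the module can be loaded (and poll() reused) even if it is missing.
    import psutil

    mem = psutil.virtual_memory()
    disk = psutil.disk_usage("/")
    return {
        "timestamp": time.time(),
        "cpu_percent": psutil.cpu_percent(interval=1),  # blocks for about 1 s
        "mem_percent": mem.percent,
        "disk_percent": disk.percent,
    }


def poll(samples, pause_seconds, sampler=sample_metrics):
    """Collect `samples` snapshots, sleeping `pause_seconds` between them."""
    history = []
    for i in range(samples):
        history.append(sampler())
        if i < samples - 1:
            time.sleep(pause_seconds)
    return history
```

Calling `poll(10, 30)` in a separate notebook cell would give ten snapshots about 30 seconds apart, which can then be printed or written to storage.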
@ViaFerrata I worked alongside the Azure Databricks team to find a workaround for this issue. One idea was to use Log Analytics, but that didn't work with custom Docker images in a Databricks cluster. After several weeks of trying to work through this, we had to drop the use of Docker inside Databricks for our project.
@syonekura Thank you for your quick comment :) I'd have also created an Azure ticket for a workaround, so good to know that there seems to be no better one yet.
Dropping Docker is not really an option for us, because the code is supplied as a Python package for production-readiness (GitLab CI/CD) and also for local development (IDE with debugger etc.; the notebook concept is just unusable for larger projects). Hope that there will be a fix some day by Databricks.
Hi guys, I see this Dockerfile setting up Ganglia. Could you recommend a way to make Ganglia UI available when the cluster is terminated? Is Ganglia the recommended monitoring tool? Thanks
@nsi88, did you try that Dockerfile on a Databricks job? Did you successfully collect metrics?
Hi @jonsnowseven Yes, the Dockerfile works, metrics collected. The problem is with accessing the metrics when the cluster finished its work. We're probably gonna use something like Cloudwatch for that instead of running a Ganglia server separately from the cluster.
Cheers,
Hey, thanks for linking the Dockerfile setting up Ganglia.
I am not sure why you can't access the metrics when the cluster is finished. For me screenshots of the Ganglia UI are saved in the Metrics tab. Maybe you can try setting the environment variable DATABRICKS_GANGLIA_SNAPSHOT_PERIOD_MINUTES to a smaller interval. (Documentation)
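In case it helps, here is a rough sketch of how that variable could be passed through the cluster definition. It assumes the cluster is created via the Clusters API, using its `spark_env_vars` and `docker_image` fields; the helper name, the image URL, and the 5-minute value are made up for illustration, and the other fields a real cluster spec needs (Spark version, node type, worker count) are left out.

```python
import json


def cluster_spec_with_snapshot_period(image_url, period_minutes):
    """Build a partial cluster spec that sets the Ganglia snapshot period."""
    if period_minutes < 1:
        raise ValueError("period_minutes must be at least 1")
    return {
        # Custom container image for the cluster (hypothetical URL below).
        "docker_image": {"url": image_url},
        # Environment variables are passed as strings.
        "spark_env_vars": {
            "DATABRICKS_GANGLIA_SNAPSHOT_PERIOD_MINUTES": str(period_minutes),
        },
    }


# Hypothetical image name and interval, for illustration only.
spec = cluster_spec_with_snapshot_period("myregistry/ganglia-image:latest", 5)
print(json.dumps(spec, indent=2))
```

The resulting dictionary would be merged into the rest of the cluster or job definition before submitting it.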