Support for Ganglia UI

Open syonekura opened this issue 4 years ago • 8 comments

Hi, just opening an issue about the lack of Ganglia support, despite it being mentioned in the docs:

Of course, the minimal requirements listed above do not include Python, R, Ganglia, and many other features that you typically expect in Databricks clusters. To get these features, build off the appropriate base image (that is, databricksruntime/rbase for R), or reference the Dockerfiles in GitHub to determine how to build in support for the specific features you want.

On the other hand, the docs for the standard container are at least explicit about the lack of Ganglia support.

When can we expect Ganglia to be available in Container Services?

syonekura avatar Mar 19 '20 14:03 syonekura

We do not plan on supporting Ganglia, since a ~~new cloud-native monitoring solution (CloudWatch in AWS, Azure log monitor in Azure) is being developed~~. I have seen a CloudWatch DCS image that also worked in the short term.

evanye avatar May 12 '20 19:05 evanye

Is there an update on this (particularly for Azure Databricks)? We're using databricks with docker containers for production, and it's really difficult to monitor the cluster, since neither Ganglia nor the web terminal are working if you use a docker container.

What we're trying now is psutil in a different notebook, but that's not really a good option.
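For reference, a minimal sketch of that kind of psutil polling might look like the following. It assumes psutil is installed in the container image; the function name `sample_metrics` and the chosen fields are illustrative, not something Databricks provides. A notebook cell runs on the driver, so this presumably only covers the driver node, which is part of why it is not a good substitute for Ganglia.

```python
import time

try:
    import psutil  # third-party; assumed to be installed in the custom image
except ImportError:
    psutil = None  # keeps the sketch importable where psutil is missing


def sample_metrics():
    """Return one snapshot of CPU, memory, and disk usage for the local node."""
    if psutil is None:
        raise RuntimeError("psutil is not installed in this environment")
    mem = psutil.virtual_memory()
    disk = psutil.disk_usage("/")
    return {
        "timestamp": time.time(),
        # interval=0.1 blocks briefly so the CPU figure is a real measurement
        "cpu_percent": psutil.cpu_percent(interval=0.1),
        "mem_percent": mem.percent,
        "disk_percent": disk.percent,
    }


if __name__ == "__main__" and psutil is not None:
    # Poll a few times; in a notebook this would run in a separate cell or loop.
    for _ in range(3):
        print(sample_metrics())
        time.sleep(1)
```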

ViaFerrata avatar Dec 23 '20 12:12 ViaFerrata

@ViaFerrata I worked alongside the Azure Databricks team to find a workaround to this issue. One idea was to use Log Analytics, but that didn't work with custom docker images in a databricks cluster. After several weeks of trying to work through this, we had to drop the use of docker inside databricks for our project.

syonekura avatar Dec 23 '20 13:12 syonekura

@syonekura Thank you for your quick comment :) I'd have also created an Azure ticket for a workaround, so good to know that there seems to be no better one yet.

Dropping docker is not really an option for us, because the code is supplied as a python package for production-readiness (gitlab CI/CD) and also for local development (IDE with debugger etc.; the notebook concept is just unusable for larger projects). Hope that there will be a fix some day by Databricks.

ViaFerrata avatar Dec 23 '20 14:12 ViaFerrata

Hi guys, I see this Dockerfile setting up Ganglia. Could you recommend a way to make Ganglia UI available when the cluster is terminated? Is Ganglia the recommended monitoring tool? Thanks

nsi88 avatar Jul 25 '22 11:07 nsi88

@nsi88, did you try that Dockerfile on a Databricks job? Did you successfully collect metrics?

jonsnowseven avatar Aug 17 '22 12:08 jonsnowseven

Hi @jonsnowseven. Yes, the Dockerfile works and metrics are collected. The problem is accessing the metrics after the cluster has finished its work. We're probably gonna use something like Cloudwatch for that instead of running a Ganglia server separately from the cluster.

Cheers,

nsi88 avatar Aug 18 '22 06:08 nsi88

Hey, thanks for linking the Dockerfile setting up Ganglia. I am not sure why you can't access the metrics when the cluster is finished. For me, screenshots of the Ganglia UI are saved in the Metrics tab. Maybe you can try setting the environment variable DATABRICKS_GANGLIA_SNAPSHOT_PERIOD_MINUTES to a smaller interval. (Documentation)
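As a rough sketch of where that variable could go: the fragment below builds part of a cluster spec with the variable set under `spark_env_vars`, next to a `docker_image` entry. The image URL, cluster name, and the value of 5 minutes are placeholders I made up for illustration, and a real spec needs more fields (Spark version, node type, worker count) than shown here.

```python
import json


def build_cluster_spec(image_url, snapshot_minutes):
    """Build a partial cluster spec that sets the Ganglia snapshot period.

    Only the fields relevant to this thread are included; this is not a
    complete, submittable cluster definition.
    """
    return {
        "cluster_name": "ganglia-dcs-example",  # hypothetical name
        "docker_image": {"url": image_url},
        "spark_env_vars": {
            # Environment variable values are passed as strings.
            "DATABRICKS_GANGLIA_SNAPSHOT_PERIOD_MINUTES": str(snapshot_minutes),
        },
    }


# Hypothetical image URL and an arbitrary example interval.
spec = build_cluster_spec("my-registry/my-ganglia-image:latest", 5)
print(json.dumps(spec, indent=2))
```

The same variable can also be entered by hand in the cluster's environment variables settings; the dict form is just convenient if clusters are created from CI/CD.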

cellularegg avatar Oct 19 '22 06:10 cellularegg