
munin-graph scalability

Open bra-fsn opened this issue 7 years ago • 13 comments

Hi,

What can be done if munin-graph and munin-html take too long to run? According to the docs, there are max_html_jobs and max_graph_jobs settings, but they seem to be disabled in 2.0.30. Looking at the code, the fork logic is useless: it forks the graph-generating code and then immediately waits for the child to complete. At most one process does the work at any given time, so the whole run is actually slower than without forking (because of the fork overhead). How can I use multiple CPUs to generate these?
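The fork-then-wait pattern described above can be illustrated with a minimal sketch. It is written in Python rather than Munin's Perl, and it is not Munin's code: `render` is a hypothetical stand-in for drawing one graph, and the two function names are made up for the comparison. The first function waits on each child right after forking it, so children never overlap; the second forks everything first and waits afterwards.

```python
import os
import time


def render(graph_id):
    """Hypothetical stand-in for rendering one graph."""
    time.sleep(0.2)


def fork_then_wait(jobs):
    """Fork one child per job and wait for it before forking the next."""
    start = time.monotonic()
    for graph_id in range(jobs):
        pid = os.fork()
        if pid == 0:
            # Child: do the work, then exit without returning to the loop.
            render(graph_id)
            os._exit(0)
        # Parent blocks here, so only one child is ever running.
        os.waitpid(pid, 0)
    return time.monotonic() - start


def fork_all_then_wait(jobs):
    """Fork every child first, then wait for all of them."""
    start = time.monotonic()
    pids = []
    for graph_id in range(jobs):
        pid = os.fork()
        if pid == 0:
            render(graph_id)
            os._exit(0)
        pids.append(pid)
    for pid in pids:
        os.waitpid(pid, 0)
    return time.monotonic() - start
```

With four jobs, `fork_then_wait(4)` takes at least four times the per-job delay, while `fork_all_then_wait(4)` takes roughly one delay plus fork overhead. The second variant has no upper bound on the number of children, which matters with thousands of graphs.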

bra-fsn avatar Feb 21 '17 11:02 bra-fsn

Maybe you could consider switching to CGI instead so you wouldn't have to update the graphics and HTML files on every run.

bauerj avatar Feb 22 '17 18:02 bauerj

That's the whole point of the 3.0 rewriting :-)

quentin-st avatar Feb 22 '17 19:02 quentin-st

I have hosts with a lot of graphs. Some years ago I tried CGI, but I couldn't wait as long as it took to generate the graphs in real time (even downloading the pre-generated files takes a lot of seconds).

bra-fsn avatar Feb 22 '17 19:02 bra-fsn

> I have hosts with a lot of graphs.

How many?

> even downloading the pre-generated files take a lot of seconds

You may have network speed issues then :confused: For now, the new UI won't improve things for you since it will still download the same number of graphs, but we may look into it in the future. It will avoid generating all the graphs / HTML pages every 5 minutes though.

quentin-st avatar Feb 22 '17 20:02 quentin-st

Around 1500 ATM.

bra-fsn avatar Feb 22 '17 20:02 bra-fsn

That's a lot indeed. We may add proper lazy loading in the future. What's your usual use case? (browsing the whole page / searching for specific graphs or categories)

quentin-st avatar Feb 22 '17 20:02 quentin-st

For me it would be perfectly fine if I could create a lot of categories (currently this also fails, the page becomes unreadable) and just load and view a given category. BTW, what I do now is remove the waitpid from the script, so graph creation is somewhat parallel (though of course not optimal, given the thousands of forks).
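Between waiting on every child immediately and dropping the waitpid altogether, there is a middle ground: cap the number of live children. The following is a minimal Python sketch of that idea, not Munin's implementation; `run_bounded` and its `max_jobs` argument are hypothetical names chosen to echo the max_graph_jobs setting.

```python
import os


def run_bounded(tasks, max_jobs):
    """Fork one child per task, keeping at most max_jobs children alive.

    Returns the highest number of children that were running at once.
    """
    running = set()
    peak = 0
    for task in tasks:
        # Pool is full: block until a child exits, then reuse its slot.
        while len(running) >= max_jobs:
            pid, _status = os.wait()
            running.discard(pid)
        pid = os.fork()
        if pid == 0:
            # Child: run the task and exit, reporting failure via exit code.
            code = 1
            try:
                task()
                code = 0
            finally:
                os._exit(code)
        running.add(pid)
        peak = max(peak, len(running))
    # Reap whatever is still running so no zombies are left behind.
    while running:
        pid, _status = os.wait()
        running.discard(pid)
    return peak
```

For example, `run_bounded([lambda: time.sleep(0.05)] * 10, max_jobs=3)` (with `import time`) runs ten tasks with never more than three children at a time.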

bra-fsn avatar Feb 22 '17 20:02 bra-fsn

> currently this also fails, the page becomes unreadable

Does it generate an ugly interface, or does it just plainly fail? As you can see, munin 3.* will display categories as tabs, which wrap onto several lines: http://demo.munin-monitoring.org/munin-monitoring.org/demo.munin-monitoring.org/

A lot of performance improvements have been made in the 2.9.* versions. They are not production-ready yet, but could you try installing one on a sandbox server to check whether it helps in your case? We're quite interested in any feedback on this.

quentin-st avatar Feb 22 '17 20:02 quentin-st

Hi, it seems that the max_graph_jobs parameter is ignored. Could you explain why, please? Is there a risk if I reactivate it?

Metabaron1 avatar Jun 12 '22 20:06 Metabaron1

> Seems that max_graph_jobs parameter is ignored.

It is used only for graph_strategy cron (instead of cgi). Feel free to experiment with this value.

sumpfralle avatar Jun 12 '22 23:06 sumpfralle

Yes, I'm using cron:

    graph_strategy cron
    max_graph_jobs 6
    max_processes 12

I have 4 vCPUs for 100 munin nodes, but whatever value I configure for max_graph_jobs (2, 6, 8, ...), I only see a single munin-graph process running:

    /usr/bin/perl /usr/share/munin/munin-graph --cron

whereas I see multiple forked processes for munin-update and munin-html. That's why I was wondering whether forking in munin-graph was disabled in the code. As a result it's very slow:

    2022/06/13 02:11:36 Starting munin-graph
    2022/06/13 02:11:36 [FATAL ERROR] Lock already exists: /var/run/munin/munin-graph.lock. Dying.
    2022/06/13 02:11:36 at /usr/share/perl5/Munin/Master/GraphOld.pm line 412.
    2022/06/13 02:14:20 Munin-graph finished (776.00 sec)
    2022/06/13 02:16:31 Starting munin-graph
    2022/06/13 02:19:08 Munin-graph finished (156.97 sec)
    2022/06/13 02:21:24 Starting munin-graph
    2022/06/13 02:24:02 Munin-graph finished (157.45 sec)
    2022/06/13 02:26:24 Starting munin-graph
    2022/06/13 02:29:00 Munin-graph finished (156.33 sec)
    2022/06/13 02:31:23 Starting munin-graph
    2022/06/13 02:36:35 Starting munin-graph
    2022/06/13 02:36:35 [FATAL ERROR] Lock already exists: /var/run/munin/munin-graph.lock. Dying.
    2022/06/13 02:36:35 at /usr/share/perl5/Munin/Master/GraphOld.pm line 412.
    2022/06/13 02:37:08 Munin-graph finished (344.71 sec)
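The log above can be summarized with a short script. This is a sketch in Python: the log text is copied from the post, while the 300-second `CRON_INTERVAL` is an assumption based on the 5-minute schedule mentioned earlier in the thread, and `summarize` is a hypothetical helper name. It pulls out the reported run times, flags the runs that exceeded the interval, and counts the lock collisions.

```python
import re

LOG = """\
2022/06/13 02:11:36 Starting munin-graph
2022/06/13 02:11:36 [FATAL ERROR] Lock already exists: /var/run/munin/munin-graph.lock. Dying.
2022/06/13 02:11:36 at /usr/share/perl5/Munin/Master/GraphOld.pm line 412.
2022/06/13 02:14:20 Munin-graph finished (776.00 sec)
2022/06/13 02:16:31 Starting munin-graph
2022/06/13 02:19:08 Munin-graph finished (156.97 sec)
2022/06/13 02:21:24 Starting munin-graph
2022/06/13 02:24:02 Munin-graph finished (157.45 sec)
2022/06/13 02:26:24 Starting munin-graph
2022/06/13 02:29:00 Munin-graph finished (156.33 sec)
2022/06/13 02:31:23 Starting munin-graph
2022/06/13 02:36:35 Starting munin-graph
2022/06/13 02:36:35 [FATAL ERROR] Lock already exists: /var/run/munin/munin-graph.lock. Dying.
2022/06/13 02:36:35 at /usr/share/perl5/Munin/Master/GraphOld.pm line 412.
2022/06/13 02:37:08 Munin-graph finished (344.71 sec)
"""

# Assumption: the cron job fires every 5 minutes.
CRON_INTERVAL = 300.0

FINISHED = re.compile(r"Munin-graph finished \((\d+(?:\.\d+)?) sec\)")


def summarize(log):
    """Return (all run times, run times over the interval, lock error count)."""
    durations = [float(m.group(1)) for m in FINISHED.finditer(log)]
    overruns = [d for d in durations if d > CRON_INTERVAL]
    lock_errors = log.count("Lock already exists")
    return durations, overruns, lock_errors
```

Calling `summarize(LOG)` shows two runs longer than the interval (776.00 and 344.71 seconds) and two lock errors, which fits a new run starting while the previous one still holds the lock.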

[Image: munin_stats-day]

Metabaron1 avatar Jun 13 '22 00:06 Metabaron1

Here is what I found in munin-graph code (Debian 11):

/usr/share/munin/munin-graph (line 71):

    push @params, "--no-fork"; # We do not want to fork. Perf -> FastCGI

I'm not a Perl developer, but I suppose this sets the --no-fork param up front and completely overrides any max_graph_jobs setting in munin.conf... Why the hell is it disabled???

Should I expect some side effect if I comment this line?

Metabaron1 avatar Jun 16 '22 22:06 Metabaron1

I understand your frustration, but please watch your language.

It looks like you're running Munin 2.0.67. I've been able to have a look at Munin 2.0.24 from git.

From the comments it looks like someone in 2012 or earlier assumed CGI generation of graphs instead of cron. You can certainly try removing the --no-fork line and see what happens. Since running without that option may have gone untested for 15 years, you might find that something breaks.

I would comment out that line and see what happens.
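For experimenting with this, here is a minimal Python sketch that comments out the line in a copy of the script text. `comment_out_no_fork` is a hypothetical helper, and the pattern assumes the line looks like the one quoted from /usr/share/munin/munin-graph above. It only transforms a string, so reading the file, keeping a backup and writing the result are left to the caller.

```python
import re

# Matches an uncommented line that pushes "--no-fork" onto @params.
# Assumption: the line is formatted like the one quoted in this thread.
NO_FORK = re.compile(
    r'^([ \t]*)(push\s+@params,\s*"--no-fork";.*)$',
    re.MULTILINE,
)


def comment_out_no_fork(source):
    """Return (patched script text, number of lines commented out)."""
    # Keep the indentation and prefix the statement with a Perl comment marker.
    return NO_FORK.subn(r"\1# \2", source)
```

Running it twice on the same text changes nothing the second time, because an already commented line no longer matches.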

niclan avatar Jun 17 '22 06:06 niclan