Mark Grondona

Results: 515 comments by Mark Grondona

> Rabbit systems in general will, but at the moment Hetchy doesn't. Fluxion doesn't know anything about the rabbits. So that isn't the culprit.

Ah, thanks for that information. If...

A good test may be to try reloading `sched-fluxion-qmanager` and `sched-fluxion-resource` to see if the problem goes away. However, we may want to collect as much information from the affected...

I started a test instance with the same R as configured on Hetchy and could not reproduce the issue, so the specific resource configuration isn't the cause here. Not...

This problem is reproducible by collecting some of the config from Hetchy:

Note: I've added `resource.noverify = true` here.

```toml
[job-manager]
plugins = [
  { load = "perilog.so" },
  {...
```

Getting a similar result from `perf`:

```
-   99.93%     0.00%  flux-broker-0  [.] ev_run
     ev_run
   - ev_run
      - 99.93% ev_invoke_pending
         - 99.68% handle_cb
            - 99.60% dispatch_message...
```

FYI - I got the same results when running perf with `perf record -g --call-graph=dwarf`, as noted [here](https://rwmj.wordpress.com/2023/02/06/frame-pointers-an-important-update/), to make sure we're getting valid backtraces. (I'm pretty sure I did that...

Here's a reproducer script, meant to be run from a top-level flux-sched builddir:

```sh
#!/bin/sh
flux module remove sched-fluxion-qmanager
flux module remove sched-fluxion-resource
flux module remove resource
flux...
```

I think this issue still applies, since Hetchy (and all our systems) are using the node-exclusive policy? We're still at a 10-20x slowdown. It might be nice to keep all the...

Note that, as shown by the results in #1009, this performance issue occurs both with and without node-exclusive scheduling when moderate amounts of resources are involved in scheduling (in the examples,...

Thank you @trws! That advice is very helpful!