Mark Grondona

740 comments by Mark Grondona

Nice debugging, thanks! Anything we can do in the testsuite to make this easier to diagnose?

The second two comments were great advice (and I should have seen this earlier, sorry!). Those fixes should be simple, thanks! For the issue with Lua, are you saying that...

Ok, we may just have to add _a lot_ more debug to that particular test. This set of tests is indeed an unusual case, in that the tests themselves are...

> I don't know if it is relevant, but I did find a single remaining case of using a hardcoded path to lua:

Good catch, but I think that file...

Also, for that case, you may want the system instance broker pinned to the OS cores, but it should use the topology of the configured cores. We may need to...

Until a fix for this issue is merged and propagated to affected systems, it may be useful to install an rc1 task to do the equivalent, i.e. check that the count...

While working on this I discovered that the resource module only drains nodes with missing resources, not extra resources: https://github.com/flux-framework/flux-core/blob/c31fb47f817e9892c5be0b36100f1e7fb46cdbd4/src/modules/resource/topo.c#L112-L120 So, removing the exclusion of GPUs would not fix the...
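The asymmetry described above (drain when resources are missing, ignore resources that are extra) can be illustrated with a simple set-difference check. This is a hypothetical Python sketch, assuming resources are reduced to sets of core IDs; the function `verify` and the `drain_on_extra` flag are illustrative names only and are not part of flux-core or the linked `topo.c`.

```python
# Hypothetical sketch, NOT flux-core's implementation.
# Resources are modeled as plain sets of core IDs for illustration.

def verify(expected, discovered, drain_on_extra=False):
    """Return a drain reason string, or None if the node passes."""
    missing = expected - discovered   # configured but not found
    extra = discovered - expected     # found but not configured
    if missing:
        return f"missing cores: {sorted(missing)}"
    # With drain_on_extra=False, extra resources are silently accepted,
    # mirroring the behavior described in the comment above.
    if drain_on_extra and extra:
        return f"extra cores: {sorted(extra)}"
    return None


if __name__ == "__main__":
    print(verify({0, 1, 2, 3}, {0, 1, 2}))                   # missing cores: [3]
    print(verify({0, 1}, {0, 1, 2, 3}))                      # None
    print(verify({0, 1}, {0, 1, 2, 3}, drain_on_extra=True)) # extra cores: [2, 3]
```

Only with an explicit opt-in (here the hypothetical `drain_on_extra`) would a node reporting more cores than configured fail verification.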

After implementing the `noverify = ["core"]` support, I realized this approach does not cleanly handle the case of failing verification when more resources than configured are discovered. Instead, maybe something...

Yeah, I was thinking along the same lines. However, we'd still need to cache the previous hwloc XML somewhere so that the cached version doesn't have the extra/incorrect resources.

It turns out this is a potential issue even now: recently, a node ran out of memory and the resulting slowness caused a broker heartbeat timeout, so the broker...