Mark Grondona comments

Results: 535 comments by Mark Grondona

test impact of using rv1 vs rv1_nosched on instance performance

The default state of nodes in the scheduler is supposed to be down until the `resource.acquire` protocol says they are up. A bug in Fluxion (discussed above) set only the...

test impact of using rv1 vs rv1_nosched on instance performance

> To that point, I'm trying to repro some of this, just to be sure, you got a lot of job-manager/jobtap errors right @grondo?

No, I don't see any of...

test impact of using rv1 vs rv1_nosched on instance performance

@trws - I can't reproduce the errors above in the latest flux-sched docker container. I performed the following steps:

1. `docker pull fluxrm/flux-sched:latest`
2. `docker run -ti fluxrm/flux-sched`
3. paste...

test impact of using rv1 vs rv1_nosched on instance performance

> Hopefully that will just take care of it, will see.

Ok, let me know if you still see any issues.

build: `make install` doesn't honor the install prefix in all cases

This does seem like a bug in the cmake build. However, a workaround might be to run `flux-sched` `make` under `$prefix/bin/flux start`. The Fluxion build should pick up the same...

build: `make install` doesn't honor the install prefix in all cases

`FLUX_CORE_PREFIX` _should_ be automatically set to the prefix of the first `flux` found in `PATH` (which is why I suggested running under `$prefix/flux start` or `$prefix/flux make`). This is how...
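
As a rough illustration of "the prefix of the first `flux` found in `PATH`", here is a minimal Python sketch. It is not the Fluxion cmake logic: the helper name `guess_prefix` is hypothetical, and it assumes the executable is installed as `<prefix>/bin/flux`.

```python
import shutil
from pathlib import Path
from typing import Optional


def guess_prefix(executable: str = "flux", path: Optional[str] = None) -> Optional[Path]:
    """Return the install prefix implied by the first `executable` on the search path.

    Hypothetical helper: assumes the layout <prefix>/bin/<executable>.
    `path` defaults to the PATH environment variable when None.
    """
    found = shutil.which(executable, path=path)
    if found is None:
        return None
    # <prefix>/bin/flux -> <prefix>; symlinks are deliberately not resolved
    return Path(found).parent.parent


if __name__ == "__main__":
    print(guess_prefix())
```

If the printed prefix is not the one you expect, a different `flux` is earlier in `PATH` than the one you intended to build against.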

build: `make install` doesn't honor the install prefix in all cases

Thanks for that clarification! So I think the only thing not working here is that use of `--prefix` should behave the same as if you set `FLUX_CORE_PREFIX`. (Seems like the...

Preemptible jobs

Quick discussion with @garlick led to the following possible implementation:

- add a new preemptible job flag (i.e. similar to `waitable` and `debug`)
- a scheduler which implements job preemption...

`exit-timeout` behaviour is counter-intuitive

Agreed, the terminology is a bit confusing. The term "first" here indicates the order in which the tasks exit. Since tasks start in parallel there is no "first" task in...

`exit-timeout` behaviour is counter-intuitive

Oh, I should mention that `exit-on-error` will terminate a job immediately if a task exits with a nonzero status, which isn't exactly what you were requesting.
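
To make the difference between the two options concrete, here is a small Python model. It is only a sketch and not flux-core code: the function `exception_time` is hypothetical, and it assumes that `exit-timeout` starts a timer when the first task (in exit order) exits and raises an exception if other tasks are still running when it expires, while `exit-on-error` raises one as soon as any task exits with a nonzero status.

```python
from typing import List, Optional, Tuple


def exception_time(
    exits: List[Tuple[float, int]],
    exit_timeout: Optional[float],
    exit_on_error: bool,
) -> Optional[float]:
    """Return when this toy model would raise a job exception, or None.

    exits: (exit_time_seconds, exit_status) per task, as they would occur
           if the job were left alone.
    exit_timeout: seconds after the first task exit, or None to disable.
    """
    if not exits:
        return None
    ordered = sorted(exits)  # "first" means first to exit, not lowest rank
    candidates = []
    if exit_on_error:
        failed = [t for t, status in ordered if status != 0]
        if failed:
            candidates.append(failed[0])  # immediate, no grace period
    if exit_timeout is not None:
        deadline = ordered[0][0] + exit_timeout
        if ordered[-1][0] > deadline:  # some task outlives the timer
            candidates.append(deadline)
    return min(candidates) if candidates else None


# One task exits cleanly at 5s, another would run until 50s.
print(exception_time([(5.0, 0), (50.0, 0)], exit_timeout=30.0, exit_on_error=False))
```

In this model a clean early exit still triggers the timeout, which is the part that tends to surprise people; `exit-on-error` only shortens the wait when a task actually fails.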