Implement "cylc set" command
Implement the accepted proposal "A New Command for Manual Task Forcing".
- Close #5642
- Close #5638
- Close #5572
- Close #5882
- Supersede #5412 (fix expire triggers)
- Close #5364
- Close #5354
- Close #5447
- Close #4527
TODO
- [x] Rename `set-outputs` to `set`
- [x] Update the CLI and GraphQL schema
- [x] Custom outputs: prohibit "all", "required", "optional" keywords, "," character, and "_cylc" prefix.
- [x] Reproduce old set-outputs behaviour with new command options and query
- [x] Implement the new logic for setting prerequisites [FIRST CUT done]
- [x] Implement the new logic for setting outputs
- [x] Test new schema works as expected in the UI
- [x] Tests for all new functionality
- [x] Update documentation -> Punted to https://github.com/cylc/cylc-doc/issues/688
- [x] This is a breaking change, add an entry into the cylc-doc changes page -> Punted to https://github.com/cylc/cylc-doc/issues/688
Side effects:
- rename `cylc set-verbosity` to `cylc verbosity` to avoid command ambiguity apocalypse
- command UUIDs
- cleaner log message prefix for tasks
Follow-up:
- `cylc set --pre=cycle`
PR Check List
- [x] I have read `CONTRIBUTING.md` and added my name as a Code Contributor.
- [x] Contains logically grouped changes (else tidy your branch by rebase).
- [x] Does not contain off-topic changes (use other PRs for other changes).
- [x] Applied any dependency changes to both `setup.cfg` (and `conda-environment.yml` if present).
- [x] Tests are included (or explain why tests are not needed).
- [x] `CHANGES.md` entry included if this is a change that can affect users.
Rebased. Tests fixed. Will finish this off ASAP.
Unrelated (?) mypy error from the Py 3.10 and 3.11 fast tests:
```
/opt/hostedtoolcache/Python/3.11.5/x64/lib/python3.11/site-packages/matplotlib/pyplot.py:2700: error: Positional-only parameters are only supported in Python 3.8 and greater [syntax]
Found 1 error in 1 file (errors prevented further checking)
```
[update: fixed by #5735]
@oliver-sanders - I just took a quick look at your "Extension" part of the proposal:
> When setting prereqs/outputs it would be useful to provide visual confirmation of the result. We could display the `cylc show` output (which can be generated scheduler side without the requirement for a second API call).
Correct me if I'm wrong, but to do this we need to take the `set` call out of the asynchronous command queue in the scheduler, and instead call it directly via a wrapper method in the resolver. The trouble is, the `set` call updates task DB tables, and I get this:

```
self.conn.executemany(stmt, stmt_args_list)
sqlite3.ProgrammingError: SQLite objects created in a thread can only be used in that same thread.
```
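For context, this is the standard `sqlite3` thread-affinity constraint rather than anything Cylc-specific. A minimal sketch (plain `sqlite3`, nothing from the Cylc DAO layer) that reproduces the error:

```python
import sqlite3
import threading

# a connection created in the main thread...
conn = sqlite3.connect(":memory:")

def use_connection():
    # ...raises ProgrammingError when used from another thread,
    # because check_same_thread defaults to True
    try:
        conn.execute("CREATE TABLE t (x INTEGER)")
    except sqlite3.ProgrammingError as exc:
        print(exc)

thread = threading.Thread(target=use_connection)
thread.start()
thread.join()
```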
@oliver-sanders - this now incorporates (and supersedes) #5412 because I needed to handle (and test the consequences of, such as triggering) setting tasks to expired. So there will be a bit of overlap and conflict with #5651, but hopefully not too much.
@oliver-sanders - I think there's a small error in proposal example 5:

```
$ cylc set //foo --out=a,succeeded --wait
```

> The database will record:
> foo: state=waiting, outputs={a}

The DB should record `state=succeeded` and `outputs={a, submitted, started, succeeded}` because:
- completed outputs affect task status
- we set succeeded, and it has implied outputs

Agree?
> The DB should record `state=succeeded` and `outputs={a, submitted, started, succeeded}` because:
> - completed outputs affect task status
> - we set succeeded, and it has implied outputs

Agreed; the example said `--out=a,succeeded`, so the missing submitted, started, succeeded were definitely an omission.
> Correct me if I'm wrong, but to do this we need to take the `set` call out of the asynchronous command queue in the scheduler, and instead call it directly via a wrapper method in the resolver.
> The trouble is, the `set` call updates task DB tables, and I get this:
I said it would be nice, I didn't say it would be easy! Outside of brain cache at this point :(
The resolvers can see the store, and the store should have all the required prerequisite info in it, so we should be able to read from the store rather than the DB for the `cylc show` output. So the remaining issue is knowing when the "set" operation has completed, which is the same issue as #3329.
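To illustrate the #3329 problem, here's a hypothetical pattern (illustrative names only; `pending`, `submit_and_wait` and `command_actioned` are not cylc-flow API) in which the resolver awaits a future that the scheduler resolves once the queued command has been actioned:

```python
import asyncio
import uuid
from typing import Callable, Dict

# futures keyed by command UUID, resolved when the command is actioned
pending: Dict[str, asyncio.Future] = {}

async def submit_and_wait(queue_command: Callable[[str], None]) -> None:
    """Queue a command, then wait until the scheduler has actioned it."""
    cmd_id = str(uuid.uuid4())
    fut = asyncio.get_running_loop().create_future()
    pending[cmd_id] = fut
    queue_command(cmd_id)  # the scheduler picks this up in its main loop
    await fut              # ...and we resume once it reports completion

def command_actioned(cmd_id: str) -> None:
    """Called scheduler-side after the command queue is processed."""
    fut = pending.pop(cmd_id, None)
    if fut is not None and not fut.done():
        fut.set_result(None)
```

With something like this in place, the resolver could safely read the updated store and return the `cylc show` output in the mutation response.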
Undrafting this now as I think it implements the proposal fully.
I've added functional tests that reproduce the examples in the proposal (IMO functional tests are needed for this sort of thing, although I'll consider other tests too).
From a quick look I'd say the patch coverage seems to be wrong, maybe because the remote tests are failing (unrelated).
Still to do:
- more tidying?
- more tests?
- test via the UI
- docs
> From a quick look I'd say the patch coverage seems to be wrong, maybe because the remote tests are failing (unrelated).
Caused by an external issue, now fixed in the upstream repo.
> IMO functional tests are needed for this sort of thing, although I'll consider other tests too
IMO functional tests are absolutely necessary to ensure the end-to-end functionality and we definitely need some. However, I don't think we should attempt to test every possible option in this way, only examples of the general functionality, because functional tests are hard to maintain, slow to run and difficult to debug.
Note, the integration testing framework runs the scheduler, the whole scheduler and nothing but the scheduler; the only thing it's really cutting out is the CLI. You can do everything, up to and including reference tests, with it if so inclined:

```python
async with run(schd) as log:
    # run the workflow with a timeout of 60 seconds
    await asyncio.sleep(60)
    assert reftest(log) == '''
1/b triggered off [1/a]
1/c triggered off [1/b]
'''
```
Although that's really just a functional test implemented in Python.
For a more integration'ey approach to reftests we can do something like this, which is essentially just another way of getting the "triggered off" information without having to run the main loop and bring race conditions into play:
```python
async with start(schd):
    assert set(schd.pool.get_tasks()) == {'1/a'}
    # setting a:succeeded should spawn b
    schd.command_reset('1/a', 'succeeded')
    assert set(schd.pool.get_tasks()) == {'1/b'}
    # setting b:x should spawn c
    schd.command_reset('1/b', 'x')
    assert set(schd.pool.get_tasks()) == {'1/b', '1/c'}
```
(Almost there...)
CLI auto-completion for the `--pre` and `--out` arguments: https://github.com/hjoliver/cylc-flow/pull/34
Can add `set --out=succeeded` as an option to `cylc tui` after the current PR has been merged.
(The only small hold-up here is: I removed the functional tests, and haven't finished replacing them with integration tests ... probably should have done that as a follow-up! - but during that process made some other tidy-up changes that broke a few other tests ... almost done).
> It might be worth inferring `:succeeded` to match graph behaviour?
Yes, done.
Still not done adding tests. This afternoon I got stuck on the failing event-mail functional tests. They are failing for me locally too, but possibly for different reasons ... can someone with 5 mins spare run them locally and compare master with this branch?
(And now I've broken one integration test, but it's a trivial fix...)
Sorry, forgot to mention, the mail event tests got broken on master (mb, I merged something before it was ready), Tim's working on a fix.
> This illogical command had a curious outcome:
> `cylc set --out=succeeded,failed`
That's not illogical. A single running job can't both succeed and fail, but a task can fail and then succeed on retry, in which case the DB `task_outputs` table records both outputs. For `cylc set`, maybe we want to test both success and fail branches at once... So it seems to me we should leave that as-is.
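A toy illustration of that point (plain `sqlite3` with a made-up two-column table, not the real cylc-flow schema): a fail-then-retry-succeed history leaves both outputs recorded against the same task.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task_outputs (name TEXT, output TEXT)")

# first try fails, the retry succeeds: both outputs end up in the table
conn.executemany(
    "INSERT INTO task_outputs VALUES (?, ?)",
    [("1/foo", "failed"), ("1/foo", "succeeded")],
)

print(conn.execute(
    "SELECT output FROM task_outputs WHERE name = '1/foo'"
).fetchall())
# [('failed',), ('succeeded',)]
```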
@oliver-sanders - I think I've addressed all your comments, and all the tests (except the usual MacOS ones) are passing now. I just need to add more integration tests tomorrow (having removed my new functional tests, at least for the moment; if I run out of time tomorrow I'll put those back in and do a follow-up PR for the integration tests, so we can get moving on this ... I'm going on leave for a week, next week).
> That's not illogical. A single running job can't both succeed and fail
Fair enough, my confusion was more that running `cylc set --out=succeeded,failed` resulted in the task being left in the succeeded state irrespective of what order you specify succeeded/failed in.
I've tested it with a branched workflow and confirmed that both pathways are being run; what state the task goes into is an implementation choice. I guess the reason for this is probably the sort order of `succeeded`/`failed`?
Pretty close, works nicely for most cases. Going through the use cases in the proposal I've spotted a few issues...
1. I don't think "set" is working with globs?
Glob requirement from use case (3):

```
[scheduling]
    [[graph]]
        R1 = a
[runtime]
    [[a]]
        script = false
```
```
$ cylc dump baz -t
a, 1, failed, not-held, not-queued, not-runahead

$ cylc set 'baz//*' --out=succeeded
Command submitted; the scheduler will log any problems.

$ cylc dump baz -t
a, 1, failed, not-held, not-queued, not-runahead

$ cylc cat-log baz | tail -n 7
2023-11-16T11:22:07Z WARNING - No matching tasks found: */* (Incompatible value for <class 'cylc.flow.cycling.integer.IntegerPoint'>: *: invalid literal for int() with base 10: '*')
2023-11-16T11:22:07Z INFO - Command "set" actioned: id=82cd8248-e235-4f8d-8cd9-528e9d72d257
2023-11-16T11:22:07Z WARNING - stall timer stopped
2023-11-16T11:22:07Z ERROR - Incomplete tasks:
    * 1/a did not complete required outputs: ['succeeded']
2023-11-16T11:22:07Z CRITICAL - Workflow stalled
2023-11-16T11:22:07Z WARNING - PT1H stall timer starts NOW
```
Note the error: `Incompatible value for <class 'cylc.flow.cycling.integer.IntegerPoint'>` - the glob is falling through to cycle point parsing.
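That is, the glob string reaches cycle point parsing, where integer points ultimately require `int()` conversion. A minimal illustration (plain `int()`, not the actual cylc parsing path):

```python
# '*' is a glob pattern, but handed to integer point parsing instead of
# glob matching, int() fails with exactly the message in the warning:
try:
    int('*')
except ValueError as exc:
    print(exc)  # invalid literal for int() with base 10: '*'
```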
2. Workflow can prematurely shut down with `--pre=succeeded`.
From use case (5)
This workflow will run indefinitely:
```
[scheduler]
    allow implicit tasks = True
[scheduling]
    initial cycle point = 1
    cycling mode = integer
    runahead limit = P0
    [[graph]]
        P1 = """
            a:x? => x
            a:y? => y
            x | y => z
        """
[runtime]
    [[a]]
        script = cylc message -- x
        [[[outputs]]]
            x = x
            y = y
```
But jam this in and it will shut down after running `2/z` because the spawning of the parentless task `a` is interrupted:
```
    [[z]]
        script = """
            if [[ $CYLC_TASK_CYCLE_POINT -eq 1 ]]; then
                cylc set "${CYLC_WORKFLOW_ID}//2/a" --out=y,succeeded
            fi
        """
```
3. Incomplete tasks are not run.
Starting with the same example as (2) above but jamming in this:
```
    [[z]]
        script = """
            if [[ $CYLC_TASK_CYCLE_POINT -eq 1 ]]; then
                cylc set "${CYLC_WORKFLOW_ID}//2/a" --out=y
            fi
        """
```
We get really strange behaviour. The `cylc set` command sets the output `y`, but not `succeeded` or `failed`, so the task is both not-finished and not-complete. Because the task is incomplete, the approaching flow should run it. And according to the log, Tui, dump, etc. it does:
```
INFO - [2/a running(runahead) job:00 flows:1] => running
```
But in reality it doesn't:
```
$ ls ~/cylc-run/future/log/job/2
y z
```
And the workflow hangs with a non-existent running task?!
4. Expired task state.
A couple of minor issues with expired tasks.
```
[scheduler]
    allow implicit tasks = True
    cycle point format = CCYY
[scheduling]
    initial cycle point = 2020
    [[special tasks]]
        clock-expire = a
    [[graph]]
        P1Y = """
            z[-P1Y] => a
            a:expire? => e
            a:succeed => s
            e | s => z
        """
```
The task `a` expires correctly and the workflow runs as expected (the important part). However, the expired task retains the queued state (logs, tui, dump, etc) and is logged as incomplete:
```
2023-11-16T13:28:41Z WARNING - [2023/a expired(queued) job:00 flows:1] did not complete required outputs: ['submitted', 'succeeded']
```
I think we should set `is_queued=False` with `itask.state_reset(TASK_STATUS_EXPIRED)` and skip the incomplete logging.
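A minimal sketch of that suggestion, using a stand-in task proxy so the snippet is self-contained (whether `state_reset` takes an `is_queued` keyword exactly like this in cylc-flow is an assumption):

```python
TASK_STATUS_EXPIRED = 'expired'  # stand-in for the cylc.flow.task_state constant

class FakeTaskProxy:
    """Toy stand-in for TaskProxy, just enough to show the state change."""
    def __init__(self):
        self.status = 'waiting'
        self.is_queued = True

    def state_reset(self, status, is_queued=None):
        self.status = status
        if is_queued is not None:
            self.is_queued = is_queued
        return True

itask = FakeTaskProxy()
# on clock-expiry: clear the queued flag along with the status change,
# and don't emit the "did not complete required outputs" warning
itask.state_reset(TASK_STATUS_EXPIRED, is_queued=False)
assert (itask.status, itask.is_queued) == ('expired', False)
```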
5. Workflow hangs after spawning parentless task.
From use case (7):

```
[scheduler]
    allow implicit tasks = True
    cycle point format = CCYY
[scheduling]
    initial cycle point = now
    [[graph]]
        P1Y = @wall_clock => a => b
[runtime]
    [[a]]
        script = """
            # skip to cycle 2030 by stopping the flow here
            cylc stop --flow=1 "${CYLC_WORKFLOW_ID}"
            # then spawning the parentless task a in cycle 2030 (use case 7)
            cylc set --pre=all "${CYLC_WORKFLOW_ID}//2030/a"
        """
```
This example almost works as expected:
```
INFO - [2023/a running job:01 flows:none] => succeeded
INFO - [2030/a waiting(runahead) job:00 flows:1] => waiting
INFO - [2031/a waiting(runahead) job:00 flows:1] => waiting
INFO - [2032/a waiting(runahead) job:00 flows:1] => waiting
INFO - [2033/a waiting(runahead) job:00 flows:1] => waiting
INFO - [2034/a waiting(runahead) job:00 flows:1] => waiting
```
But strangely, none of those waiting tasks ever get run and the workflow hangs?!
The runahead limit appears to get updated correctly, not sure what the cause is?
6. Workflow stops prematurely when spawning parentless task.
Same example as before, but running the commands manually rather than automatically in `[runtime][a]script`.
```
$ cylc pause orphan
$ cylc stop --flow=1 orphan
$ cylc set --pre=all orphan//2030/a
$ cylc play orphan//
```

```
INFO - Command "set" received: id=c443704e-c534-4421-be23-155441823580
    set(flow=['all'], flow_wait=False, outputs=[], prerequisites=['all'], tasks=['2030/a'])
INFO - Command "set" actioned: id=c443704e-c534-4421-be23-155441823580
INFO - Command "resume" received: id=a44a9163-c0fb-4f87-b590-f65efc435639
INFO - RESUMING the workflow now
INFO - Command "resume" actioned: id=a44a9163-c0fb-4f87-b590-f65efc435639
INFO - Workflow shutting down - AUTOMATIC
INFO - DONE
```
I've been testing with the Tui branch merged in, this patch is handy:
```diff
diff --git a/cylc/flow/tui/data.py b/cylc/flow/tui/data.py
index 04c3d7220..0feb6b8ae 100644
--- a/cylc/flow/tui/data.py
+++ b/cylc/flow/tui/data.py
@@ -107,6 +107,7 @@ MUTATIONS = {
     'task': [
         'hold',
         'release',
+        'set',
         'kill',
         'trigger',
         'poll',
```
> I don't think "set" is working with globs?
It works with task name globs (`cylc set --out=succeeded "baz//1/*"`); I guess I hadn't tested with `//*`.
[UPDATE]: actually I had forgotten, but this was deliberate - because (a) "set" has to be able to target any task, and (b) we have open task globbing issues still under discussion, it isn't trivial to match both in and out of the task pool within the same command invocation when the command affects pool content (set-outputs can spawn new tasks into the pool).
So currently on this branch you can't use cycle point globs, but that use case works fine with specific cycle points. (Mind you, cycle point globbing is considerably less important than task name globbing anyway.) Matching by state selector also remains a TODO on this branch, since that's task-pool specific.
On the weekend, before going away Monday, I'll do my best to get this branch ready and explain exactly what needs doing as a follow-up.
Although, nice work on the stress-testing; looks like I have my work cut out for me! Unfortunately I blew my time today looking at the globbing issues.
> [UPDATE]: actually I had forgotten, but this was deliberate

So that means we can't `cylc set '*' --out=succeeded`?
Can we stick to task pool globbing for now until we've fleshed out a way forward? Having different rules for different commands is too confusing.
> > [UPDATE]: actually I had forgotten, but this was deliberate
>
> So that means we can't `cylc set '*' --out=succeeded`?

Yes, but we can do `cylc set '1/*' --out=succeeded`, which is pretty easy.
> Can we stick to task pool globbing for now until we've fleshed out a way forward? Having different rules for different commands is too confusing.
OK, I'll revert for now. [Actually, not "revert" - see next comment below]. But note:
- the fact that globbing currently only works in the pool already amounts to a form of "different rules for different commands" and has already caused problems for users
- different commands necessarily act on different populations of tasks, which has implications for the underlying matching rules (although some of those differences can be hidden from users)
So we really need to sort this out pretty quickly. I'll bang up a new "task selection" issue to collate and supersede the current partial ones... #5827
> OK, I'll revert for now.
Actually, set-outputs on master does not do pool-only globbing. It globs universally on task names, but doesn't allow cycle point globs. So I had not actually changed it on this branch. I will change it to do that now, however, pending resolution of #5827.
(Reverting to DRAFT until I've addressed all review points - as several are non-trivial)
New side-effect noted in the description: cleaner log message prefix for tasks.
Old:

```
[1/foo submitted job:01 flows:1] => running
[1/bar submitted job:03 flows:1,2] => running
```

New:

```
[1/foo/01:submitted] => running
[1/bar/03(1,2):submitted] => running
```
I should probably have flagged this first, but:
- the job submit number is now technically part of the ID, so that part should not be controversial
- @oliver-sanders already expressed a preference some time ago for omitting flow 1, so that users who don't use flows don't need to think about it
- changing how the flow numbers are displayed is trivial (albeit with a bunch of knock-on log-grep effects in the tests) - see the sketch of the format logic below
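For illustration, a hypothetical helper capturing the format rules described above (`job_log_prefix` is made up for this sketch, not the cylc-flow implementation):

```python
def job_log_prefix(point, name, submit_num, flow_nums, status):
    """Build the new-style task log prefix: ID with submit number, flows
    shown in parentheses only when they differ from the default flow 1."""
    flows = ''
    if flow_nums != {1}:
        flows = '(' + ','.join(str(n) for n in sorted(flow_nums)) + ')'
    return f'[{point}/{name}/{submit_num:02d}{flows}:{status}]'

print(job_log_prefix(1, 'foo', 1, {1}, 'submitted'))     # [1/foo/01:submitted]
print(job_log_prefix(1, 'bar', 3, {1, 2}, 'submitted'))  # [1/bar/03(1,2):submitted]
```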
After extensive changes and rebasing onto master, I've got the tests passing again, except for `tests/integration/tui/test_app.py::test_auto_expansion`, which I think might indicate a missed datastore update on this branch.
Of the extra stuff, from @oliver-sanders above:
- [x] 1. I don't think "set" is working with globs?
- [x] 2. Workflow can prematurely shut down with `--pre=succeeded`
- [x] 3. Incomplete tasks are not run.
- [x] 4. Expired task state.
- [x] ~~5. Workflow hangs after spawning parentless task.~~ [INVALID - see below]
- [x] 6. Workflow stops prematurely when spawning parentless task
> Workflow hangs after spawning parentless task ... But strangely, none of those waiting tasks ever get run and the workflow hangs?! The runahead limit appears to get updated correctly, not sure what the cause is?
User error - the tasks are spawned way beyond their wall_clock triggers!
Current status of this branch: I believe it implements the `cylc set` proposal, and it addresses all the issues raised in review by Oliver.
If so, and if no further problems arise in testing, then the only remaining problem is that a bunch of new functional tests should be converted to maximally simple "canonical" example functional tests, with coverage achieved independently (of the canonical tests) by unit and integration tests.
I could attempt to do that on this PR branch, or - given the urgency of getting 8.3.0 done - I could do the test changes as a follow-up. What's your preference @oliver-sanders ?
Damn, what a mission. @oliver-sanders - all comments addressed and tests passing now.
I still need to take a closer look at coverage and shift some tests out of functional, but realistically that might take a bit of work. How do you feel about me doing that as a follow-up, so you can start building your part on top of this?