cylc-flow
cylc-flow copied to clipboard
Prevent task events manager logging the same message more than once
Closes #5526
Deals with a bug was returning every message ever sent by a task on each poll.
(Draft pending further checks, to ensure that I haven't broken anything)
Check List
- [x] I have read
CONTRIBUTING.mdand added my name as a Code Contributor. - [x] Contains logically grouped changes (else tidy your branch by rebase).
- [x] Does not contain off-topic changes (use other PRs for other changes).
- [x] Applied any dependency changes to both
setup.cfg(andconda-environment.ymlif present). - [x] Tests are included
- [x]
CHANGES.mdentry included if this is a change that can affect users - [x] ~Update docs~ - this is a change to Cylc's logging.
- [x] If this is a bug fix, PR should be raised against the relevant
?.?.xbranch.
@metronnie - I think that I fixed it.
The original approach (adding messages to the activity log file and checking for them later seemed to break some of the workflow restart tests in ways that make me un-confident that they are safe changes.
Scraping the log is horrible.
Next thing to try: using the database - this isn't straightforward, because the database doesn't contain a table which would easily contain the task messages.
@oliver-sanders - I've left this in a draft state - could I have your thoughts on the issue around restart?
@oliver-sanders - I've left this in a draft state - could I have your thoughts on the issue around restart?
I don't see what the issue is with restart? The intent of this PR is to prevent task messages being counted more than once. It shouldn't matter whether a restart happened in between two messages.
@oliver-sanders - I've left this in a draft state - could I have your thoughts on the issue around restart?
I don't see what the issue is with restart? The intent of this PR is to prevent task messages being counted more than once. It shouldn't matter whether a restart happened in between two messages.
No, I agree now, I just forgot to un-draft after I got it working.
Good news, messages are stored in the DB along with timestamps so we don't need to use the job-activity.log to hold this info.
Just make sure that queued database operations get processed between calls of whatever calls this code (presumably Scheduler.process_queued_task_messages) so that task messages are committed to the DB from the last batch before we process the next batch. This is probably already the case, but well worth checking.
Last thing is that the
started (polled)still piles up in the UI if you poll repeatedly
Rah!
For the tests/f/restart/58-removed-task.t test failure you're clear to bodge the test to accept the non-polled message as it isn't trying to test polling logic, it's trying to test task_pool logic.
I've noticed that the polled messages on restart do not show anymore (screenshot is from master branch). Is that desirable?
That is not desirable, however, it's a different problem - https://github.com/cylc/cylc-flow/issues/5215
The data store needs to load outputs from the database on restart/reload.
Actually I've just tested this with the UI and unfortunately no messages come through while the workflow is running, but when you stop it, duplicate messages can come through
Actually I've just tested this with the UI and unfortunately no messages come through while the workflow is running, but when you stop it, duplicate messages can come through
As it currently is, or with your changes?
As it is currently. I have been testing with this workflow
[scheduling]
[[graph]]
R1 = foo
[runtime]
[[foo]]
script = sleep 15; cylc message -- 'MEOW: Hello world'; sleep 600
- Play
- After 15 secs "Hello World" chip should show in tree view
- Poll a couple of times to see if message is repeated
- Stop
--now - Restart, message should show up again (once) as polled
- Poll again
- Stop
--kill
@MetRonnie
As it currently is.
With all the changes since we started this convo?
Not since you've rebased (btw if you could avoid rebasing it would make it easier to tell what changes you're making in between my reviews)
Also I think you've incorrectly rebased onto master?
(needs rebase onto 8.2.x to remove "bump dev version 8.3.0.dev" commit)
5. Restart, message should show up again (once) as polled
Not happening at the moment
- Restart, message should show up again (once) as polled
Not happening at the moment
I didn't think we were bothered about how the message was transmitted, so we don't want a second chip if polled.
But we do want one chip, which is not the case on restart for a ongoing running task?
Also I've noticed this PR does not stop duplicate entries piling up in the DB when polling
$ select * from task_events
foo5|1|2023-08-04T10:43:29+01:00|1|submitted|
foo5|1|2023-08-04T10:43:31+01:00|1|started|
foo5|1|2023-08-04T10:43:46+01:00|1|message meow|Hello world
foo5|1|2023-08-04T10:43:46+01:00|1|message meow|Hello world
foo5|1|2023-08-04T10:43:46+01:00|1|message meow|Hello world
But we do want one chip, which is not the case on restart for a ongoing running task?
I've lost track of the example this comment is coming from, but note that this PR will result in some message chips disappearing from the GUI on restart/realod. This isn't desired behaviour, but it is the correct logic!
The true source of the problem is here: https://github.com/cylc/cylc-flow/issues/5215, fixing this issue will restore the chip.
Ok, other than duplicate entries in task_events DB table, I think this is good to go
After discussing with Oliver we think there is still some unnecessary processing of already-received messages going on. I've opened a PR against this at https://github.com/wxtim/cylc/pull/57
Update: closed that as it does not seem feasible without a major refactor
Update: closed that as it does not seem feasible without a major refactor
Not entirely surprised, but thank you for attempting it.
@oliver-sanders are you waiting on me to do something with this?
Kinda, but I need to take a closer look. I think this isn't skipping enough of the method, the problem is about more than just logging, e.g. it appears to running event handlers and spawning logic for duplicate messages? But I know that when you tried moving the return higher up the function it caused problems.
