Spurious "Not respawning... task was removed" log messages
```
[scheduler]
    allow implicit tasks = True
[scheduling]
    initial cycle point = 2024-01-01
    [[graph]]
        PT3H = get_observations => consolidate_observations
        +PT6H/PT6H = """
            consolidate_observations => forecast
            consolidate_observations[-PT3H] => forecast
            consolidate_observations[-PT6H] => forecast
            get_rainfall => forecast => post_process
        """
        +PT12H/PT6H = """
            forecast[-PT6H] => forecast
        """
```
```
$ cylc play simple-datetime --pause
...
$ cylc stop simple-datetime
...
$ cylc play simple-datetime --pause --no-detach
...
INFO - LOADING task proxies
INFO - + 20240101T0000+0100/get_observations waiting
INFO - [20240101T0000+0100/get_observations:waiting(runahead)] => waiting
INFO - [20240101T0000+0100/get_observations:waiting] => waiting(queued)
INFO - Not respawning 20240101T0300+0100/get_observations - task was removed
INFO - Not respawning 20240101T0600+0100/get_observations - task was removed
INFO - Not respawning 20240101T0900+0100/get_observations - task was removed
INFO - Not respawning 20240101T1200+0100/get_observations - task was removed
INFO - Not respawning 20240101T1500+0100/get_observations - task was removed
INFO - + 20240101T0300+0100/get_observations waiting
INFO - [20240101T0300+0100/get_observations:waiting(runahead)] => waiting
INFO - [20240101T0300+0100/get_observations:waiting] => waiting(queued)
...
```
This log message went from DEBUG to INFO in the group-trigger PR.
https://github.com/cylc/cylc-flow/pull/6395#discussion_r2108816220
I think the issue was punted to https://github.com/cylc/cylc-flow/issues/6813, but the log message wasn't changed by that PR.
I don't think the log message makes sense, especially in this example where `cylc remove` is not even involved. Suggest removing it.
However, this demonstrates that the code comments are bunk:
```python
# If itask has any history in this flow but no completed outputs
# we can infer it has just been deliberately removed (N.B. not
# by `cylc remove`), so don't immediately respawn it.
# TODO (follow-up work):
# - this logic fails if task removed after some outputs completed
LOG.info(f"Not respawning {point}/{name} - task was removed")
```
Which is worrying!
Ping @hjoliver
[UPDATE] I've understood the problem. It is inconsequential apart from these spurious log messages, which appear at restart for non-sequential parentless waiting tasks: the scheduler tries to auto-spawn each one out to the runahead limit again, which was already done prior to shutdown.
The logic is correct for the case described in the code comment, but neglects to account for the situation in this example.
Correct:
- try to spawn a new task proxy
- that task has a recorded history in the DB but no completed outputs
- therefore it must have been previously spawned but "removed" before it could run
- it must have been removed by a suicide trigger via another dependency
- (not by `cylc remove`, which would have erased the prior status)
- therefore we should not respawn it
Neglected:
- ~restart after shutting down with waiting tasks requires spawning waiting tasks like this~
- restart after shutting down with auto-spawned parentless tasks triggers this logic, because each time we resurrect one of these tasks we try to auto-spawn it again out to the runahead limit
- so this code block is possibly preventing a bug (re-adding the same task to the pool multiple times, not sure if that would have bad effects or not) although the log message is misleading
- it does not cause needed tasks to be omitted, because it only comes from attempting to auto-spawn parentless tasks that are going to be resurrected (at restart) from the DB anyway
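A minimal, self-contained sketch of the guard's inference, and why it also fires at restart (this is not the actual cylc-flow internals: the dict-based pool, the `db_history` mapping and the `try_spawn` helper are all hypothetical stand-ins):

```python
# Hypothetical model of the guard: refuse to spawn a task that has DB
# history but no completed outputs, on the assumption it was removed.
def try_spawn(point, name, db_history, pool, log):
    """Attempt to spawn point/name into the task pool."""
    key = (point, name)
    if key in pool:
        return  # already in the pool
    hist = db_history.get(key)
    if hist is not None and not hist["completed_outputs"]:
        # Correct for a suicide-triggered removal, but at restart this
        # also fires for waiting parentless tasks that were auto-spawned
        # before shutdown - hence the spurious messages.
        log.append(f"Not respawning {point}/{name} - task was removed")
        return
    pool[key] = "waiting"

log, pool = [], {}
# 2/foo was waiting (no outputs yet) when the workflow shut down:
db = {(2, "foo"): {"completed_outputs": []}}
try_spawn(2, "foo", db, pool, log)  # refused: history but no outputs
```

The refusal is harmless here only because `2/foo` is resurrected directly from the DB anyway; the problem is purely the misleading message.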
Simpler example:
```
[scheduler]
    allow implicit tasks = True
[scheduling]
    cycling mode = integer
    runahead limit = P0
    [[graph]]
        P1 = foo
```
- `cylc vip --pause --no-detach` - task pool:
  - `1/foo:waiting (queued)`
  - `2/foo:waiting (runahead)`
- `cylc stop` - DB:
  - `1/foo:waiting`
  - `2/foo:waiting`
- `cylc play --pause --no-detach`
  - (a) resurrect `1/foo` from DB, try to autospawn `2/foo` (refuse to do that, log the message)
  - (b) resurrect `2/foo` from DB - good to go
With the default 5 cycle runahead limit we get an ever-decreasing series of these log messages for each resurrected non-sequential waiting parentless task:
- resurrect 1, log 2, 3, 4, 5, 6 (from trying to auto-spawn 2,...,6)
- resurrect 2, log 3, 4, 5, 6
- resurrect 3, log 4, 5, 6
- resurrect 4, log 5, 6
- resurrect 5, log 6
- resurrect 6 - good to go
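The shrinking series above can be modelled in a few lines of Python (a toy count only, assuming points 1-6 are all recorded in the DB at shutdown under the default runahead window):

```python
# Toy model of the spurious-message pattern at restart (assumption: each
# time a waiting parentless task is resurrected from the DB, the scheduler
# tries to auto-spawn every later point out to the runahead limit again).
RUNAHEAD_LAST_POINT = 6  # points 1..6 were already spawned before shutdown

messages = []
for resurrected in range(1, RUNAHEAD_LAST_POINT + 1):
    # Every auto-spawn attempt for a later point is refused and logged.
    for point in range(resurrected + 1, RUNAHEAD_LAST_POINT + 1):
        messages.append(f"Not respawning {point}/foo - task was removed")

# 5 + 4 + 3 + 2 + 1 = 15 spurious messages in total.
print(len(messages))
```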
I'll get a fix up tomorrow.