cylc-flow icon indicating copy to clipboard operation
cylc-flow copied to clipboard

`cylc set` bug fix for new flows

Open hjoliver opened this issue 1 year ago • 2 comments

Damn it, found a bug in 8.3.0 for cylc set --flow=new on future tasks: the task_outputs table won't be updated because of "WHERE flow is n" in the DB statement.

From: https://github.com/hjoliver/ecox-interventions?tab=readme-ov-file#scenario-2-rerun-some-tasks-from-upstream-of-a-failed-task

Example:

[task parameters]
    m = 1..3
[scheduling]
    [[graph]]
        R1 = "a => b => c<m> => d"
[runtime]
    [[root]]
        pre-script = sleep 5
    [[a, b, d]]
    [[c<m>]]
    [[c<m=2>]]
        script = """
            if (( CYLC_TASK_SUBMIT_NUMBER == 1 )); then
                false
            fi
        """

The scenario:

  • 1/c_m2 fails, stalling the workflow due to a corrupted output from 1/b
  • want to return 1/b and 1/c_m2, but not 1/c_m1, m3, then continue to 1/d and finish

Method:

  • run a new flow (2) from 1/b after setting 1/c_m1, m3 to succeeded in the new flow
  1. cylc set test //1/c_m1 //1/c_m2 --flow=2
  2. cylc trigger test//1/b --flow=2

On master this reruns 1/c_m1 and 1/c_m3 because the set outputs don't end up in the DB.

Check List

  • [ ] I have read CONTRIBUTING.md and added my name as a Code Contributor.
  • [ ] Contains logically grouped changes (else tidy your branch by rebase).
  • [ ] Does not contain off-topic changes (use other PRs for other changes).
  • [ ] Applied any dependency changes to both setup.cfg (and conda-environment.yml if present).
  • [ ] Tests are included (or explain why tests are not needed).
  • [ ] CHANGES.md entry included if this is a change that can affect users
  • [ ] Cylc-Doc pull request opened if required at cylc/cylc-doc/pull/XXXX.
  • [ ] If this is a bug fix, PR should be raised against the relevant ?.?.x branch.

hjoliver avatar Jul 01 '24 09:07 hjoliver

Needs a rebase to pull in flake8 fixes.

wxtim avatar Jul 02 '24 11:07 wxtim

I have created a crude integration/fnctional test at https://github.com/hjoliver/cylc-flow/pull/55

I think I can do better/faster, but was blocked by a simulation mode bug.

wxtim avatar Jul 02 '24 16:07 wxtim

@wxtim - rebased and added an integration test.

My integration test is presumably similar to what you are working on? Trouble is, it passes on master too! I've run out of time to investigate further, for now.

hjoliver avatar Jul 05 '24 07:07 hjoliver

@wxtim - rebased and added an integration test.

My integration test is presumably similar to what you are working on? Trouble is, it passes on master too! I've run out of time to investigate further, for now.

Mine is a functional test using the integration test framework. Since you've written most of the test I'll leave the ball in your court I think. I have other 🐟 to 🥘 , not least a sim mode bug which will scupper skip mode too.

wxtim avatar Jul 05 '24 08:07 wxtim

(Update: a similar scenario works fine on master if the set and trigger happens for tasks that have never run before in any flow).

hjoliver avatar Jul 08 '24 00:07 hjoliver

Test tweaked and validated. It passes on master as an integration test for "the wrong reasons", which are not valid on this branch (it's to do with submit numbers being zero in integration tests - I can explain in detail if necessary). The same scenario does not pass on master as a functional test.

hjoliver avatar Jul 08 '24 06:07 hjoliver

~~Damn it, I seem to have broken two integration tests though. So not quite done.~~ DONE

hjoliver avatar Jul 08 '24 06:07 hjoliver

(All discussions resolved except for the TODO one)

hjoliver avatar Jul 09 '24 05:07 hjoliver

Code changes make sense to me.

I'm not convinced that the test is demonstrating the bug on upstream/8.3.x or master.

I can confirm that test_set_future_flow passes on upstream/8.3.x so doesn't appear to be capturing this bug.

oliver-sanders avatar Jul 09 '24 11:07 oliver-sanders

This should be rebased onto 8.3.x (even if it doesn't make it into the 8.3.2 release)

MetRonnie avatar Jul 09 '24 16:07 MetRonnie

Is there a reason why this bug is on 8.3.2 milestone but open against master?

This should be rebased onto 8.3.x (even if it doesn't make it into the 8.3.2 release)

I think I probably opened this PR right after 8.3.0 was released, before the 8.3.x branch was made, and forgot to come back and rebase it. But maybe it was just a mistake. Anyhow, rebased now.

hjoliver avatar Jul 09 '24 23:07 hjoliver

I'm not convinced that the test is demonstrating the bug on upstream/8.3.x or master.

I can confirm that test_set_future_flow passes on upstream/8.3.x so doesn't appear to be capturing this bug.

I commented on this above

The integration test essentially replicates the functional test that fails on master, but as an integration test it passes on master "for the wrong reasons" - to do with submit numbers being zero in integration tests.

UPDATE: I've simplified the test and it does now fail on master. Note the last commit clarifies and documents spawning logic a bit (it was too hard to follow!).

hjoliver avatar Jul 09 '24 23:07 hjoliver

Merging with missing changelog entry added!

MetRonnie avatar Jul 10 '24 10:07 MetRonnie