cylc-flow icon indicating copy to clipboard operation
cylc-flow copied to clipboard

flow-nums data-store fix

Open dwsutherland opened this issue 1 year ago • 14 comments

closes #6114

Using example from above produces: image

The solution was to create a delta from the new task proxy (instead of trying to overwrite the existing).

existing tests should cover code (?)

Check List

  • [x] I have read CONTRIBUTING.md and added my name as a Code Contributor.
  • [x] Contains logically grouped changes (else tidy your branch by rebase).
  • [x] Does not contain off-topic changes (use other PRs for other changes).
  • [x] Applied any dependency changes to both setup.cfg (and conda-environment.yml if present).
  • [ ] Tests are included (or explain why tests are not needed).
  • [ ] CHANGES.md entry included if this is a change that can affect users
  • [x] Cylc-Doc pull request opened if required at cylc/cylc-doc/pull/XXXX.
  • [ ] If this is a bug fix, PR should be raised against the relevant ?.?.x branch.

dwsutherland avatar May 30 '24 10:05 dwsutherland

Hmm, I can't replicate tests/f/cylc-show/06-past-present-future.t failure locally

MetRonnie avatar May 30 '24 11:05 MetRonnie

(Bumping to 8.3.1; if it gets merged before 8.3.0 will change back)

MetRonnie avatar May 30 '24 11:05 MetRonnie

Hmm, I can't replicate tests/f/cylc-show/06-past-present-future.t failure locally

I can unfortunately

dwsutherland avatar May 30 '24 12:05 dwsutherland

It seems to be marking a succeeded task as failed: https://github.com/cylc/cylc-flow/actions/runs/9301382754/job/25601035801?pr=6115#step:17:3293

Reporting on the task status from the wrong flow?

oliver-sanders avatar May 30 '24 13:05 oliver-sanders

It seems to be marking a succeeded task as failed: https://github.com/cylc/cylc-flow/actions/runs/9301382754/job/25601035801?pr=6115#step:17:3293

Reporting on the task status from the wrong flow?

I think a delta is being overwritten by another delta... maybe

dwsutherland avatar May 30 '24 21:05 dwsutherland

Ah, I think I know what's happening..

[scheduler]
   allow implicit tasks = True
   [[events]]
      inactivity timeout = PT1M
      abort on inactivity timeout = True
[scheduling]
   [[graph]]
      R1 = """
         a => b => c? => d
         c:fail? => kick-c
      """
[runtime]
   # First run: c stops the scheduler then fails.
   # On restart, kick-c retriggers c to run 'cylc show'.
   [[kick-c]]
      script = cylc trigger "$CYLC_WORKFLOW_ID//1/c"
   [[c]]
      script = """
         if ((CYLC_TASK_SUBMIT_NUMBER == 1)); then
            cylc stop --now --max-polls=10 --interval=1 $CYLC_WORKFLOW_ID
            false
         else
            # Allow time for c submission => running
            sleep 2
            cylc show "$CYLC_WORKFLOW_ID//1/b" >> $CYLC_WORKFLOW_RUN_DIR/show-b.txt
            cylc show "$CYLC_WORKFLOW_ID//1/c" >> $CYLC_WORKFLOW_RUN_DIR/show-c.txt
            cylc show "$CYLC_WORKFLOW_ID//1/d" >> $CYLC_WORKFLOW_RUN_DIR/show-d.txt
         fi
      """

When you start the workflow up again, the states and prerequisites of tasks are loaded from the DB .. including ✓ 1/b succeeded in 1/c (as in the test).

However, when you trigger a task it's prerequisites aren't loaded.. So this change is correctly setting the prerequisites to the newly triggered 1/c (⨯ 1/b succeeded)... This is how Cylc works at the moment..

So I can either:

  1. change the test to reflect the state of the new 1/c
  2. fake it to keep the old 1/c's prerequisites..
  3. have forced-triggers/triggers actually set the prerequisites "correctly" for the task pool object (however that can be done).

Ideally maybe option 3 (?)

However, we shouldn't do option 2, as I can imagine cases where the new object does have correct prerequisites (i.e. from a reflow and/or after reload )..

So I'm going to change the test..

dwsutherland avatar May 30 '24 22:05 dwsutherland

Why is the test passing for me locally though? (on https://github.com/cylc/cylc-flow/pull/6115/commits/e11df86befcb5c7a192e7dc03c8b5e8fd5c4ebde)

MetRonnie avatar May 31 '24 10:05 MetRonnie

Why is the test passing for me locally though? (on e11df86)

Not sure why, but tests are passing now, and it passed locally

dwsutherland avatar Jun 09 '24 03:06 dwsutherland

So I can either:

I would go for the minimal change to fix the issue and kick wider Cylc implementation stuff to an issue.

@MetRonnie are you happy with the amended test?

oliver-sanders avatar Jun 27 '24 08:06 oliver-sanders

Tried this out, seems to do the job.

Wrote an integration test which fails before and passes after this change: https://github.com/dwsutherland/cylc-flow/pull/20

Discovered an unrelated bug in the process: https://github.com/cylc/cylc-flow/issues/6174

oliver-sanders avatar Jun 27 '24 10:06 oliver-sanders

(Figured out that the tests were passing locally for me because the comm command can't tell the difference between and when LANG=en_US.UTF-8 for some bizarre reason)

MetRonnie avatar Jun 27 '24 16:06 MetRonnie

(Figured out that the tests were passing locally for me because the comm command can't tell the difference between ✓ and ⨯ when LANG=en_US.UTF-8 for some bizarre reason)

Sounds like it would make sense to set LANG=C for the purpose of this comparison?

LANG=C contains_ok ...

(sry, hit wrong button)

oliver-sanders avatar Jun 28 '24 12:06 oliver-sanders

Sounds like it would make sense to set LANG=C for the purpose of this comparison?

#6183

MetRonnie avatar Jun 28 '24 12:06 MetRonnie

(that fix is now in, rebase this PR to pick it up)

oliver-sanders avatar Jul 01 '24 15:07 oliver-sanders

(@dwsutherland, one unresolved review comment above)

oliver-sanders avatar Jul 03 '24 10:07 oliver-sanders

(@dwsutherland, one unresolved review comment above)

Replied.. hopefully this convinces you @MetRonnie

dwsutherland avatar Jul 05 '24 06:07 dwsutherland

(Note, no changelog entry required, this does not impact users at present)

oliver-sanders avatar Jul 09 '24 09:07 oliver-sanders