task icon indicating copy to clipboard operation
task copied to clipboard

Fix bug for running tasks twice

Open libre-man opened this issue 1 year ago • 6 comments

This fixes #1498 by keeping track of the namespaces of a task in an array, instead of just the last one.

libre-man avatar Sep 12 '24 14:09 libre-man

I'm not entire sure what the test failure is here, can someone shed some light on that? Locally all tests pass.

It seems like a race condition, but running even master with the race detector on gives many errors.

libre-man avatar Sep 12 '24 14:09 libre-man

It's not clear to me what is being fixed here since #1498 is already resolved. Could you please provide an example that reproduces your specific issue. Please also read the discussion in #1655 as a decision was made to not change some behaviour on purpose.

pd93 avatar Sep 13 '24 14:09 pd93

I created this PR before that ticket was closed. The issue here is when we have a structure like this

FileA |
      Includes -> FileB
                | Includes -> FileC
                            | -> Has Task A
      Includes -> FileC
                | -> Has Task A

If now we call Task A through FileA -> FileB -> FileC and through FileA -> FileC we get Task A to run multiple times. This is especially and issue when Task A is not idempotent or not safe to run concurrently. A prime example of this is installing something with npm ci.

libre-man avatar Sep 13 '24 17:09 libre-man

Here is a reproduction of your issue as I understand it:

# Taskfile.yml
version: "3"

includes:
  b: ./b.yml
  c: ./c.yml

tasks:
  default:
    deps:
      - abc
      - ac

  abc:
    cmds:
      - task: b:c:task-a

  ac:
    cmds:
      - task: c:task-a
# b.yml
version: "3"

includes:
  c: ./c.yml
# c.yml
version: "3"

tasks:
  task-a:
    run: once
    sources:
      - ./input
    generates:
      - ./output
    cmds:
      - touch output
❯ task -d ./tmp/1804 abc
task: [b:c:task-a] touch output

❯ task -d ./tmp/1804 ac
task: [c:task-a] touch output          # Runs task again

This is currently working as intended (See discussion in #1655), although we have left the door open to discussion. We made the decision that we would not prevent these tasks from running since they have different calling paths. It is possible that the tasks can contain different data etc and it is safer to run a task twice than risk it not running when it should have.

This needs more discussion before it can be merged, but I have left a couple of comments. @vmaerten, @andreynering, do you have any additional thoughts beyond what has been previously discussed? Note that this implementation appears to follow option 3 as described in #1655 i.e. Whether the task is run or not is only affected when run: once is enabled. The behavior does not appear to change if this is omited.

@libre-man please feel free to speak up if you disagree with any of the comments made previously. Also the PR could probably do with a new test that replicates this issue to ensure that we don't get regressions in the future it this gets merged.

pd93 avatar Sep 14 '24 00:09 pd93

The main issue is that run: once as I see it is used to say "I only want this test to be run once", which it currently doesn't do. I would even say that for run: when_changed the current behaviour of saying that different calling paths are different tasks don't make a lot of sense, however I might be wrong of course. The task, in my opinion, should care where it is called from, only which arguments it gets.

My use case is this: I have a file Taskfile.setup.yml that is imported by many other taskfiles. It is responsible for setting up stuff like Python, pnpm, smithy. Making all those scripts safe to be run concurrently is not really feasible, I would need to use file system locks and it would get way too complicated very quickly. With the current implementation there is no way around this, and it kinda breaks my encapsulation that every task is responsible for setting up their own dependencies.

If the change gets the OK I'll polish the PR with tests too, I'm waiting to do that until the OK so I don't waste any time on that if it won't get merged anyway.

One thing about your reproduction: it also runs the task twice if you would do task --parallel -d ./tmp/1804 abc ac.

libre-man avatar Sep 14 '24 09:09 libre-man

Just to add to the discussion, I'm running into what sounds like a similar (the same?) situation where some taskfiles get included by multiple other taskfiles, and despite them having run: once, the same task (pkg_a:build) runs multiple times in parallel:

pkg_b:pkg_a:build pkg_c:pkg_b:pkg_a:build pkg_d:pkg_c:pkg_b:pkg_a:build

The only workaround I've been able to think of is renaming the task pkg_a_build and then including the pkg_a taskfile with flatten: true. This causes the task to be properly deduplicated across the chained includes and then it runs only once.

erictooth avatar Oct 10 '24 16:10 erictooth

Thanks for your PR, this has been fixed

vmaerten avatar Nov 23 '25 11:11 vmaerten