task icon indicating copy to clipboard operation
task copied to clipboard

feat: check and cache the task definition, variables, status

Open aminya opened this issue 1 year ago • 9 comments

This adds a new checker for Task definitions, status, and any other variable that affect a task.

The implementation hashes the task structure and writes it to .task/definition. In later runs, this is checked to make sure the definition is up-to-date. If the definition has changed (e.g. one of the variables has changed), the task will run again.

The approach is configurable through check_definition. Now, by default ("auto"), the definition is considered when the sources are defined and not the status checks. If desirable, check_definition can be set to "always" to always consider the definition of the task. It can be also disabled via "never" altogether regardless of other conditions.

Fixes #548 Fixes #455

aminya avatar Nov 15 '23 10:11 aminya

@andreynering Would you take a look at this pull request? This fixes the Task false-cache hit issues when the variables change. Without this, we cannot use Task caching altogether as it incorrectly skips running some of the tasks.

aminya avatar Dec 05 '23 19:12 aminya

Hi @aminya!

First of all, thanks for your contribution!

After a quick look, the code itself seems to go in the right direction.

As said in comments like https://github.com/go-task/task/issues/548#issuecomment-941679515, https://github.com/go-task/task/issues/455#issuecomment-890527159 and https://github.com/go-task/task/issues/455#issuecomment-1604455703, enabling this could add some unintended side-effects. Also, it would be a breaking change (although a relatively small one).

In other words: this needs a bit more thought. I think this could maybe be enabled via a global setting in the Taskfile.

/cc @pd93 (just in case you have any thoughts)

andreynering avatar Dec 13 '23 16:12 andreynering

Hey both, just adding some ~quick~ thoughts while I have a bit of time:

  1. I agree with @andreynering regarding the code. I like the general direction it's going in and that it follows the same style as the other fingerprinting checker interfaces that we refactored earlier this year.

  2. I still stand by my previous comments in #455.

More specifically, it appears that you went with the second option that I outlined which I think has some flaws. Mainly that I don't think changing a Task's desc or summary should invalid its cache. We should probably only stick to properties that affect a task's outcome. i.e. cmds and deps.

Additionally, I'm not convinced that enabling this new behaviour by default is the correct decision. Sometimes running a task can be destructive and this is why we write status checks - to ensure that a task will not run depending on the state of the system. Overwriting this behaviour when a task's definition changes will absolutely trip someone up at some point. Especially given that a task's command could change entirely and still have the same outcome as before.

One way or another, we have to accept one of two fates. Either:

  1. A task will occasionally not run when a user actually wanted it to (safe, current behaviour)
  2. A task will occasionally run when a user did not want it to (unsafe)

We have a responsibility to ensure that Task users have confidence in the outcome of their commands when they run them and I think the second option goes against this. Particularly when you consider that the first problem can be quickly solved by intuitively adding the --force flag to your command.

If there is a desire from the community/other maintainers to continue with this change then I would like to entertain some ways that we can make it clearer to the user that a task's cache is invalidated when the definition changes. A couple of ideas that I haven't given much thought:

  • A global/task-level key to enable the behaviour, alongside a warning/disclaimer in our documentation about the dangers of enabling it.
  • Showing a prompt before the task runs saying that it is only running because the definition has changed and not because of the status check and asking the user if they wish to continue.

pd93 avatar Dec 17 '23 04:12 pd93

@pd93 I think instead of making things complicated and keeping the current buggy behaviour, we can do this:

  • When the user has added status checks, do not check the definition of the task and only rely on the status check to run the test. This would mean the addition of status checks.

  • In other cases, check the task definition like GNU Make and cache the results according to sources/generates.

Doing this will solve your concerns regarding destructive tasks, but will also fix the current buggy behaviour where sometimes tasks do not run although the environment variables have changed.

I also want to mention that the current buggy behaviour is also dangerous. For example, I expect the task to run again when a variable changes, but currently task skips those in the subsequent calls, which can potentially result in incorrect/dangerous results:

- find_user_home a with variable user = A
- task uninstall_tool_for_user with tool = coffee   # removes ~/A/coffee
- find_user_home  with variable user = B        # currently skipped incorrectly so user stays as A!!
- task uninstall_tool_for_user with tool = tea  # it incorrectly removes tea for user A, while we wanted to remove tea for user B

This pull request fixes this case, which is the main pain point of using caching with Task

If you want this to be configurable, we can add a check_definition flag with these possible values:

  • auto (Default): check definition when status check is not specified for a task.
  • always: always check the definition
  • never: ignore the definition

aminya avatar Dec 17 '23 07:12 aminya

@andreynering @pd93

I have updated the approach to be configurable. Now, by default ("auto"), it will not check the definitions if status checks are defined for a task, but it considers the definition when sources are defined.

If desirable, check_definition can be set to "always" to always consider the definition of the task.

Rebased the branch and added tests to check and support the addition.

aminya avatar Feb 07 '24 04:02 aminya

Is there any reason why this was not merged? Fixing conflicts takes time every time. I appreciate some feedback @andreynering

aminya avatar Mar 23 '24 09:03 aminya

I have rebased this branch again. Could you take a look? @andreynering @pd93

aminya avatar Apr 25 '24 19:04 aminya

For those waiting on this essential feature:

I ended up forking task and publishing the binaries here: https://github.com/aminya/task/releases/tag/v3.37.0

In GitHub Actions, I have something like this to patch task via these binaries:

Details
      - name: Setup Cpp
        uses: aminya/setup-cpp@v1
        with:
          task: true

      - name: Patch Task
        run: |
          cd ~/task
          curl -LJO https://github.com/aminya/task/releases/download/v3.18.0/task_linux_amd64.tar.gz
          tar -xvf task_linux_amd64.tar.gz
          chmod +x ./task
          rm task_linux_amd64.tar.gz
          task --version

aminya avatar May 04 '24 06:05 aminya

Hi @aminya,

I'm sorry for the wait! I know many months have passed. The thing is just that this project demands more work than we actually have to dedicate to it, so we advance in the speed we can. Thanks for your patience!

Hopefully I'll be able to review this soon. I want to try this feature to have a clearer opinion on how this should behave.

andreynering avatar May 09 '24 01:05 andreynering