Using Parallel and Piped options at once

Open dlinch opened this issue 5 years ago • 6 comments

I ran into a situation where I wanted to run parallel and piped commands.

So essentially on a post-merge, I'm hoping to capture any dependency updates and any migrations.

post-merge:
  parallel: true
  piped: true
  commands:
    yarn:
      files: git diff --name-only HEAD master
      glob: '{package.json,yarn.lock}'
      run: yarn
      tags: frontend
    1_gem:
      files: git diff --name-only HEAD master
      glob: '{GEMFILE,GEMFILE.lock}'
      run: bundle exec bundle check || bundle install
      tags: backend
    2_migrate:
      files: git diff --name-only HEAD master
      glob: '{db/migrate/*}'
      run: bundle exec rails db:migrate && bundle exec rails db:test:prepare
      tags: backend

So, I want to run the FE and BE commands in parallel, because neither of them depends on the output of the other.

I can't run a migration unless my bundle check passes, so I want it piped after a bundle check || bundle install. I could run them all piped, but they don't need to be, and I figure lefthook could be smart enough to handle both: basically, any commands without the prepended numbering system are run in parallel, and commands with the numbering system are piped.

dlinch avatar Aug 13 '19 21:08 dlinch

Hello @dlinch, thank you for the issue!

> any commands without the prepended numbering system are run in parallel and commands with the numbering system are piped.

I think it's a good point. The main question is how we can separate different piped groups. In my mind, something like this:

post-merge:
  parallel: true
  commands:
    yarn:
      run: yarn
    1_gem:
      run: bundle exec bundle check || bundle install
      pipe_group: 1 # <-- how we separate piped groups
    2_migrate:
      run: bundle exec rails db:migrate && bundle exec rails db:test:prepare
      pipe_group: 1 # <-- how we separate piped groups
    1_another_gem:
      run: bundle exec bundle check || bundle install
      pipe_group: 2 # <-- how we separate piped groups
    2_another_migrate:
      run: bundle exec rails db:migrate && bundle exec rails db:test:prepare
      pipe_group: 2 # <-- how we separate piped groups

So we have 3 parallel groups of commands:

  • yarn
  • 1_gem >> 2_migrate
  • 1_another_gem >> 2_another_migrate

Arkweid avatar Aug 14 '19 06:08 Arkweid

Thanks @Arkweid!

I would say that almost seems like two different features. One allows for parallel commands with a single piped group, and the other is to handle multiple piped groups, correct?

There's nothing I can find in the full guide that shows multiple piped groups as a feature.

dlinch avatar Aug 14 '19 15:08 dlinch

> I would say that almost seems like two different features. One allows for parallel commands with a single piped group, and the other is to handle multiple piped groups, correct?

Yep.

> There's nothing I can find in the full guide that shows multiple piped groups as a feature.

It's not implemented yet. I'm just thinking about it.

Arkweid avatar Aug 14 '19 16:08 Arkweid

@Arkweid I suggest generalizing your proposal by considering three different behaviors: sequential, parallel, and piped. sequential would be the default and would run all tasks in series; if one fails, the next ones would be run anyway. parallel would run tasks in parallel (of course). piped would behave like sequential, but would stop at the first failing task.

Then it should be possible to separately specify a global behavior and group behaviors. As for the global behavior, without any specification (default) it would be sequential, otherwise parallel or piped.

Group behavior would customize the global specification for the given group, using (following your proposal): sequential_group = <key>, parallel_group = <key>, piped_group = <key>.

So for instance:

post-merge:
  parallel: true
  commands:
    yarn:
      run: yarn
    1_gem:
      run: bundle exec bundle check || bundle install
      piped_group: something
    2_migrate:
      run: bundle exec rails db:migrate && bundle exec rails db:test:prepare
      piped_group: something
    1_another_gem:
      run: bundle exec bundle check || bundle install
      sequential_group: something_else
    2_another_migrate:
      run: bundle exec rails db:migrate && bundle exec rails db:test:prepare
      sequential_group: something_else

With this example, lefthook would first start the yarn, 1_gem, and 1_another_gem tasks in parallel. Task 2_migrate would then be run only if 1_gem is successful, while task 2_another_migrate would always be run after 1_another_gem.

This would also close issue #113.

lucatrv avatar Mar 15 '20 11:03 lucatrv

A better option would be to allow the definition of command groups, each one with its own behavior (sequential by default, or otherwise parallel or piped). Apart from fixing this issue and #113, this would also allow reusing defined groups among different git hooks.

The example above would become something like:

post-merge:
  parallel: true
  commands:
    yarn:
      run: yarn
    do_something:
      run_group: a_group
    do_something_else:
      run_group: another_group

a_group:
  piped: true
  commands:
    1_gem:
      run: bundle exec bundle check || bundle install
    2_migrate:
      run: bundle exec rails db:migrate && bundle exec rails db:test:prepare

another_group:
  commands:
    1_another_gem:
      run: bundle exec bundle check || bundle install
    2_another_migrate:
      run: bundle exec rails db:migrate && bundle exec rails db:test:prepare

For this to work for both commands and scripts, with control over relative order and parallelization, the scripts variable should also be removed. Scripts would simply be listed under commands; the only difference would be the runner field instead of run.

lucatrv avatar Mar 15 '20 12:03 lucatrv

My team would also love this feature. It'd be very useful for the common case of "do a bunch of style fix-ups and checks to make sure generated files are up to date, some of which need to be done in sequence but most of which can be done in parallel".

As far as I can tell, what we're basically trying to create here is a dependency graph that lets each command know when it can safely run. CircleCI does this in a YAML file quite elegantly, I feel, with the following syntax:

workflows:
  version: 2
  build-and-test:
    jobs:
      - warm_backend_cache
      - warm_frontend_cache
      - run_cypress_tests:
          requires:
            - warm_backend_cache
            - warm_frontend_cache
      - run_other_tests:
          requires:
            - warm_backend_cache
            - warm_frontend_cache
      - lighthouseci:
          requires:
            - warm_backend_cache
            - warm_frontend_cache
      - percy/finalize_all:
          requires:
            - run_cypress_tests

Basically:

  • Each job has a unique name and appears in a list of jobs to run
  • By default, each job is assumed to be completely parallel-safe
  • Every job can add a requires key, which lists all other jobs that must complete successfully before this job can safely run

The things I like about this are:

  • It's very readable
  • It's very easy to explain and keep in one's head ("all jobs run by default in parallel: if some job depends on another, list it in the requires key")
  • It's very flexible
  • It has a reasonable default behavior if you omit the requires altogether

IMO, this approach is lighter weight and easier to grok than command groups or piped groups. I think that this approach could implement most of what's possible above with a little more readable syntax.
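
For illustration only, the post-merge hook from the top of this thread might look roughly like this under such a scheme. This is a sketch of the proposal, not working configuration: `requires` is a hypothetical key borrowed from the CircleCI syntax above and is not a lefthook option, and the command names are chosen just for this example.

```yaml
# Hypothetical syntax: `requires` is NOT an existing lefthook option.
post-merge:
  parallel: true            # everything runs in parallel unless it lists a dependency
  commands:
    yarn:
      run: yarn
    gem:
      run: bundle exec bundle check || bundle install
    migrate:
      run: bundle exec rails db:migrate && bundle exec rails db:test:prepare
      requires:             # hypothetical: wait for `gem` to succeed first
        - gem
```

Here yarn and gem would start immediately, and migrate would wait until gem has completed successfully.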

zeptonaut avatar Nov 24 '21 15:11 zeptonaut

Hey! Did you know it is possible to use lefthook inside lefthook? Just like in the following example:

pre-commit:
  parallel: true
  commands:
    pipe1:
      run: LEFTHOOK_QUIET=meta,success lefthook run pipe1
    pipe2:
      run: LEFTHOOK_QUIET=meta,success lefthook run pipe2

pipe1:
  piped: true
  commands:
    1_run:
      run: echo pipe1_1
    2_run:
      run: echo pipe1_2

pipe2:
  piped: true
  commands:
    1_run:
      run: echo pipe2_1
    2_run:
      run: echo pipe2_2

Of course it may not look very cool (but I think I can add tweaks to the output settings). But it is possible :)

I don't want to bring more complexity to the lefthook configuration, because it is already quite complicated. I think this approach is the most convenient, and I don't think loading the lefthook executable several times adds too much latency.

Since this issue is rather stale, I am closing it. But feel free to open a new discussion, and maybe we will end up with a new feature or at least a solution to the problem :)
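
Applied to the post-merge case from the top of this thread, the same nesting might look roughly like the sketch below. The hook name `backend` and the command name `frontend` are arbitrary choices for this example; the files, glob, and run values are copied from the original post, and the result has not been tested against a real repository.

```yaml
post-merge:
  parallel: true
  commands:
    frontend:
      files: git diff --name-only HEAD master
      glob: '{package.json,yarn.lock}'
      run: yarn
    backend:
      # Delegates to the piped hook below, as in the pipe1/pipe2 example
      run: LEFTHOOK_QUIET=meta,success lefthook run backend

# "backend" is an arbitrary custom hook name assumed for this sketch
backend:
  piped: true
  commands:
    1_gem:
      files: git diff --name-only HEAD master
      glob: '{GEMFILE,GEMFILE.lock}'
      run: bundle exec bundle check || bundle install
    2_migrate:
      files: git diff --name-only HEAD master
      glob: '{db/migrate/*}'
      run: bundle exec rails db:migrate && bundle exec rails db:test:prepare
```

The frontend command and the nested backend hook start in parallel, while inside backend the numbered commands run in order as a pipe.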

mrexox avatar Nov 14 '22 08:11 mrexox

I am the original opener of this issue! I think this is a perfectly fine solution; it seems obvious now that you've posted it, which is often true of the best solutions. As long as the output is readable, I think this is totally fair.

Thank you!

dlinch avatar Nov 17 '22 16:11 dlinch

Very helpful, thank you! 🙏

Maybe it is worth adding this to the doc?

toebgen avatar Feb 13 '23 18:02 toebgen