
rdeps clauses in query are not executed in parallel

Open guw opened this issue 1 year ago • 4 comments

Description of the feature request:

I would like to be able to perform multiple bazel query operations in parallel without using a different output_base.

Which category does this issue belong to?

Performance

What underlying problem are you trying to solve with this feature?

I need to query for rdeps but I need to exclude references to the package itself. Therefore I am using the following query:

rdeps( //..., //foo, 1) except //foo/...

I have N packages, so I need to run this query N times. It would be great if I could do this in parallel.

I also tried a combined query:

(rdeps( //..., //foo, 1) except //foo/...) 
+
(rdeps( //..., //bar, 1) except //bar/...)
+
..

However, that does not seem to trigger any performance optimization within Bazel, i.e. all rdeps(...) clauses seem to be executed sequentially, one after another.
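For reference, the combined expression above can be generated for N packages with a small script. This is only a sketch of how the query string is built, assuming each package is given as a label such as //foo; the helper names are hypothetical, and it does not change how Bazel evaluates the clauses.

```python
from typing import Iterable


def rdeps_excluding_self(package: str, universe: str = "//...", depth: int = 1) -> str:
    """Build one clause: reverse deps of `package`, minus everything under it."""
    return f"(rdeps({universe}, {package}, {depth}) except {package}/...)"


def combined_query(packages: Iterable[str]) -> str:
    """Join the per-package clauses with `+`, the union operator of the query language."""
    return " + ".join(rdeps_excluding_self(p) for p in packages)


if __name__ == "__main__":
    # Hypothetical package list; replace with the real N packages.
    print(combined_query(["//foo", "//bar"]))
```

Note that the union loses the information about which result belongs to which package, so the output still has to be split up afterwards.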

Which operating system are you running Bazel on?

macOS 14.4

What is the output of bazel info release?

7.0.1

guw avatar Mar 19 '24 09:03 guw

This is basically a subset of https://github.com/bazelbuild/bazel/issues/532.

It certainly would be easier to implement, but it is still very much non-trivial. These are some old notes on the general problem.

  • BlazeModule and BlazeRuntime are not thread-safe.
  • The lock in BlazeCommandDispatcher needs to be moved to BlazeWorkspace (if thread-safe).
  • State needs to be encapsulated in Skyframe execution.
  • Supporting concurrent query would be easier since the state is smaller.
  • We have to be careful how we compare states.
  • Profiler is not thread-safe, we would need a map of threads to profilers.
  • Same for logger.
  • If we want to take a simpler first step and support running help concurrently with other commands, we need to:
    • Disable profiling for help.
    • Make BlazeRuntime thread-safe.
    • Remove workspace state from BlazeRuntime.
  • Of course it doesn't work with batch mode - which should be removed anyway.

@lberki or @ulfjack might have some ideas / comments on feasibility.

meisterT avatar Mar 19 '24 10:03 meisterT

As per my comment at https://github.com/bazelbuild/bazel/issues/532#issuecomment-2006697882, I'll close this bug as a duplicate of #532. Running multiple build commands in parallel would entail more work than doing so with query, but the latter is already an enormous task, so I don't think it makes sense to separate the two.

As much as I'd like to do this, I don't think we are going to get to this in the foreseeable future :(

lberki avatar Mar 19 '24 10:03 lberki

That said, the rdeps() clauses could be executed in parallel, so on second thought, I'll reopen this issue and reframe it as "please make bazel query faster".

lberki avatar Mar 19 '24 10:03 lberki

Is this (or #532) the right issue to encapsulate the following request?

Buck has a feature in its query command (described under "Executing multiple queries at once") wherein a query can be performed on multiple targets in parallel, and the outputs are bucketed for each target. For example, the query buck query "testsof(deps( %s ))" target1 target2 target3 with the flag --output-format json will return a JSON object keyed by each target with values corresponding to the result of the query for each target separately.

The only alternative to this approach I'm aware of that's currently supported in Bazel is to collect the union of the targets' query results as protobuf, parse that, and perform the processing separately. While this is more efficient than querying for each target separately, it has taken some adjustment to get to grips with in our codebase. Supporting this narrow kind of parallel processing in Bazel itself would be a big help in improving the performance of our query-based tooling.
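The post-processing step in that workaround can be sketched roughly as follows. This is an illustration only: it assumes the query output has already been parsed into a mapping from each target label to its direct dependencies (the parsing itself is not shown), and the function names are hypothetical. It buckets direct reverse dependencies per queried target and drops anything inside the target's own package tree, mirroring rdeps(//..., //foo, 1) except //foo/... from the original request.

```python
from typing import Dict, Iterable, List, Set


def package_of(label: str) -> str:
    """Return the package part of a full label, e.g. //foo/bar:baz -> //foo/bar."""
    return label.split(":", 1)[0]


def in_package_tree(label: str, package: str) -> bool:
    """True if `label` lives in `package` or one of its subpackages.

    The root package ("//") is not handled specially in this sketch.
    """
    pkg = package_of(label)
    return pkg == package or pkg.startswith(package + "/")


def bucket_direct_rdeps(
    direct_deps: Dict[str, List[str]], targets: Iterable[str]
) -> Dict[str, Set[str]]:
    """Group direct reverse dependencies by queried target.

    `direct_deps` maps a target label to the labels it depends on directly;
    this shape is an assumption about the already-parsed query output.
    """
    buckets: Dict[str, Set[str]] = {t: set() for t in targets}
    for dependent, deps in direct_deps.items():
        for dep in deps:
            # Keep only edges into a queried target, and skip dependents
            # that sit inside that target's own package tree.
            if dep in buckets and not in_package_tree(dependent, package_of(dep)):
                buckets[dep].add(dependent)
    return buckets


if __name__ == "__main__":
    # Hypothetical dependency data for illustration.
    graph = {
        "//app:main": ["//foo:foo", "//bar:bar"],
        "//foo/sub:x": ["//foo:foo"],
        "//bar:bar": ["//foo:foo"],
    }
    for target, rdeps in bucket_direct_rdeps(graph, ["//foo:foo", "//bar:bar"]).items():
        print(target, sorted(rdeps))
```

A single pass over the edges serves all queried targets at once, which is the efficiency gain over running one query per target.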

aaronsky avatar Jun 29 '24 14:06 aaronsky

Probably neither? If we wanted to implement the above Buck feature, it would be much easier to do as a feature of bazel query instead of making it possible to run multiple bazel query commands at the same time.

lberki avatar Jul 01 '24 06:07 lberki