
Draft: Add `@distribute` decorator for parallelizing functions

Open engintoklu opened this issue 7 months ago • 2 comments

Motivation. Although EvoTorch has had a functional API for a while, this API lacked easy-to-use capabilities for distributing computations across multiple devices. When writing evolutionary search code with the functional API, the only way to evaluate solutions in parallel across multiple devices was to mix functional-paradigm code with object-oriented-paradigm code (by instantiating a `Problem` object with multiple actors and then transforming that `Problem` object into a functional evaluator via its `make_callable_evaluator` method). This approach was limiting (it was designed only with parallelized solution evaluation in mind) and cumbersome (it forced the programmer to mix the usages of two APIs).

The newly introduced feature. This commit introduces a general-purpose decorator, `evotorch.decorators.distribute`, which takes a function and transforms it into its parallelized counterpart. Like a `Problem` object, the `distribute` decorator can be configured in terms of the number of actors (`num_actors`) and the number of GPUs visible to each actor (`num_gpus_per_actor`). Alternatively, an explicit list of devices can be given (e.g. `devices=["cuda:0", "cuda:1"]`).

How does it work? Upon being called for the first time, the parallelized function creates the Ray actors. Whenever it receives its arguments, the parallelized function then follows these steps:

  • split the arguments into chunks
  • send each argument chunk to an actor, along with a request to apply the wrapped function to it
  • wait for the remote actors to finish their parallel computations (each actor using its own associated device)
  • collect, combine, and return the results
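As an illustration only, the four steps above can be simulated sequentially in plain Python (lists stand in for tensors, a thread pool stands in for the Ray actors, and the helper names `split_leftmost` and `distribute_like` are hypothetical, not EvoTorch APIs):

```python
from concurrent.futures import ThreadPoolExecutor

def split_leftmost(seq, num_chunks):
    # Step 1: split along the leftmost dimension into roughly equal chunks.
    size = -(-len(seq) // num_chunks)  # ceiling division
    return [seq[i:i + size] for i in range(0, len(seq), size)]

def distribute_like(fn, args, num_actors):
    # Step 1: split every argument into chunks.
    chunked = [split_leftmost(arg, num_actors) for arg in args]
    # Steps 2-3: hand each chunk set to a "worker" and wait for all of them.
    with ThreadPoolExecutor(max_workers=num_actors) as pool:
        results = list(pool.map(lambda chunk: fn(*chunk), zip(*chunked)))
    # Step 4: collect, combine (concatenate), and return.
    return [item for chunk in results for item in chunk]

def add(x, y):
    return [a + b for a, b in zip(x, y)]

print(distribute_like(add, ([1, 2, 3, 4], [10, 20, 30, 40]), num_actors=2))
# → [11, 22, 33, 44]
```

The real implementation dispatches the chunks to remote Ray actors instead of local threads, but the split/apply/concatenate semantics are the same.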

Example. A distributed function might look like this:

import torch
from evotorch.decorators import distribute

@distribute(devices=["cuda:0", "cuda:1"])
def f(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # Note: the number of positional arguments does not have to be 2
    ...

We assume that x and y have the same leftmost dimension size, and that the function returns a tensor whose leftmost dimension also has that size. Once called, this distributed function will split x and y into two halves along their leftmost dimensions, send the first halves to the first actor (the one using cuda:0) and the second halves to the second actor (the one using cuda:1), wait for the parallel computation, and then collect and concatenate the resulting chunks into a single result tensor.
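This equivalence can be sketched on the CPU (an illustrative assumption: `f_local` stands in for the wrapped function body, and actual device placement is ignored):

```python
import torch

def f_local(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # Stand-in body for the wrapped function: the output has the
    # same leftmost dimension size as the inputs.
    return x + y

x = torch.arange(6, dtype=torch.float32)
y = torch.ones(6)

# Split along the leftmost dimension, apply the function per chunk,
# then concatenate -- the same semantics the distributed call provides.
chunked = torch.cat(
    [
        f_local(cx, cy)
        for cx, cy in zip(torch.tensor_split(x, 2), torch.tensor_split(y, 2))
    ]
)
assert torch.equal(chunked, f_local(x, y))
```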

A distributed function can work with the following input argument and result types:

  • torch.Tensor
  • evotorch.tools.ReadOnlyTensor
  • evotorch.tools.TensorFrame
  • evotorch.tools.ObjectArray
  • a shallow (non-nested) dictionary-like object whose values are tensors, ReadOnlyTensors, TensorFrames, and/or ObjectArrays
  • a shallow (non-nested) sequence (e.g. a list or a tuple) whose elements are tensors, ReadOnlyTensors, TensorFrames, and/or ObjectArrays
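For example, a shallow dictionary argument would be split key by key into per-chunk dictionaries that preserve the original keys (an illustrative sketch with plain lists standing in for tensors; `split_list` and `split_shallow_dict` are hypothetical names, not EvoTorch functions):

```python
def split_list(seq, num_chunks):
    # Split a sequence along its leftmost dimension.
    size = -(-len(seq) // num_chunks)  # ceiling division
    return [seq[i:i + size] for i in range(0, len(seq), size)]

def split_shallow_dict(d, num_chunks):
    # Chunk every value, then regroup so that each chunk is itself a
    # dictionary carrying the same keys as the original.
    value_chunks = {k: split_list(v, num_chunks) for k, v in d.items()}
    return [{k: value_chunks[k][i] for k in d} for i in range(num_chunks)]

chunks = split_shallow_dict({"x": [1, 2, 3, 4], "y": [5, 6, 7, 8]}, 2)
print(chunks)
# → [{'x': [1, 2], 'y': [5, 6]}, {'x': [3, 4], 'y': [7, 8]}]
```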

Summary by CodeRabbit

  • New Features

    • Add end-to-end distributed execution: automatic splitting of tensor-like inputs across actors/devices, chunked execution, and reassembly with optional target-device placement.
    • Expand decorators: richer device-aware decorators (including CUDA/auxiliary), distribution decorator, and a vectorized decorator.
    • Add utilities to move and analyze shallow tensor containers and a TensorFrame property indicating an enforced device.
  • Tests

    • New tests validating distributed flows, device movement, chunking, and ObjectArray handling.

engintoklu avatar Sep 12 '25 16:09 engintoklu

Walkthrough

Adds distributed chunking/stacking for tensors, TensorFrame, ObjectArray and containers; shallow-container device movers and device-counting helpers; and expanded decorators (vectorized, on_device, on_aux_device, on_cuda, distribute) enabling chunked, device-aware, actor-backed execution.

Changes

Cohort / File(s) Summary
Distribution infrastructure
src/evotorch/_distribute.py
New module implementing chunk-based splitting (split_into_chunks, _split_*), argument splitting (split_arguments_into_chunks), reassembly (stack_chunks, _stack_*), and a decorator-style wrapper (DecoratorForDistributingFunctions) to orchestrate actor-backed parallel execution. Supports tensors, TensorFrame, ObjectArray, mappings, and sequences with sizing and device semantics.
Device movement utilities
src/evotorch/tools/_shallow_containers.py
New utilities to move shallow containers to devices: _move_tensorframe, _move, move_shallow_container_to_device, plus helpers _update_dict_additively, count_devices_within_shallow_container, and most_favored_device_among_arguments. Handles TensorFrame, ObjectArray, torch.Tensor, sequences, and mappings with move-only-from-CPU semantics.
Expanded decorator system
src/evotorch/decorators.py
Adds vectorized, on_device, on_aux_device, on_cuda, and distribute APIs; inline/functional transforms, device movement, optional chunking, and integration with the distribution backend (DecoratorForDistributingFunctions). Imports and typing adjusted accordingly.
TensorFrame enhancement
src/evotorch/tools/tensorframe.py
Adds read-only property has_enforced_device indicating whether a TensorFrame was constructed with an enforced device.
Tests
tests/test_decorators.py
Tests extended to exercise the new distribute decorator and device/chunking behaviors, including ObjectArray and numpy-backed scenarios; adjusts parametrizations to focus on distribute-related cases.
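The "move-only-from-CPU semantics" mentioned for the shallow-container movers above can be sketched as follows (a hypothetical illustration, not the actual `move_shallow_container_to_device` code; the `meta` device is used purely as a stand-in target):

```python
import torch

def move_shallow_to(container, target):
    # Move each tensor in a shallow dict to `target`, but only if it
    # currently lives on the CPU; tensors already placed on another
    # device keep their existing placement.
    return {
        k: (v.to(target) if v.device.type == "cpu" else v)
        for k, v in container.items()
    }

moved = move_shallow_to(
    {"a": torch.zeros(3), "b": torch.ones(2, device="meta")},
    target="meta",
)
assert moved["a"].device.type == "meta"  # moved from CPU
assert moved["b"].device.type == "meta"  # was already off-CPU, untouched
```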

Sequence Diagram(s)

sequenceDiagram
    autonumber
    actor User
    participant Wrapper as "@distribute / DecoratorForDistributingFunctions"
    participant Splitter as "split_into_chunks"
    participant Scheduler as "Dispatcher / Actors"
    participant Worker as "Remote Actor (i)"
    participant Gatherer as "stack_chunks"

    User->>Wrapper: call distributed_fn(*args, **kwargs)
    Wrapper->>Splitter: split_arguments_into_chunks(args, split_arguments, num_actors, chunk_size)
    Splitter-->>Wrapper: chunks_per_actor (args_i, kwargs_i) for i=1..N
    Wrapper->>Scheduler: dispatch chunks to actors (parallel)
    Scheduler->>Worker: actor_i: run user function with (args_i, kwargs_i)
    Worker-->>Scheduler: result_i
    Scheduler-->>Wrapper: collect [result_1..result_N]
    Wrapper->>Gatherer: stack_chunks([result_1..result_N])
    Gatherer-->>User: reassembled_result

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰
I split the lettuce, piece by piece,
Hopped to friends for quicker grease,
They compute, they hum, they bring it near—
I stitch the bites and give a cheer,
A rabbit grins: "Distributed feast!"

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 39.62% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title accurately captures the primary addition of a @distribute decorator for parallelizing functions, concisely summarizing the main change in a clear sentence without unrelated details.
✨ Finishing touches
  • [ ] 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • [ ] Create PR with unit tests
  • [ ] Post copyable unit tests in a comment
  • [ ] Commit unit tests in branch feature/functional-distribute

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2d8a6b6a8156bc456adfada3f8a78741a5fea0f3 and d1f48293a93f4dc21b4067784ab54dc62b814fb7.

📒 Files selected for processing (2)
  • src/evotorch/_distribute.py (1 hunks)
  • tests/test_decorators.py (6 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
src/evotorch/_distribute.py (4)
src/evotorch/core.py (10)
  • Problem (365-3410)
  • SolutionBatch (3590-4600)
  • num_actors (2136-2142)
  • cat (4581-4600)
  • is_remote (2161-2165)
  • evaluate (2532-2571)
  • actors (2145-2151)
  • is_main (2168-2173)
  • actor_index (2154-2158)
  • aux_device (1657-1692)
src/evotorch/tools/objectarray.py (1)
  • ObjectArray (39-534)
src/evotorch/tools/tensorframe.py (6)
  • TensorFrame (53-1175)
  • device (482-498)
  • to (457-467)
  • is_read_only (759-763)
  • get_read_only_view (765-769)
  • vstack (930-959)
src/evotorch/tools/_shallow_containers.py (1)
  • move_shallow_container_to_device (104-159)
tests/test_decorators.py (3)
src/evotorch/decorators.py (8)
  • distribute (1231-1485)
  • on_aux_device (629-748)
  • on_device (276-626)
  • vectorized (212-273)
  • decorator (135-137)
  • decorator (515-624)
  • decorator (738-743)
  • decorator (1223-1226)
src/evotorch/tools/objectarray.py (1)
  • ObjectArray (39-534)
src/evotorch/tools/misc.py (3)
  • as_tensor (1817-1876)
  • result (465-475)
  • result (547-557)
🪛 Ruff (0.14.0)
src/evotorch/_distribute.py
  • TRY003 (Avoid specifying long messages outside the exception class): lines 73, 76, 86, 91, 102-105, 123, 175-178, 247, 251-254, 322, 334, 379, 381, 385, 389, 401, 479, 508, 510, 516, 518, 527, 546, 573, 575, 596, 598, 625, 627, 648, 650, 657, 699-702, 751, 799, 802, 804, 812, 929-934, 946-949, 971, 1014, 1016, 1025-1027, 1109-1111, 1114, 1128, 1133-1137, 1141, 1145, 1149-1153, 1161-1165
  • RUF012 (Mutable class attributes should be annotated with typing.ClassVar): line 816
  • B905 (zip() without an explicit strict= parameter; add explicit value for parameter strict=): line 976
  • RUF010 (Use explicit conversion flag): lines 1110, 1136, 1152, 1164
tests/test_decorators.py
  • ARG001 (Unused function argument: chunk_size): line 198

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: test
🔇 Additional comments (23)
tests/test_decorators.py (5)

15-20: LGTM!

The new imports (numpy, distribute, ObjectArray, as_tensor) are appropriate for the added test coverage.


138-150: LGTM!

The test correctly verifies that @on_device("meta") moves input tensors to the specified device and that the decorated function operates on the moved tensor.


152-183: LGTM!

The test correctly validates both decorator and inline forms of on_device with chunking, verifying that chunked results match non-chunked results.


185-229: Ignore the static analysis hint about unused chunk_size.

The chunk_size parameter is used in the parametrized test cases where it's passed to the distribute decorator configuration. This is a false positive from the static analysis tool.


231-258: LGTM!

The test correctly validates distribute with ObjectArray inputs, ensuring that distributed computation produces results matching non-distributed equivalents.

src/evotorch/_distribute.py (18)

38-128: LGTM!

The _split_tensor function correctly handles splitting of tensors, TensorFrame, and ObjectArray into chunks with proper validation and device movement.


135-199: LGTM!

The _split_dict function correctly splits dictionary values by delegating to _split_tensor and maintaining consistent keys across chunks.


206-278: LGTM!

The _split_sequence function correctly handles both list and tuple sequences, with proper validation for named tuples and empty sequences.


280-337: LGTM!

The public split_into_chunks API correctly dispatches to the appropriate internal splitting function based on input type.


339-427: LGTM!

The split_arguments_into_chunks function correctly implements the logic for splitting marked arguments and duplicating unmarked ones.


482-549: LGTM!

The _stack_chunked_tensors function correctly handles stacking for tensors (via torch.cat), ObjectArray (via chain(*chunks)), and TensorFrame (via vstack). The fix for ObjectArray concatenation using chain(*chunks) is now correct.


551-577: LGTM!

The _keys_of_all_dicts helper correctly validates key consistency across dictionaries.


579-604: LGTM!

The _stack_chunked_dicts function correctly combines dictionary chunks by stacking values for each key.


631-666: LGTM!

The _stack_chunked_sequences function correctly handles both list and tuple sequences while preventing named tuples.


668-703: LGTM!

The public stack_chunks API correctly dispatches to the appropriate internal stacking function based on chunk type.


714-769: LGTM!

The _LockForTheMainProcess class correctly implements a picklable lock that only works on the main process, with appropriate error messages when used after unpickling.


771-813: LGTM!

The _loosely_find_leftmost_dimension_size helper correctly extracts the leftmost dimension size from tensors, arrays, or their containers.


848-910: LGTM!

The _DistributedFunctionHandler.__init__ and _evaluate_batch methods correctly initialize the handler as a dummy Problem instance for leveraging Ray actor creation.


911-938: LGTM!

The _ensure_dummy_problem_is_parallelized method correctly uses a lock to ensure actors are created exactly once, with appropriate error messaging.


939-953: LGTM!

The _iter_split_arguments method correctly handles both explicit and implicit (all-True) split argument specifications.


992-1066: LGTM!

The call_wrapped_function method correctly orchestrates the distributed execution: splitting arguments, dispatching to actors via ActorPool.map_unordered, and reassembling results in the correct order.


1068-1094: LGTM!

The _DistributedFunction class correctly wraps the handler and preserves decorator attributes like __evotorch_vectorized__ and __evotorch_pass_info__. The fix for accessing self.wrap_info.function.__evotorch_pass_info__ is now correct.


1167-1189: LGTM!

The _prepare_distributed_function correctly implements function caching via the _Wrapped class to avoid recreating handlers for the same function and configuration.


Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai[bot] avatar Oct 16 '25 13:10 coderabbitai[bot]

Codecov Report

❌ Patch coverage is 66.21160% with 198 lines in your changes missing coverage. Please review. ✅ Project coverage is 75.03%. Comparing base (cebcac4) to head (d1f4829).

Files with missing lines Patch % Lines
src/evotorch/_distribute.py 62.92% 142 Missing ⚠️
src/evotorch/tools/_shallow_containers.py 50.00% 43 Missing ⚠️
src/evotorch/decorators.py 89.47% 12 Missing ⚠️
src/evotorch/tools/tensorframe.py 66.66% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #123      +/-   ##
==========================================
- Coverage   75.43%   75.03%   -0.41%     
==========================================
  Files          59       61       +2     
  Lines        9556    10111     +555     
==========================================
+ Hits         7209     7587     +378     
- Misses       2347     2524     +177     

☂️ View full report in Codecov by Sentry.
📣 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

codecov[bot] avatar Oct 16 '25 13:10 codecov[bot]