Draft: Add `@distribute` decorator for parallelizing functions
Motivation.
Although EvoTorch has had a functional API for a while, this API lacked easy-to-use parallelization capabilities for distributing computations across multiple devices. When writing evolutionary search code with the functional API, the only way to evaluate solutions in parallel across multiple devices was to mix functional-paradigm code with object-oriented-paradigm code (by instantiating a Problem object with multiple actors and then transforming that Problem object into a functional evaluator via the method make_callable_evaluator). This approach was limiting (it was designed only with solution evaluation in mind) and cumbersome (it forced the programmer to mix the usages of two APIs).
The newly introduced feature.
This commit introduces a general-purpose decorator, evotorch.decorators.distribute, which takes a function and transforms it into its parallelized counterpart. Like a Problem object, the distribute decorator can be configured in terms of the number of actors (num_actors) and the number of GPUs visible to each actor (num_gpus_per_actor). Alternatively, an explicit list of devices can be given (e.g. devices=["cuda:0", "cuda:1"]).
How does it work? Upon being called for the first time, the parallelized function creates Ray actors. When it receives its arguments, the parallelized function follows these steps:
- split the arguments into chunks
- send each argument chunk to an actor, along with a request to apply the wrapped function on it
- wait for the parallel computation of the remote actors (each actor using its own associated device)
- collect, combine, and return the result
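The four steps above can be sketched without Ray at all; in the sketch below, a thread pool stands in for the remote actors (all names here are illustrative, not EvoTorch API):

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_chunks(items, num_chunks):
    # Split a list into `num_chunks` nearly-equal pieces, preserving order.
    base, extra = divmod(len(items), num_chunks)
    chunks, start = [], 0
    for i in range(num_chunks):
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks

def distribute_locally(fn, items, num_workers=2):
    # (1) split the arguments into chunks
    chunks = split_into_chunks(items, num_workers)
    # (2)-(3) send each chunk to a worker and wait for all results
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        per_chunk_results = list(pool.map(fn, chunks))
    # (4) collect, combine, and return the result
    return [y for chunk_result in per_chunk_results for y in chunk_result]

doubled = distribute_locally(lambda chunk: [x * 2 for x in chunk], [1, 2, 3, 4, 5])
# doubled == [2, 4, 6, 8, 10]
```

The real decorator does the same dance with Ray actors pinned to devices, but the split/dispatch/combine shape is identical.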
Example. A distributed function might look like this:
```python
@distribute(devices=["cuda:0", "cuda:1"])
def f(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # Note: the number of positional arguments does not have to be 2
    ...
```
We assume that the function expects x and y to have the same leftmost dimension size, and that the function will return a resulting tensor with the same leftmost dimension size. Once called, this example distributed function will split x and y into two (along their leftmost dimensions), send the first halves to the first actor (the one using cuda:0) and the second halves to the second actor (the one using cuda:1), wait for the parallel computation, and collect and concatenate the resulting chunks into a single resulting tensor.
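The splitting and re-concatenation just described can be reproduced locally with plain torch operations (a behavioral sketch, not the decorator's implementation):

```python
import torch

def f(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # Toy function: same leftmost dimension in, same leftmost dimension out.
    return x + y

x = torch.arange(6.0).reshape(6, 1)
y = torch.ones(6, 1)

# Conceptually what @distribute(devices=["cuda:0", "cuda:1"]) does,
# minus the actual device placement: split along dim 0, apply, concatenate.
x_chunks = torch.tensor_split(x, 2, dim=0)
y_chunks = torch.tensor_split(y, 2, dim=0)
result = torch.cat([f(xc, yc) for xc, yc in zip(x_chunks, y_chunks)], dim=0)

assert torch.equal(result, f(x, y))
```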
A distributed function can work with the following input argument and result types:
- torch.Tensor
- evotorch.tools.ReadOnlyTensor
- evotorch.tools.TensorFrame
- evotorch.tools.ObjectArray
- shallow (non-nested) dictionary-like object containing tensors and/or ReadOnlyTensors and/or TensorFrames and/or ObjectArrays
- shallow (non-nested) sequence (e.g. list or tuple) containing tensors and/or ReadOnlyTensors and/or TensorFrames and/or ObjectArrays
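For the container cases, splitting happens one level deep: each value is split along its leftmost dimension while the keys are preserved in every chunk. A hedged pure-Python sketch of the idea (an illustrative helper, not EvoTorch's actual dictionary-splitting code):

```python
def split_shallow_dict(d, num_chunks):
    # Split each value along its leftmost dimension; every resulting chunk
    # keeps the same keys as the original dictionary.
    def split_one(seq):
        base, extra = divmod(len(seq), num_chunks)
        out, start = [], 0
        for i in range(num_chunks):
            size = base + (1 if i < extra else 0)
            out.append(seq[start:start + size])
            start += size
        return out

    per_key = {k: split_one(v) for k, v in d.items()}
    return [{k: per_key[k][i] for k in d} for i in range(num_chunks)]

chunks = split_shallow_dict({"x": [1, 2, 3, 4], "y": [5, 6, 7, 8]}, 2)
# chunks == [{"x": [1, 2], "y": [5, 6]}, {"x": [3, 4], "y": [7, 8]}]
```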
Summary by CodeRabbit

New Features
- Add end-to-end distributed execution: automatic splitting of tensor-like inputs across actors/devices, chunked execution, and reassembly with optional target-device placement.
- Expand decorators: richer device-aware decorators (including CUDA/auxiliary), a distribution decorator, and a vectorized decorator.
- Add utilities to move and analyze shallow tensor containers, and a TensorFrame property indicating an enforced device.

Tests
- New tests validating distributed flows, device movement, chunking, and ObjectArray handling.
Walkthrough
Adds distributed chunking/stacking for tensors, TensorFrame, ObjectArray and containers; shallow-container device movers and device-counting helpers; and expanded decorators (vectorized, on_device, on_aux_device, on_cuda, distribute) enabling chunked, device-aware, actor-backed execution.
Changes
| Cohort / File(s) | Summary |
|---|---|
| Distribution infrastructure (`src/evotorch/_distribute.py`) | New module implementing chunk-based splitting (`split_into_chunks`, `_split_*`), argument splitting (`split_arguments_into_chunks`), reassembly (`stack_chunks`, `_stack_*`), and a decorator-style wrapper (`DecoratorForDistributingFunctions`) to orchestrate actor-backed parallel execution. Supports tensors, TensorFrame, ObjectArray, mappings, and sequences with sizing and device semantics. |
| Device movement utilities (`src/evotorch/tools/_shallow_containers.py`) | New utilities to move shallow containers to devices: `_move_tensorframe`, `_move`, `move_shallow_container_to_device`, plus helpers `_update_dict_additively`, `count_devices_within_shallow_container`, and `most_favored_device_among_arguments`. Handles TensorFrame, ObjectArray, torch.Tensor, sequences, and mappings with move-only-from-CPU semantics. |
| Expanded decorator system (`src/evotorch/decorators.py`) | Adds `vectorized`, `on_device`, `on_aux_device`, `on_cuda`, and `distribute` APIs; inline/functional transforms, device movement, optional chunking, and integration with the distribution backend (`DecoratorForDistributingFunctions`). Imports and typing adjusted accordingly. |
| TensorFrame enhancement (`src/evotorch/tools/tensorframe.py`) | Adds read-only property `has_enforced_device` indicating whether a TensorFrame was constructed with an enforced device. |
| Tests (`tests/test_decorators.py`) | Tests extended to exercise the new `distribute` decorator and device/chunking behaviors, including ObjectArray and numpy-backed scenarios; adjusts parametrizations to focus on distribute-related cases. |
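The move-only-from-CPU semantics mentioned for the device movement utilities can be sketched as follows (a simplified stand-in, assuming the real helpers likewise skip tensors that already live on another device):

```python
import torch

def move_from_cpu_only(x, device):
    # "Move only from CPU": tensors already living on another device
    # (e.g. a GPU an actor already placed them on) are left untouched.
    if isinstance(x, torch.Tensor) and x.device.type == "cpu":
        return x.to(device)
    return x

def move_shallow_container(container, device):
    # Shallow means one level deep: a dict or sequence of tensors, not nested.
    if isinstance(container, dict):
        return {k: move_from_cpu_only(v, device) for k, v in container.items()}
    if isinstance(container, (list, tuple)):
        return type(container)(move_from_cpu_only(v, device) for v in container)
    return move_from_cpu_only(container, device)

# CPU tensors get moved ("meta" is used here only as a stand-in target device);
# tensors already elsewhere stay put.
moved = move_shallow_container({"a": torch.zeros(3)}, "meta")
assert moved["a"].device.type == "meta"
```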
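The `has_enforced_device` addition likely reduces to a flag recorded at construction time; a minimal sketch of the pattern with a hypothetical stand-in class (not the real TensorFrame):

```python
class FrameSketch:
    """Hypothetical stand-in illustrating the `has_enforced_device` pattern."""

    def __init__(self, data, device=None):
        # `device=None` means the caller did not enforce any device.
        self._data = data
        self._enforced_device = device

    @property
    def has_enforced_device(self) -> bool:
        # Read-only: True only when a device was explicitly given at construction.
        return self._enforced_device is not None

assert not FrameSketch({"x": [1, 2]}).has_enforced_device
assert FrameSketch({"x": [1, 2]}, device="cpu").has_enforced_device
```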
Sequence Diagram(s)
```mermaid
sequenceDiagram
    autonumber
    actor User
    participant Wrapper as "@distribute / DecoratorForDistributingFunctions"
    participant Splitter as "split_into_chunks"
    participant Scheduler as "Dispatcher / Actors"
    participant Worker as "Remote Actor (i)"
    participant Gatherer as "stack_chunks"
    User->>Wrapper: call distributed_fn(*args, **kwargs)
    Wrapper->>Splitter: split_arguments_into_chunks(args, split_arguments, num_actors, chunk_size)
    Splitter-->>Wrapper: chunks_per_actor (args_i, kwargs_i) for i=1..N
    Wrapper->>Scheduler: dispatch chunks to actors (parallel)
    Scheduler->>Worker: actor_i: run user function with (args_i, kwargs_i)
    Worker-->>Scheduler: result_i
    Scheduler-->>Wrapper: collect [result_1..result_N]
    Wrapper->>Gatherer: stack_chunks([result_1..result_N])
    Gatherer-->>User: reassembled_result
```
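The stack_chunks step at the end of the diagram reassembles per-actor results by type; a hedged sketch of such a type dispatch (stand-in code inferred from the summaries above, not the module's actual implementation):

```python
import torch

def stack_chunks_sketch(chunks):
    # Reassemble per-actor results into one object, dispatching on type.
    first = chunks[0]
    if isinstance(first, torch.Tensor):
        return torch.cat(list(chunks), dim=0)  # concatenate along dim 0
    if isinstance(first, dict):
        return {k: stack_chunks_sketch([c[k] for c in chunks]) for k in first}
    if isinstance(first, (list, tuple)):
        return type(first)(
            stack_chunks_sketch([c[i] for c in chunks]) for i in range(len(first))
        )
    raise TypeError(f"cannot stack chunks of type {type(first)}")

parts = [{"loss": torch.ones(2)}, {"loss": torch.zeros(3)}]
combined = stack_chunks_sketch(parts)
# combined["loss"] has shape (5,)
```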
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~45 minutes
Poem
🐰
I split the lettuce, piece by piece,
Hopped to friends for quicker grease,
They compute, they hum, they bring it near—
I stitch the bites and give a cheer,
A rabbit grins: "Distributed feast!"
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 39.62% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The title accurately captures the primary addition of a @distribute decorator for parallelizing functions, concisely summarizing the main change in a clear sentence without unrelated details. |
✨ Finishing touches
- [ ] 📝 Generate docstrings
🧪 Generate unit tests (beta)
- [ ] Create PR with unit tests
- [ ] Post copyable unit tests in a comment
- [ ] Commit unit tests in branch `feature/functional-distribute`
📜 Recent review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📥 Commits
Reviewing files that changed from the base of the PR and between 2d8a6b6a8156bc456adfada3f8a78741a5fea0f3 and d1f48293a93f4dc21b4067784ab54dc62b814fb7.
📒 Files selected for processing (2)
- src/evotorch/_distribute.py (1 hunks)
- tests/test_decorators.py (6 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
src/evotorch/_distribute.py (4)
- src/evotorch/core.py (10): Problem (365-3410), SolutionBatch (3590-4600), num_actors (2136-2142), cat (4581-4600), is_remote (2161-2165), evaluate (2532-2571), actors (2145-2151), is_main (2168-2173), actor_index (2154-2158), aux_device (1657-1692)
- src/evotorch/tools/objectarray.py (1): ObjectArray (39-534)
- src/evotorch/tools/tensorframe.py (6): TensorFrame (53-1175), device (482-498), to (457-467), is_read_only (759-763), get_read_only_view (765-769), vstack (930-959)
- src/evotorch/tools/_shallow_containers.py (1): move_shallow_container_to_device (104-159)

tests/test_decorators.py (3)
- src/evotorch/decorators.py (8): distribute (1231-1485), on_aux_device (629-748), on_device (276-626), vectorized (212-273), decorator (135-137), decorator (515-624), decorator (738-743), decorator (1223-1226)
- src/evotorch/tools/objectarray.py (1): ObjectArray (39-534)
- src/evotorch/tools/misc.py (3): as_tensor (1817-1876), result (465-475), result (547-557)
🪛 Ruff (0.14.0)
src/evotorch/_distribute.py
- TRY003 (Avoid specifying long messages outside the exception class): lines 73, 76, 86, 91, 102-105, 123, 175-178, 247, 251-254, 322, 334, 379, 381, 385, 389, 401, 479, 508, 510, 516, 518, 527, 546, 573, 575, 596, 598, 625, 627, 648, 650, 657, 699-702, 751, 799, 802, 804, 812, 929-934, 946-949, 971, 1014, 1016, 1025-1027, 1109-1111, 1114, 1128, 1133-1137, 1141, 1145, 1149-1153, 1161-1165
- RUF012 (Mutable class attributes should be annotated with typing.ClassVar): line 816
- B905 (zip() without an explicit strict= parameter; add an explicit value for strict=): line 976
- RUF010 (Use explicit conversion flag): lines 1110, 1136, 1152, 1164

tests/test_decorators.py
- ARG001 (Unused function argument: chunk_size): line 198
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: test
🔇 Additional comments (23)
tests/test_decorators.py (5)
- 15-20: LGTM! The new imports (numpy, distribute, ObjectArray, as_tensor) are appropriate for the added test coverage.
- 138-150: LGTM! The test correctly verifies that @on_device("meta") moves input tensors to the specified device and that the decorated function operates on the moved tensor.
- 152-183: LGTM! The test correctly validates both decorator and inline forms of on_device with chunking, verifying that chunked results match non-chunked results.
- 185-229: Ignore the static analysis hint about unused chunk_size. The chunk_size parameter is used in the parametrized test cases where it's passed to the distribute decorator configuration. This is a false positive from the static analysis tool.
- 231-258: LGTM! The test correctly validates distribute with ObjectArray inputs, ensuring that distributed computation produces results matching non-distributed equivalents.

src/evotorch/_distribute.py (18)
- 38-128: LGTM! The _split_tensor function correctly handles splitting of tensors, TensorFrame, and ObjectArray into chunks with proper validation and device movement.
- 135-199: LGTM! The _split_dict function correctly splits dictionary values by delegating to _split_tensor and maintaining consistent keys across chunks.
- 206-278: LGTM! The _split_sequence function correctly handles both list and tuple sequences, with proper validation for named tuples and empty sequences.
- 280-337: LGTM! The public split_into_chunks API correctly dispatches to the appropriate internal splitting function based on input type.
- 339-427: LGTM! The split_arguments_into_chunks function correctly implements the logic for splitting marked arguments and duplicating unmarked ones.
- 482-549: LGTM! The _stack_chunked_tensors function correctly handles stacking for tensors (via torch.cat), ObjectArray (via chain(*chunks)), and TensorFrame (via vstack). The fix for ObjectArray concatenation using chain(*chunks) is now correct.
- 551-577: LGTM! The _keys_of_all_dicts helper correctly validates key consistency across dictionaries.
- 579-604: LGTM! The _stack_chunked_dicts function correctly combines dictionary chunks by stacking values for each key.
- 631-666: LGTM! The _stack_chunked_sequences function correctly handles both list and tuple sequences while preventing named tuples.
- 668-703: LGTM! The public stack_chunks API correctly dispatches to the appropriate internal stacking function based on chunk type.
- 714-769: LGTM! The _LockForTheMainProcess class correctly implements a picklable lock that only works on the main process, with appropriate error messages when used after unpickling.
- 771-813: LGTM! The _loosely_find_leftmost_dimension_size helper correctly extracts the leftmost dimension size from tensors, arrays, or their containers.
- 848-910: LGTM! The _DistributedFunctionHandler.__init__ and _evaluate_batch methods correctly initialize the handler as a dummy Problem instance for leveraging Ray actor creation.
- 911-938: LGTM! The _ensure_dummy_problem_is_parallelized method correctly uses a lock to ensure actors are created exactly once, with appropriate error messaging.
- 939-953: LGTM! The _iter_split_arguments method correctly handles both explicit and implicit (all-True) split argument specifications.
- 992-1066: LGTM! The call_wrapped_function method correctly orchestrates the distributed execution: splitting arguments, dispatching to actors via ActorPool.map_unordered, and reassembling results in the correct order.
- 1068-1094: LGTM! The _DistributedFunction class correctly wraps the handler and preserves decorator attributes like __evotorch_vectorized__ and __evotorch_pass_info__. The fix for accessing self.wrap_info.function.__evotorch_pass_info__ is now correct.
- 1167-1189: LGTM! The _prepare_distributed_function correctly implements function caching via the _Wrapped class to avoid recreating handlers for the same function and configuration.
Comment @coderabbitai help to get the list of available commands and usage tips.
Codecov Report
❌ Patch coverage is 66.21160% with 198 lines in your changes missing coverage. Please review.
✅ Project coverage is 75.03%. Comparing base (cebcac4) to head (d1f4829).
Additional details and impacted files
```
@@            Coverage Diff             @@
##           master     #123      +/-   ##
==========================================
- Coverage   75.43%   75.03%   -0.41%
==========================================
  Files          59       61       +2
  Lines        9556    10111     +555
==========================================
+ Hits         7209     7587     +378
- Misses       2347     2524     +177
```
☂️ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.
🚀 New features to boost your workflow:
- ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.