exo icon indicating copy to clipboard operation
exo copied to clipboard

feat: implement bandwidth-aware shard assignment for pipeline parallelism

Open abendrothj opened this issue 2 months ago • 0 comments

This PR implements bandwidth-aware shard assignment for pipeline parallelism, minimizing total inference time by assigning more layers to devices with higher memory bandwidth. This aligns with Issue #957.

Changes

  • Profiling: Added memory_bandwidth to NodePerformanceProfile and Apple Silicon bandwidth data.
  • Assignment: Implemented greedy assignment algorithm in placement_utils.py.
  • Verification: Added test_get_shard_assignments_bandwidth_aware to confirm optimal distribution.

abendrothj avatar Jan 03 '26 13:01 abendrothj