
[hip] Added hip_device_group_device to the runtime.

Open · AWoloszyn opened this issue 4 months ago

This gives us an interface for creating a logical device from a set of physical HIP devices. In a future PR I plan on removing the normal hip_device, but for now, until the device_group_device is completed and hardened, I am keeping the original around. There are also some optimizations to do for the case where a device group contains only a single device.
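
As a rough illustration of a logical device that fronts several physical devices, here is a minimal C sketch. It is not how hip_device_group_device is implemented. It assumes a hypothetical bitmask "affinity" in which bit i selects member i of the group. `logical_device_t`, `logical_device_init`, `logical_device_select`, and the 8-device cap are all illustrative names, not the runtime's actual interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cap on group size, chosen only for this sketch. */
#define MAX_PHYSICAL_DEVICES 8

/* Hypothetical logical device that fronts a fixed set of physical ordinals. */
typedef struct {
  int ordinals[MAX_PHYSICAL_DEVICES];
  size_t count;
} logical_device_t;

/* Returns 0 on success, -1 if the set is empty or too large. */
int logical_device_init(logical_device_t* d, const int* ordinals, size_t count) {
  if (count == 0 || count > MAX_PHYSICAL_DEVICES) return -1;
  for (size_t i = 0; i < count; ++i) d->ordinals[i] = ordinals[i];
  d->count = count;
  return 0;
}

/* Maps an affinity bitmask (bit i = member i) to a physical ordinal by
 * picking the lowest selected member. Returns -1 if no member is selected. */
int logical_device_select(const logical_device_t* d, uint64_t affinity) {
  /* Ignore bits that do not correspond to a member of this group. */
  uint64_t mask = affinity & ((UINT64_C(1) << d->count) - 1);
  if (mask == 0) return -1;
  /* Single-member group: there is nothing to choose between. */
  if (d->count == 1) return d->ordinals[0];
  for (size_t i = 0; i < d->count; ++i) {
    if (mask & (UINT64_C(1) << i)) return d->ordinals[i];
  }
  return -1;
}
```

For a group built from ordinals {2, 5, 7}, an affinity of 0x6 selects member 1 and therefore ordinal 5. The single-member branch hints at the kind of shortcut a one-device group allows, although a real implementation would likely skip far more than this lookup.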

This implementation currently passes the CTS (including the new CTS tests added for device groups), but some work remains:

1) Fix memory pooling.
2) Make sure that collectives work as expected.
3) Optimize our synchronization.
  • Currently, synchronization across physical GPUs goes through the host. We should be able to avoid that, but it will take some additional work.
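
To make "goes through the host" concrete, here is a minimal sketch of host-mediated synchronization using POSIX threads. It makes no claim about how the PR implements it. Two host threads stand in for two GPUs: the "producer" writes a result and reports completion to a host-side timeline, and the "consumer" is held on the host until that completion has been observed. `host_timeline_t`, its functions, and `run_two_device_handoff` are hypothetical names, not IREE or HIP APIs.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host-side timeline: a monotonically increasing payload
 * guarded by a mutex and a condition variable. */
typedef struct {
  pthread_mutex_t mutex;
  pthread_cond_t cond;
  uint64_t value;
} host_timeline_t;

void host_timeline_init(host_timeline_t* t) {
  pthread_mutex_init(&t->mutex, NULL);
  pthread_cond_init(&t->cond, NULL);
  t->value = 0;
}

void host_timeline_deinit(host_timeline_t* t) {
  pthread_cond_destroy(&t->cond);
  pthread_mutex_destroy(&t->mutex);
}

/* Reports completion; the payload only moves forward, so a stale signal
 * cannot rewind it. */
void host_timeline_signal(host_timeline_t* t, uint64_t v) {
  pthread_mutex_lock(&t->mutex);
  if (v > t->value) t->value = v;
  pthread_cond_broadcast(&t->cond);
  pthread_mutex_unlock(&t->mutex);
}

/* Blocks the calling host thread until the payload reaches at least v. */
void host_timeline_wait(host_timeline_t* t, uint64_t v) {
  pthread_mutex_lock(&t->mutex);
  while (t->value < v) pthread_cond_wait(&t->cond, &t->mutex);
  pthread_mutex_unlock(&t->mutex);
}

uint64_t host_timeline_query(host_timeline_t* t) {
  pthread_mutex_lock(&t->mutex);
  uint64_t v = t->value;
  pthread_mutex_unlock(&t->mutex);
  return v;
}

typedef struct {
  host_timeline_t* timeline;
  int* buffer;
} producer_args_t;

/* Stand-in for "GPU 0": produce a result, then tell the host it is done. */
static void* device0_producer(void* arg) {
  producer_args_t* a = (producer_args_t*)arg;
  *a->buffer = 42;
  host_timeline_signal(a->timeline, 1);
  return NULL;
}

/* Stand-in for "GPU 1": its dependent work waits on the host timeline.
 * Returns the value it observes, or -1 if the producer thread fails to start. */
int run_two_device_handoff(void) {
  host_timeline_t timeline;
  host_timeline_init(&timeline);
  int buffer = 0;
  producer_args_t args = {&timeline, &buffer};
  pthread_t thread;
  if (pthread_create(&thread, NULL, device0_producer, &args) != 0) {
    host_timeline_deinit(&timeline);
    return -1;
  }
  host_timeline_wait(&timeline, 1); /* the host round trip */
  int observed = buffer;            /* visible: written before the signal */
  pthread_join(thread, NULL);
  host_timeline_deinit(&timeline);
  return observed;
}
```

The cost this illustrates is that the consumer cannot start until a host thread has seen the producer finish. That round trip is what the bullet above says should be avoidable with additional work.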

AWoloszyn, Oct 16 '24 14:10