Ben Vanik

Results 416 comments of Ben Vanik

aww, then probably have to keep it, do the check for a release, and then mark as deprecated.

You want the buffer import API, which lets you wrap a CUDA device pointer in an iree_hal_buffer_t: https://github.com/iree-org/iree/blob/05bbcf1385146d075829cd940a52bf06961614d0/runtime/src/iree/hal/allocator.h#L379

This is a quirk of the disassembler: vm.call only prints the base register of a 2xi32 pair. If you set a breakpoint in iree_hal_module_allocator_allocate the value should be correct.
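To illustrate why a printout of only the base register can look wrong, here is a minimal standalone sketch. It is not IREE code: the helper names `join_i32_pair` and `base_only` are hypothetical, and the layout (low word in the base register, high word in the next one) is an assumption made for the example.

```c
#include <stdint.h>

// Hypothetical helper: joins two adjacent 32-bit registers into the
// 64-bit value they hold together.
// ASSUMPTION: the base register holds the low word and base + 1 holds
// the high word.
static uint64_t join_i32_pair(const uint32_t* regs, int base) {
  return ((uint64_t)regs[base + 1] << 32) | (uint64_t)regs[base];
}

// Hypothetical helper: what a printer that reads only the base register
// would show for the same pair.
static uint32_t base_only(const uint32_t* regs, int base) {
  return regs[base];
}
```

With a pair holding `0x0000000100000010`, `base_only` yields `0x10` while `join_i32_pair` yields the full value, which is the kind of mismatch a breakpoint on the callee would resolve.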

Yep! The MVP only needs semaphore handling changes on the CPU side as it already uses the `iree_hal_buffer_t` vtable to perform the mapping instead of casting things directly. Import/export are...

> Importing the timepoint into `local_task`, I found the following steps might work:
>
> * `iree_hal_semaphore_acquire_timepoint` of the `value`
> * `iree_wait_handle_wrap_primitive` of the `wait_primitive` to import.
> *...

https://gist.github.com/benvanik/ecc9b37fb2b670ce1ed2fb0d7c694287 shows what I'm shooting for after #20965 on the compiler side with zero copies once we have the transfer elision pass. Initially it'll let us do homogeneous zero-copy sharding...

Dropping a note that https://github.com/iree-org/iree/blob/f26a830f71a35d3264333fa3f6a12f6cb5d35e30/compiler/src/iree/compiler/Dialect/Stream/Transforms/EmplaceAllocations.cpp#L91-L95 will need the topology check logic in order to perform in-place updates across devices.

Since we don't want to carry this any longer than we have to, can you add `// TODO(#issue)` comments to the relevant code and make sure we delete it when...

(or ideally, move this into the deferred action queue instead?)

(thanks as always @hanhanW for continually improving/refining this code