BVH API Semantics when using asynchronous execution policies
The BVH may be instantiated using asynchronous CUDA execution policy as follows:
spin::BVH< NDIMS, axom::CUDA_EXEC< BLOCK_SIZE, axom::ASYNC > > bvh( aabbs, N );
Then constructing the BVH on the GPU can be accomplished as follows:
bvh.build();
When using asynchronous execution, the call to bvh.build()
returns control to the host while the GPU may still be constructing the BVH.
Moreover, calling find after bvh.build()
is fine, since the kernels are currently launched on the same stream and are executed in order.
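The in-order behavior described above can be imitated on the host alone. The following is a minimal sketch, not Axom or CUDA code: `InOrderStream` and `build_then_find` are hypothetical names, and a single worker thread stands in for the one stream on which the BVH kernels are launched. It only illustrates why a "find" enqueued after a "build" sees the finished build even though both calls return immediately.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical stand-in for a single device stream: tasks run one at a time,
// in the order they were enqueued, on a background worker thread.
class InOrderStream
{
public:
  InOrderStream() : m_worker([this] { run(); }) { }

  ~InOrderStream()
  {
    {
      std::lock_guard<std::mutex> lock(m_mutex);
      m_done = true;
    }
    m_cv.notify_all();
    m_worker.join();
  }

  // Returns immediately, like an asynchronous kernel launch.
  void enqueue(std::function<void()> task)
  {
    {
      std::lock_guard<std::mutex> lock(m_mutex);
      m_tasks.push(std::move(task));
    }
    m_cv.notify_all();
  }

  // Blocks the caller until every enqueued task has finished.
  void synchronize()
  {
    std::unique_lock<std::mutex> lock(m_mutex);
    m_cv.wait(lock, [this] { return m_tasks.empty() && !m_busy; });
  }

private:
  void run()
  {
    std::unique_lock<std::mutex> lock(m_mutex);
    for(;;)
    {
      m_cv.wait(lock, [this] { return m_done || !m_tasks.empty(); });
      if(m_tasks.empty()) { return; }  // only reached when shutting down
      std::function<void()> task = std::move(m_tasks.front());
      m_tasks.pop();
      m_busy = true;
      lock.unlock();
      task();  // run outside the lock so enqueue() never blocks on a task
      lock.lock();
      m_busy = false;
      m_cv.notify_all();
    }
  }

  std::mutex m_mutex;
  std::condition_variable m_cv;
  std::queue<std::function<void()>> m_tasks;
  bool m_busy = false;
  bool m_done = false;
  std::thread m_worker;  // declared last so the other members exist when it starts
};

// Enqueue a "build" followed by a "find"; the find observes the finished
// build because the stream runs tasks in FIFO order.
int build_then_find()
{
  InOrderStream stream;
  int tree_size = 0;    // written by the "build" task
  int query_hits = -1;  // written by the "find" task

  stream.enqueue([&] { tree_size = 8; });               // stands in for bvh.build()
  stream.enqueue([&] { query_hits = tree_size / 2; });  // stands in for bvh.find()

  stream.synchronize();  // host waits before reading the result
  return query_hits;
}
```

The host only needs to synchronize before it reads the query result; no synchronization is needed between the two enqueued tasks themselves.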
Asynchronous execution is typically used to:
- Hide latencies due to kernel launch overhead and avoid synchronizing after each kernel
- Overlap execution on the host and on the GPU, for example:
// construct BVH on the GPU
spin::BVH< NDIMS, axom::CUDA_EXEC< BLOCK_SIZE, axom::ASYNC > > bvh( aabbs, N, pool_allocator );
bvh.build();
// while the BVH is being constructed on the GPU, pack buffers on the CPU
pack_buffers();
// call find
bvh.find( ... );
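The same overlap pattern can be tried without a GPU. This is a sketch under stated assumptions: `build_tree`, `pack_buffers` (as defined here), and `overlap_build_and_pack` are hypothetical stand-ins, and `std::async` plays the role of the asynchronous launch. It is not how the BVH is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical stand-in for the GPU-side BVH construction.
std::vector<int> build_tree(int n)
{
  std::vector<int> nodes(n);
  std::iota(nodes.begin(), nodes.end(), 0);
  return nodes;
}

// Hypothetical stand-in for CPU-side buffer packing.
std::vector<double> pack_buffers(int n)
{
  return std::vector<double>(n, 1.0);
}

std::size_t overlap_build_and_pack(int n)
{
  // Start the "build" on another thread; this call returns immediately.
  std::future<std::vector<int>> pending =
    std::async(std::launch::async, build_tree, n);

  // Host work proceeds while the build is in flight.
  std::vector<double> buffers = pack_buffers(n);

  // Wait for the build only when its result is actually needed.
  std::vector<int> tree = pending.get();
  return tree.size() + buffers.size();
}
```

The point of the pattern is that the wait is deferred to the first place the result is consumed, rather than placed right after the launch.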
Additional speedups of the order of 1.5X to 2X have been observed when using asynchronous execution over synchronous execution with the present implementation.
However, other than specifying an asynchronous execution policy, it is not clear from the API that subsequent calls are asynchronous.
Considerations
- Do we want all subsequent calls/queries to the BVH that launch kernels to synchronize internally at the end if the policy is asynchronous? This could limit potential overlap of execution on the CPU as indicated in the example above.
- Do we want to specify explicitly in the API that the method is running synchronously or asynchronously?
That could be done by different methods, e.g.:
bvh.ibuild(); // builds the BVH asynchronously on the GPU
// TODO: overlap execution on the CPU and GPU
do_stuff_on_cpu();
bvh.ifind(); // runs a find query on the GPU
// TODO: do more CPU stuff
do_more_stuff_on_the_cpu();
axom::synchronize(); // caller has to synchronize afterwards
It could also be done by a template argument:
bvh.build< axom::SYNCH >( );
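For discussion, here is one way the template-argument variant might look on a toy class. Everything in it is an assumption made for illustration: `Mode`, `ToyBVH`, and its members are hypothetical and are not part of Axom, and `std::async` stands in for a kernel launch.

```cpp
#include <cassert>
#include <future>

// Hypothetical mode tag; not the axom::SYNC / axom::ASYNC options.
enum class Mode { Sync, Async };

// Toy class whose build() makes the synchronization behavior explicit
// at the call site through a template argument.
class ToyBVH
{
public:
  template <Mode M>
  void build()
  {
    // The work is always launched asynchronously...
    m_pending = std::async(std::launch::async, [this] { m_built = true; });

    // ...and the synchronous variant simply waits before returning.
    if(M == Mode::Sync)
    {
      m_pending.wait();
    }
  }

  // Caller-side synchronization for the asynchronous variant.
  void synchronize()
  {
    if(m_pending.valid())
    {
      m_pending.wait();
    }
  }

  // Only meaningful after build<Mode::Sync>() or after synchronize().
  bool built() const { return m_built; }

private:
  std::future<void> m_pending;
  bool m_built = false;
};
```

With this shape, `bvh.build< Mode::Async >()` followed by `bvh.synchronize()` reads as asynchronous at the call site, while `bvh.build< Mode::Sync >()` needs no follow-up call. The trade-off against separate method names is that the mode must be known at compile time.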
And there are probably a couple of other ways to do this. We need to come up with a clean and precise API design for this.
@gzagaris I think it's best and most flexible to support both synchronous and asynchronous operations when it makes sense. We should choose a default behavior for each operation in a consistent manner and allow users to choose otherwise via the axom::ASYNC/axom::SYNC options. Note that we will soon be rolling out the asynchronous execution stuff in RAJA that I spoke about in the ASQ Webex. This would help users overlap operations should they choose to do that.
@rhornung67 -- thanks for the feedback. I totally agree. We already allow that and I am currently employing it. My concern is that in the API it is not clear that the operation is asynchronous, unless the user is also familiar with BVH internals.
Do you want me to add this to the agenda for today's Axom meeting?
Sure -- I've been mulling this over and if folks have ideas/suggestions it will be helpful.
@rhornung67 -- This is a design question.
I assigned it to you to help us make a determination about the next steps.
We should wait for a use case. @publixsubfan suggested trying in a test case.