Phil Miller - NOAA
PR is waiting on performance testing. Doesn't look like it's going to be ready for 4.0 and backported, so I'm removing the milestone.
This looks good for what it is. EMPIRE will try using it as soon as it makes its way downstream.
I know this is still a draft, but why shouldn't a fence call leave the instance in the completed state?
> > I know this is still a draft, but why shouldn't a fence call leave the instance in the completed state?
>
> It does. Since the implementation immediately...
From my comments on the issue: from the caller's perspective, is there a meaningful distinction between 'submitted' and 'running', or are they both effectively just 'incomplete'? CUDA and HIP...
> > Doesn't that entail an extra record/query pair that setting 'complete' in `fence` avoids?
>
> It's potentially another `query`, yes. My motivation, for now, was to make sure...
Fair enough on initial implementation vs. optimization. In my EMPIRE use case, I don't think I'll ever have actual calls to `exec.fence()`, so my code won't see the difference.
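To make the record/query trade-off concrete, here is a minimal C++ sketch under stated assumptions: `SimInstance`, `Status`, and `run_scenario` are hypothetical names, not the API in this PR, and a counter stands in for the event record/query a CUDA or HIP backend would issue. It also collapses 'submitted' and 'running' into a single 'incomplete' state, as suggested in the question above. When `fence()` marks the instance complete, a status query right after the fence needs no device call; when it doesn't, the first query pays one extra round trip.

```cpp
// Hypothetical two-state status: from the caller's side, "submitted" and
// "running" are both just "not yet complete".
enum class Status { Incomplete, Complete };

// Simulated execution-space instance (hypothetical). `device_queries_`
// counts the round trips a real backend would pay for an event record/query.
class SimInstance {
 public:
  explicit SimInstance(bool fence_marks_complete)
      : fence_marks_complete_(fence_marks_complete) {}

  void submit() {
    work_outstanding_ = true;
    cached_ = Status::Incomplete;
  }

  // Blocking fence. Afterwards all submitted work is done; the policy flag
  // decides whether the instance also records that fact locally.
  void fence() {
    work_outstanding_ = false;
    if (fence_marks_complete_) cached_ = Status::Complete;
  }

  // Returns the cached status when it is already known to be complete;
  // otherwise asks the (simulated) device and caches the answer.
  Status query() {
    if (cached_ == Status::Complete) return cached_;
    ++device_queries_;
    cached_ = work_outstanding_ ? Status::Incomplete : Status::Complete;
    return cached_;
  }

  int device_queries() const { return device_queries_; }

 private:
  bool fence_marks_complete_;
  bool work_outstanding_ = false;
  Status cached_ = Status::Complete;
  int device_queries_ = 0;
};

struct ScenarioResult {
  bool complete;
  int device_queries;
};

// Runs submit -> fence -> query under the chosen fence policy.
ScenarioResult run_scenario(bool fence_marks_complete) {
  SimInstance inst(fence_marks_complete);
  inst.submit();
  inst.fence();
  bool complete = inst.query() == Status::Complete;
  return {complete, inst.device_queries()};
}
```

In this sketch, `run_scenario(true)` reports zero device queries and `run_scenario(false)` reports one; both end in the complete state, so the policy affects only cost, not correctness.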
Ditto for HIP on both of the above, and maybe SYCL?
I was already skeptical that we could define this well in the absence of a broader discussion about what we mean by supporting multiple threads submitting work to a shared...
Having looked over the CUDA version with a lock, my instincts are that the lock should encompass the `invoke_kernel` and `cudaMemcpyAsync` calls as well. My first pass analysis is that...
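A minimal C++ sketch of the wider lock scope, assuming a simulated stream: `SharedInstance`, `submit`, `run_concurrent`, and `triples_contiguous` are hypothetical names, and the logged `memcpy`/`launch`/`record` strings are placeholders for `cudaMemcpyAsync`, the `invoke_kernel` launch, and the completion-event record. Holding one mutex across all three keeps each submission's operations adjacent on the shared stream even with several host threads submitting; if the lock covered only the record step, another thread's copy or launch could be interleaved between one thread's launch and its record.

```cpp
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Simulated stream (hypothetical): an ordered log of operations enqueued on
// one shared instance. Real code would issue the CUDA calls here instead.
class SharedInstance {
 public:
  // Wide lock: the argument copy, the launch, and the event record are
  // enqueued as one unit, so no other thread's work lands between them.
  void submit(int id) {
    std::lock_guard<std::mutex> guard(mutex_);
    log_.push_back("memcpy " + std::to_string(id));
    log_.push_back("launch " + std::to_string(id));
    log_.push_back("record " + std::to_string(id));
  }

  std::vector<std::string> log() const {
    std::lock_guard<std::mutex> guard(mutex_);
    return log_;
  }

 private:
  mutable std::mutex mutex_;
  std::vector<std::string> log_;
};

// Submits `per_thread` work items from each of `threads` host threads.
std::vector<std::string> run_concurrent(int threads, int per_thread) {
  SharedInstance inst;
  std::vector<std::thread> pool;
  for (int t = 0; t < threads; ++t) {
    pool.emplace_back([&inst, t, per_thread] {
      for (int i = 0; i < per_thread; ++i) inst.submit(t * per_thread + i);
    });
  }
  for (auto& th : pool) th.join();
  return inst.log();
}

// True if every memcpy/launch/record triple in the log belongs to one id.
bool triples_contiguous(const std::vector<std::string>& log) {
  if (log.size() % 3 != 0) return false;
  for (std::size_t k = 0; k < log.size(); k += 3) {
    std::string id = log[k].substr(log[k].find(' ') + 1);
    if (log[k] != "memcpy " + id || log[k + 1] != "launch " + id ||
        log[k + 2] != "record " + id) {
      return false;
    }
  }
  return true;
}
```

For example, `run_concurrent(4, 50)` yields 600 log entries, and `triples_contiguous` holds on every run because each `submit` appends its three entries while holding the lock.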