Gordon Brown

Results: 61 comments by Gordon Brown

I agree; my thinking was that we should reintroduce the wording that we had for this in http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0796r3.pdf, as we spent quite a bit of time refining that in the...

I can see the benefit of having this feature. I think the best approach to supporting this would be to have a default constructor for the accessor class when `access::placeholder`...

Thinking further about how a potential interface for this could look, I see two possible ways to do this. First, we could have a callback mechanism, where a user provides...

Yes, you're right, I think we will. Since we are separating the execution and memory topologies, there will now be no object which you can construct from a memory resource...

That's right, the approach described here is the only correct way, following the current SYCL 2020 specification, to guarantee the operations enqueued within the host task are synchronized with; however,...

@densamoilov apologies for the late reply, I hadn't seen your response. That's right, this solution would only work in the case of a single in-order queue, though as it relies...

Hi @Soujanyajanga, thanks for raising this issue; I can give you an update on the progress of these operations for the Nvidia backend. For `getrf_batch`, Nvidia supports an equivalent to...

@Soujanyajanga Yes, I think this is the approach we would take. If you can share your workaround, that could be useful, thanks. I've added this to our roadmap so someone...

Now that https://github.com/intel/llvm/pull/5095 is merged, this should address the problem for the CUDA backend, so I will remove the CUDA label. @bader I believe the remaining issue here is with...

I'm not sure about Level Zero, but AFAICT OpenCL doesn't have any limitation on the global work size; the only thing I see is the `CL_KERNEL_GLOBAL_WORK_SIZE` query for `clGetKernelWorkGroupInfo`,...