Diptorup Deb
The slowdown may be related to kernel launch overhead in the `JitKernel` custom dispatcher class. The overhead is especially noticeable with small problem sizes. The `experimental.dispatcher.KernelDispatcher` fixes the launch overhead. Can you...
@fcharras, sorry for getting to this issue so late. Would you like to open a PR contributing your RNG implementation to numba-dpex? We can review and merge it.
> I will be busy early this week, I'll start working on it mid-week if that's fine for you. I am out this week. If you want to start next...
There are two issues here: 1. Parfor does not support `sum` with the `axis` keyword. 2. GPU kernel generation for reductions is not yet supported. @DrTodd13 can you take a...
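For readers unfamiliar with the first issue, here is a minimal sketch of what a `sum` with the `axis` keyword computes, written in plain NumPy with an explicit-loop equivalent. The function name `column_sums` and the sample array are hypothetical and are not taken from the original reproducer; the sketch does not use numba-dpex and says nothing about how a fix would be implemented.

```python
import numpy as np


def column_sums(a):
    """Explicit-loop equivalent of a.sum(axis=0) for a 2-D array (illustrative only)."""
    out = np.zeros(a.shape[1], dtype=a.dtype)
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            # Accumulate each column across the rows
            out[j] += a[i, j]
    return out


# Hypothetical sample data
a = np.arange(6, dtype=np.float64).reshape(2, 3)
axis_sum = a.sum(axis=0)       # reduction along axis 0
loop_sum = column_sums(a)      # same reduction written as loops
```

Both `axis_sum` and `loop_sum` hold one value per column of `a`.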
@fcharras The issue here is that the `math.ceil` and `math.floor` functions are replaced by their SYCL equivalents, which only support floating-point values. We are looking at a solution where...
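As an aside, when the operands are integers, a ceiling division can be expressed with integer arithmetic only, which avoids calling `math.ceil` at all. The sketch below is plain Python; the helper name `ceil_div` is hypothetical, and whether this pattern compiles inside a numba-dpex kernel is an assumption that the thread does not confirm.

```python
def ceil_div(numerator: int, denominator: int) -> int:
    """Return ceil(numerator / denominator) using integer arithmetic only.

    Relies on Python's floor division: ceil(n / d) == -floor(-n / d).
    """
    return -(-numerator // denominator)


# Hypothetical usage: number of work groups needed to cover n_items
n_items = 1000
group_size = 256
n_groups = ceil_div(n_items, group_size)
```

The identity holds for any nonzero integer denominator, so no conversion to floating point is needed.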
Updated the reproducer to latest API and I can reproduce the freeze/deadlock reported previously: ```python import argparse import math import dpctl import dpnp import numpy as np import numpy.random as...
> Updated the reproducer to latest API and I can reproduce the freeze/deadlock reported previously: > I experience the issue on a Gen9 integrated graphics only at problem size `2**18`...
@mingjie-intel have a look. These suggested use cases can serve as good motivating examples for your reduction kernel work.
@roxx30198 The examples suggested by @oleksandr-pavlyk are a good starting point for you to get familiar with numba-dpex and parallel programming in general.
@DrTodd13 fixed! The formatting, that is :wink:. I will take a look and provide an update.