Kenichi Maehashi

298 comments by Kenichi Maehashi

It's tracked in https://github.com/cupy/cupy/issues/5649, but I guess PyTorch has the same issue. In the meantime, if you are on Linux, I'd suggest trying `fork` mode instead of `spawn`. Note that...
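As a rough illustration of the `fork` suggestion, here is a minimal sketch that assumes the comment refers to the start methods of Python's standard `multiprocessing` module. The helper names `square` and `run_with_fork` are hypothetical and chosen only for this example. It does not touch the GPU and does not cover whatever caveat the truncated note goes on to describe.

```python
import multiprocessing as mp


def square(x):
    """Toy worker function; stands in for the real per-process work."""
    return x * x


def run_with_fork(values):
    """Map `square` over `values` in worker processes started with `fork`."""
    # `fork` is only offered on POSIX platforms such as Linux.
    if "fork" not in mp.get_all_start_methods():
        raise RuntimeError("the 'fork' start method is not available here")
    # A context object selects the start method without changing the
    # process-wide default.
    ctx = mp.get_context("fork")
    with ctx.Pool(processes=2) as pool:
        return pool.map(square, values)


if __name__ == "__main__":
    print(run_with_fork([1, 2, 3]))
```

Using `mp.get_context("fork")` rather than `mp.set_start_method("fork")` keeps the choice local to this pool, which can be convenient when other parts of a program rely on the default start method.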

How about using float32 instead of float64? Double precision is much slower than single precision on GPUs.
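A minimal sketch of the dtype switch follows. It uses NumPy so that it runs without a GPU; CuPy arrays provide the same `astype` method, so the same call should carry over. The helper name `to_single_precision` is hypothetical. The example only shows the conversion and the halved memory footprint; it does not measure GPU speed.

```python
import numpy as np


def to_single_precision(a):
    """Return `a` as float32, copying only if the dtype actually differs."""
    return a.astype(np.float32, copy=False)


x64 = np.linspace(0.0, 1.0, 5)   # NumPy creates float64 by default
x32 = to_single_precision(x64)

print(x64.dtype, x64.nbytes)     # float64 40
print(x32.dtype, x32.nbytes)     # float32 20
```

Note that float32 carries roughly 7 significant decimal digits compared with about 16 for float64, so whether the switch is acceptable depends on the precision the computation needs.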

I'm also unable to reproduce this issue using the Docker image (https://github.com/cupy/cupy/issues/5846#issuecomment-937021399) on an NVIDIA GeForce GTX 1060 6GB (CC = 61). I once observed a `radix_sort: failed on 2nd step` error...

> 1. The installer (`cupyx.tools.install_library`) can take advantage of the cuTENSOR wheel
>
>    * ideally we wanna do `pip install cutensor` so that `pip` is made aware of it...

I think this sounds good. The only concern I can come up with is a mismatch in the supported CUDA versions between CUB and CuPy.

It seems `cyl_bessel_i1` in HIP returns `nan` for some `float16` inputs.

```
12:09:44 =================================== FAILURES ===================================
12:09:44 _____________________________ TestSpecial.test_i1 ______________________________
12:09:44
12:09:44 self =
12:09:44
12:09:44     def test_i1(self):
12:09:44 >...
```

Sorry, this is a duplicate of #322.

I maintain CuPy (https://github.com/cupy/cupy/). I'd like to join to discuss and learn how other projects grow their community and contributor base and handle issues and pull requests.