Yangyang Cai

10 comments by Yangyang Cai

This is on another device I'm working on. It's not an NVIDIA platform. They use HIP as the tooling for its graphics cards. When compiling C/C++ code with gpu...

Yes, I did use KOKKOS_ENABLE_HIP. ChatGPT told me that inline function templates should not be explicitly instantiated, since an inline function is instantiated at every spot where it is called. I removed...

The tests all passed on the HIP device and failed on the CUDA device. I carefully checked the expressions for f0 and found no errors, so I simply changed the accuracy requirement from...

> @StaticObserver nice! i changed the merge branch to 1.1.0rc. we try to never directly merge to master.

Sorry, I just found where to change the branch to merge.

I tried my best to reduce the truncation error.

I see, but they worked well. I guess it was just not efficient?

> @StaticObserver all your pushes will be published here automatically. no need for additional PR.

I see, thanks for the reminder.

> [@StaticObserver](https://github.com/StaticObserver) can you please post the mpirun command line that you used? You mention in the title description that you see the issue without UCX. Open MPI 5.0.x without...

> [@StaticObserver](https://github.com/StaticObserver) thank you for the clarification. At this point here are the options that we have
>
> * UCX master has a new flag that allows you to...

> [@StaticObserver](https://github.com/StaticObserver) I can confirm that this is exactly the same as the issue I had (on CUDA) in [#12849](https://github.com/open-mpi/ompi/issues/12849). Especially the fact that moving the buffers out of...