Yangyang Cai
This is on another device I'm working on. It's not an NVIDIA platform. They use HIP as the tool for working with the graphics cards. When compiling C/C++ code with gpu...
Yes, I did use KOKKOS_ENABLE_HIP. ChatGPT told me that inline function templates should not be explicitly instantiated, since an inline function is instantiated at every spot where it is called. I removed...
The tests all passed on the HIP device and failed on the CUDA device. I carefully checked the expressions of f0 and found no errors, so I simply changed the accuracy requirement from...
> @StaticObserver nice! i changed the merge branch to 1.1.0rc. we try to never directly merge to master.

Sorry, I just found where to change the branch to merge.
I tried my best to reduce the truncation error.
I see, but they worked well. I guess it was just not efficient?
> @StaticObserver all your pushes will be published here automatically. no need for additional PR.

I see, thanks for the reminder.
> [@StaticObserver](https://github.com/StaticObserver) can you please post the mpirun command line that you used? You mention in the title description that you see the issue without UCX. Open MPI 5.0.x without...
> [@StaticObserver](https://github.com/StaticObserver) thank you for the clarification. At this point here are the options that we have
>
> * UCX master has a new flag that allows you to...
> [@StaticObserver](https://github.com/StaticObserver) I can confirm that this is exactly the same as the issue i had (on CUDA) in [#12849](https://github.com/open-mpi/ompi/issues/12849) . Especially the fact that moving the buffers out of...