AMGX
Thrust namespace fix
Thrust now provides a workaround for issues caused by multiple libraries or applications using thrust at the same time: the thrust namespace can be wrapped (https://github.com/NVIDIA/thrust/releases/tag/1.14.0).
This patch wraps all thrust calls in the amgx namespace.
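For anyone unfamiliar with the mechanism, here is a minimal sketch of what namespace wrapping does. It does not use thrust at all: `demolib`, `amgx_demo` and the `DEMO_*` macros are hypothetical stand-ins invented for this illustration, not thrust's actual macros (see the release notes linked above for those). The point is only that the wrapped and unwrapped copies of a header-only library become distinct symbols, so calls can no longer resolve across modules.

```cpp
#include <string>

// Hypothetical opt-in switch: the consuming project defines this before
// including the library headers (usually via a compiler flag).
#define DEMO_WRAPPED_NAMESPACE amgx_demo

// Hypothetical library-side macros: when the switch is set, the whole
// library is nested inside the user-supplied namespace.
#ifdef DEMO_WRAPPED_NAMESPACE
#define DEMO_NAMESPACE_BEGIN namespace DEMO_WRAPPED_NAMESPACE { namespace demolib {
#define DEMO_NAMESPACE_END } }
#define DEMO_NS_QUALIFIER ::DEMO_WRAPPED_NAMESPACE::demolib
#else
#define DEMO_NAMESPACE_BEGIN namespace demolib {
#define DEMO_NAMESPACE_END }
#define DEMO_NS_QUALIFIER ::demolib
#endif

// The "library" as this project sees it: lives in ::amgx_demo::demolib.
DEMO_NAMESPACE_BEGIN
inline std::string whoami() { return "wrapped copy"; }
DEMO_NAMESPACE_END

// The same library as another module might have compiled it: plain ::demolib.
namespace demolib {
inline std::string whoami() { return "unwrapped copy"; }
}

// Project code always goes through the qualifier macro, so it can only
// ever reach its own wrapped copy.
std::string call_wrapped() { return DEMO_NS_QUALIFIER::whoami(); }

// Code in the other module keeps using the plain namespace.
std::string call_unwrapped() { return ::demolib::whoami(); }
```

Because the two `whoami` functions have different fully qualified names, both can live in the same process without one silently replacing the other, which is the property this patch relies on.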
The major open issue is how we support CTK versions that ship pre-1.14.0 thrust.
This successfully fixes a hard-to-debug issue that occurs with OpenFOAM + AmgX4FOAM.
I am not sure how we resolve the fact that we require a newer version of thrust than is available in any of the currently released CUDA toolkits. At a minimum this fix requires 1.16.0, so I tested it by manually fetching thrust and cub 1.17.0 and passing in -DThrust_DIR and -DCUB_DIR.
Hi @marsaev - could you please take a look at this patch?
It turns out that if an application or library that uses AmgX also uses thrust, it is possible for calls to resolve across the modules, resulting in very hard-to-find bugs.
Supporting the change is challenging because the thrust fix is only available in recent CTK releases. It is going to be much easier going forward if we just force a specific version of thrust; the user will then have to call git clone --recursive.
@mattmartineau
> It turns out that if an application or library that uses AmgX also uses thrust, it is possible for calls to attempt to call across the modules, resulting in very hard to find bugs.

You mean when AMGX is used not as a shared library but rather through the C++ interface, and thus by including the sources?
I'm okay with adding the dependency and fixing the thrust version. If I understand correctly, this ensures that the user pulled AMGX with the thrust submodule, right? https://github.com/NVIDIA/AMGX/pull/189/files#diff-1e7de1ae2d059d21e1dd75d5812d5a34b0222cef273b7c3a2af62eb747f9d20aR241-R242
> You mean when AMGX is used not as a shared library but rather through c++ interface and thus including sources?

Actually, this is far more unfortunate. The issue does occur when AmgX is used as a shared library and is linked into another library or application that also uses thrust; see https://github.com/NVIDIA/thrust/releases/tag/1.13.1. This has hit a few applications and leads to really nasty non-determinism.
> I'm okay with adding dependency and fixing thrust version.

That's great news. It will certainly make everything easier going forward, because we can test against a single known version of thrust.
> If I understand correctly this insures that user pulled AMGX with thrust submodule, right?

That's exactly what I am trying to achieve. I would be interested to hear if you have any ideas on how to make it more robust.
Thanks!
@marsaev - are you happy for me to merge this one? There may be teething issues, but we can work through those with users before the 2.4.0 release.