Updated Documentation on Building Trilinos with CUDA Support?

Open Shihab-Shahriar opened this issue 3 years ago • 11 comments

Documentation

I'm referring to this page.

It seems quite old, and I have tried changing every variation of the variables mentioned that I can think of, with no success so far in building a CUDA-aware Trilinos with MPI support. The doc mentions that some packages cannot be compiled for CUDA yet due to nvcc bugs. Have those been resolved? I'm especially interested in the NOX package.

Can you please tell me about version compatibility? That is, what are the latest versions of CUDA, GCC, and OpenMPI that have been tested successfully? I tried looking at CDash, but I'm assuming there are newer builds with newer versions?

For now, I'll be happy if you can please point me to a build script or anything that might help.

Thank you.

Shihab-Shahriar avatar Jul 25 '22 17:07 Shihab-Shahriar

@trilinos/framework We could definitely update that.

Currently, the newest CUDA build we're testing nightly is with CUDA 10.1 / GCC 7.3.1 / OpenMPI 4.0.3. CUDA 11 should work, as should newer GCCs and almost any OpenMPI.

csiefer2 avatar Jul 28 '22 15:07 csiefer2

@Shihab-Shahriar Have a look in Trilinos/cmake/ctest/drivers/geminga and Trilinos/cmake/ctest/drivers/ascicgpu031 at the files that start with TrilinosCTestDriverCore. Those files contain parts of the cmake configure script for the nightly testing on the respective machines. In particular, they show the CUDA settings.
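As a rough sketch of how one might inspect those files from a local checkout: the snippet below lists the `TrilinosCTestDriverCore*` files in the two directories named above and prints the lines that mention CUDA. `TRILINOS_SRC` and the helper name `show_cuda_settings` are hypothetical, and the exact file names and their contents are not assumed here.

```shell
# Hypothetical checkout location; point this at your own Trilinos clone.
TRILINOS_SRC="${TRILINOS_SRC:-$HOME/Trilinos}"

# show_cuda_settings ROOT
# Prints CUDA-related lines from the nightly driver files under ROOT.
show_cuda_settings() {
  for machine in geminga ascicgpu031; do
    for f in "$1"/cmake/ctest/drivers/"$machine"/TrilinosCTestDriverCore*; do
      [ -f "$f" ] || continue        # skip when the glob matched nothing
      echo "== $f"
      grep -i -n 'cuda' "$f" || true # a file without CUDA lines is not an error
    done
  done
}

show_cuda_settings "$TRILINOS_SRC"
```

If the directories do not exist in your checkout, the function simply prints nothing.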

jhux2 avatar Jul 29 '22 01:07 jhux2

My problem was that the MPI libraries were not getting linked properly. Simply adding the -lmpi flag to CMAKE_CXX_FLAGS solved it for me.
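For anyone hitting the same thing, here is a minimal sketch of passing that flag at configure time. Only `-lmpi` and `CMAKE_CXX_FLAGS` come from the fix described above; `TRILINOS_SRC` is a hypothetical path, and a real configure line would carry many more options. The script only prints the command as a dry run.

```shell
# Hypothetical source location; point this at your own Trilinos checkout.
TRILINOS_SRC="${TRILINOS_SRC:-$HOME/Trilinos}"

# The extra linker flag that resolved the MPI link errors in this case.
CXX_FLAGS="-lmpi"

# Build the configure command as a string so it can be inspected first.
CONFIGURE_CMD="cmake -D CMAKE_CXX_FLAGS:STRING='$CXX_FLAGS' '$TRILINOS_SRC'"

# Dry run: print the command. To execute it, run: eval "$CONFIGURE_CMD"
echo "$CONFIGURE_CMD"
```

Whether `-lmpi` is the right library name depends on the MPI installation, so treat it as specific to this setup.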

I'm leaving this open for now. While my particular issue has been resolved, I think the doc page does need quite a bit of updating.

Shihab-Shahriar avatar Aug 20 '22 15:08 Shihab-Shahriar

This is related to #11255.

ccober6 avatar Nov 10 '22 22:11 ccober6

We are currently looking at who is the right person to create the updated build documentation.

jwillenbring avatar Nov 11 '22 17:11 jwillenbring

This issue has had no activity for 365 days and is marked for closure. It will be closed after an additional 30 days of inactivity. If you would like to keep this issue open please add a comment and/or remove the MARKED_FOR_CLOSURE label. If this issue should be kept open even with no activity beyond the time limits you can add the label DO_NOT_AUTOCLOSE. If it is ok for this issue to be closed, feel free to go ahead and close it. Please do not add any comments or change any labels or otherwise touch this issue unless your intention is to reset the inactivity counter for an additional year.

github-actions[bot] avatar Nov 12 '23 12:11 github-actions[bot]

@jwillenbring Has this been addressed in the meantime?

mayrmt avatar Nov 13 '23 12:11 mayrmt

@csiefer2 Any thoughts on who would be the right person to update this? Another option would be to remove this page or just have it be a pointer to other information on this topic. This may not be the right place to store this, for example if there is a maintained build script and we can just reference where that is.

jwillenbring avatar Nov 13 '23 13:11 jwillenbring

@jwillenbring @csiefer2 Personally, I do not have a preference as to how this information is stored. I just find it important that it is documented somewhere, so that one can rely on a curated guideline for installing Trilinos with CUDA support.

mayrmt avatar Nov 13 '23 14:11 mayrmt

I'll throw something in the mix here.

Plenty of open source projects on GitHub document build instructions directly in their README.md because it's the first thing you see when you open the repository to create an issue or pull request. Currently, we keep our instructions in a separate INSTALL.rst. It's a degree of separation away, and despite it being linked from the README.md, I didn't know it existed until I dug around just now. I would love to see the INSTALL.rst get merged into its own section at the bottom of README.md that focuses on minimal build instructions. Additionally, referring to a separate website for build instructions is another degree of separation away, and it allows things to get even more out of sync with Trilinos' actual behavior, which is really undesirable.

As for Cuda documentation (or OpenMP while we're at it, since there's #12508), all we would need is an additional 4 lines in the build section of the README that says something like "Here are the common configure options for Cuda: -D Kokkos_ENABLE_CUDA:BOOL=ON and -D TPL_ENABLE_CUDA:BOOL=ON and -D Tpetra_ENABLE_CUDA:BOOL=ON and -D Tpetra_INST_CUDA:BOOL=ON" "Here's what you need for OpenMP: ..."
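To make that concrete, a hedged sketch of what such a README snippet could look like follows. The four options are copied from the suggestion above and have not been checked against a current Trilinos release; `TRILINOS_SRC` is a hypothetical path, and a working CUDA+MPI build will likely need further settings (compilers, MPI, enabled packages). The script prints the command as a dry run.

```shell
# Hypothetical source location; point this at your own Trilinos checkout.
TRILINOS_SRC="${TRILINOS_SRC:-$HOME/Trilinos}"

# CUDA-related configure options as quoted in the comment above (unverified).
CUDA_OPTS="-D Kokkos_ENABLE_CUDA:BOOL=ON \
-D TPL_ENABLE_CUDA:BOOL=ON \
-D Tpetra_ENABLE_CUDA:BOOL=ON \
-D Tpetra_INST_CUDA:BOOL=ON"

# Build the configure command as a string so it can be inspected first.
CONFIGURE_CMD="cmake $CUDA_OPTS '$TRILINOS_SRC'"

# Dry run: print the command. To execute it, run: eval "$CONFIGURE_CMD"
echo "$CONFIGURE_CMD"
```

Keeping the options in one variable makes it easy to add an analogous `OPENMP_OPTS` block later, once those options are agreed on.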

This would be extremely easy, and it would also make our lives easier by cutting down on "how do I configure Trilinos" issues.

GrahamBenHarper avatar Nov 13 '23 18:11 GrahamBenHarper