Trilinos
MueLu: Unit test failures, cuda/10.1.105, UVM disabled / Re-enable no-UVM nightlies
Bug Report
@trilinos/muelu
Description
In builds of Trilinos with cuda/10.1.105 and Kokkos_ENABLE_CUDA_UVM=OFF, the following MueLu unit tests failed:
26 - MueLu_GeneralBlockSmoothing_MPI_4 (Failed)
32 - MueLu_DriverDiagonalModifications_MPI_1 (Failed)
39 - MueLu_CalcRotations_MPI_1 (Failed)
40 - MueLu_CalcRotations_MPI_4 (Failed)
In the MueLu_GeneralBlockSmoothing_MPI_4 test, this type of error output was emitted:
...
Clearing old data (if any)
Using default factory (SmootherFactory[47] ) for building 'Smoother'.
Level 0
Setup Smoother (MueLu::Ifpack2Smoother{type = SCHWARZ})
p=1: *** Caught standard std::exception of type 'std::runtime_error' :
Tpetra::Details::WrappedDualView (name = MV::DualView; host use_count = 3; device use_count = 2): Cannot access data on device while a host view is alive
p=2: *** Caught standard std::exception of type 'std::runtime_error' :
Tpetra::Details::WrappedDualView (name = MV::DualView; host use_count = 3; device use_count = 2): Cannot access data on device while a host view is alive
p=0: *** Caught standard std::exception of type 'std::runtime_error' :
Tpetra::Details::WrappedDualView (name = MV::DualView; host use_count = 3; device use_count = 2): Cannot access data on device while a host view is alive
p=3: *** Caught standard std::exception of type 'std::runtime_error' :
...
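For readers unfamiliar with this class of error, here is a minimal toy sketch of a reference-count guard of the kind the message describes. It is not Tpetra's WrappedDualView: the class `ToyDualView`, its methods, and the helper `deviceAccessThrows` are hypothetical names invented for illustration, and `std::shared_ptr` stands in for Kokkos views. The only thing taken from the output above is the wording of the exception.

```cpp
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <vector>

// Hypothetical toy model, NOT Tpetra's implementation.
// Each "view" is a shared_ptr, so use_count() tells us how many
// handles to the host or device data are currently alive.
class ToyDualView {
public:
  explicit ToyDualView(std::size_t n)
    : host_(std::make_shared<std::vector<double>>(n, 0.0)),
      device_(std::make_shared<std::vector<double>>(n, 0.0)) {}

  // Hand out a host handle only if no outside device handle is alive.
  std::shared_ptr<std::vector<double>> getHostView() {
    if (device_.use_count() > 1) {
      throw std::runtime_error(
        "Cannot access data on host while a device view is alive");
    }
    return host_;
  }

  // Hand out a device handle only if no outside host handle is alive.
  std::shared_ptr<std::vector<double>> getDeviceView() {
    if (host_.use_count() > 1) {
      throw std::runtime_error(
        "Cannot access data on device while a host view is alive");
    }
    return device_;
  }

private:
  std::shared_ptr<std::vector<double>> host_;
  std::shared_ptr<std::vector<double>> device_;
};

// Returns true if requesting the device view throws.
inline bool deviceAccessThrows(bool keepHostViewAlive) {
  ToyDualView dv(4);
  std::shared_ptr<std::vector<double>> lingering;
  {
    auto h = dv.getHostView();
    (*h)[0] = 1.0;
    if (keepHostViewAlive) {
      lingering = h;  // host handle outlives this scope
    }
  }  // h is released here
  try {
    auto d = dv.getDeviceView();
    (void)d;
    return false;
  } catch (const std::runtime_error&) {
    return true;
  }
}
```

In this toy, `deviceAccessThrows(true)` hits the guard because a host handle is still held, while `deviceAccessThrows(false)` does not. The usual remedy in code of this shape is to let host handles go out of scope before asking for device access.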
Steps to Reproduce
- SHA1: e061ffc182ade4040e0743cd9ff988d9f87ded5b
- Configure script: Weaver testbed rhel7W queue
export ATDM_CONFIG_REGISTER_CUSTOM_CONFIG_DIR=${TRILINOS_DIR}/cmake/std/atdm/contributed/weaver
source ${TRILINOS_DIR}/cmake/std/atdm/load-env.sh weaver-cuda-10.1-opt
export ATDM_CONFIG_USE_NINJA=OFF
unset CUDA_LAUNCH_BLOCKING
# Configure
cmake \
-G"Unix Makefiles" \
-DTrilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/atdm/ATDMDevEnv.cmake \
-DCMAKE_INSTALL_PREFIX="${PWD}/install" \
-DCMAKE_CXX_STANDARD="14" \
-DCMAKE_CXX_FLAGS="-pedantic -Wall" \
-DTrilinos_ENABLE_TESTS=OFF \
-DTrilinos_ENABLE_ALL_PACKAGES=OFF \
-DKokkos_ENABLE_CUDA_UVM=OFF \
-DTrilinos_ENABLE_MueLu=ON \
-DMueLu_ENABLE_TESTS=ON \
-DTrilinos_ENABLE_Stokhos=ON \
$TRILINOS_DIR
@ndellingwood At the moment, the PR tester isn't testing basically anything in the Tpetra stack with UVM disabled. So stuff like this can sneak through unimpeded.
Once we can get the current AT issues sorted out we might be able to fix this.
Adding the Tpetra label so we remember to start enabling Tpetra stuff in the non-UVM tests.
First try at the autotester submitted...
@jhux2 I'm not sure if you guys are still tracking MueLu_DriverDiagonalModifications_MPI_1 down, but it can be fixed by commenting out MueLu_FilteredAFactory_def.hpp:240. The const cast of the host view (getLocalRowView on line 294) doesn't trigger the host modify flag in the dual view.
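To illustrate the kind of problem described in the comment above, here is a toy sketch of how writing through a const-cast read-only host view can bypass a "host modified" flag. This is an assumption-laden model, not the MueLu or Tpetra code: `ToyFlaggedDualView`, `hostReadOnly`, `hostReadWrite`, `deviceReadOnly`, and `deviceValueAfterWrite` are all hypothetical names, and plain `std::vector` copies stand in for host/device transfers.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical toy model, NOT Tpetra's dual view.
// The device copy is refreshed only when the host side was
// flagged as modified through the read-write accessor.
class ToyFlaggedDualView {
public:
  explicit ToyFlaggedDualView(std::size_t n)
    : host_(n, 0.0), device_(n, 0.0), hostModified_(false) {}

  // Read-only host access: does not set the modify flag.
  const std::vector<double>& hostReadOnly() const { return host_; }

  // Read-write host access: sets the modify flag.
  std::vector<double>& hostReadWrite() {
    hostModified_ = true;
    return host_;
  }

  // Device access: syncs from host only if the flag is set.
  const std::vector<double>& deviceReadOnly() {
    if (hostModified_) {
      device_ = host_;
      hostModified_ = false;
    }
    return device_;
  }

private:
  std::vector<double> host_;
  std::vector<double> device_;
  bool hostModified_;
};

// Writes 42.0 on the host, then returns what the device side sees.
inline double deviceValueAfterWrite(bool useConstCast) {
  ToyFlaggedDualView dv(1);
  if (useConstCast) {
    // Casting away const on the read-only view skips the flag,
    // so the later sync is never triggered.
    const_cast<std::vector<double>&>(dv.hostReadOnly())[0] = 42.0;
  } else {
    dv.hostReadWrite()[0] = 42.0;
  }
  return dv.deviceReadOnly()[0];
}
```

In this toy, the const-cast path leaves the device copy stale, whereas the read-write accessor path propagates the new value. How the real code manifests the problem may differ; the sketch only shows why modifying data through a read-only view is fragile when synchronization relies on modify flags.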
@seanofthemillers Thank you, that does in fact fix it! I really appreciate the help.
Can this be closed?
@ndellingwood ?
Yeah, the updated PR testing isn't flagging these tests, and the cuda/10.1.105 version is no longer relevant.