marian-dev
inits::normal() is broken for an odd number of parameters
Bug description
I'm trying to create a node initialized from a normal distribution, but it fails on both GPU and CPU.
[2021-06-03 13:53:11] Error: Curand error 105 - ./marian-pruned/src/tensors/rand.cpp:106: curandGenerateNormal(generator_, tensor->data(), tensor->size(), mean, stddev)
[2021-06-03 13:53:11] Error: Aborted from virtual void marian::CurandRandomGenerator::normal(marian::Tensor, float, float) in ./marian-pruned/src/tensors/rand.cpp:106
[CALL STACK]
[0xd9befe] marian::CurandRandomGenerator:: normal (IntrusivePtr<marian::TensorBase>, float, float) + 0x5de
[0xa12bae]
[0xa1d519] marian::inits::LambdaInitConvert:: apply (IntrusivePtr<marian::TensorBase>) + 0x7a9
[0xa0eb18] marian::ConstantNode:: init () + 0x48
[0xa00405] marian::ExpressionGraph:: forward (std::__cxx11::list<IntrusivePtr<marian::Chainable<IntrusivePtr<marian::TensorBase>>>,std::allocator<IntrusivePtr<marian::Chainable<IntrusivePtr<marian::TensorBase>>>>>&, bool) + 0x95
[0xa021f4] marian::ExpressionGraph:: forwardNext () + 0x184
[0xbc920a] marian::GraphGroup:: collectStats (std::shared_ptr<marian::ExpressionGraph>, std::shared_ptr<marian::models::ICriterionFunction>, std::vector<std::shared_ptr<marian::Vocab>,std::allocator<std::shared_ptr<marian::Vocab>>> const&, double) + 0xe2a
[0xba51de] marian::SyncGraphGroup:: collectStats (std::vector<std::shared_ptr<marian::Vocab>,std::allocator<std::shared_ptr<marian::Vocab>>> const&) + 0x13e
[0x82674f] marian::Train<marian::SyncGraphGroup>:: run () + 0x37f
[0x74eb78] mainTrainer (int, char**) + 0xc8
[0x70a94a] main + 0x8a
[0x7fffe6ee5840] __libc_start_main + 0xf0
[0x74c7b9] _start + 0x29
It fails at CURAND_CHECK:
102 void CurandRandomGenerator::normal(Tensor tensor, float mean, float stddev) {
103 matchOrAbort<float>(tensor->type());
104
105 tensor->getBackend()->setDevice();
106 CURAND_CHECK(curandGenerateNormal(generator_, tensor->data(), tensor->size(), mean, stddev));
107 }
How to reproduce
I just did:
auto u = W->graph()->constant({1, 1}, inits::normal());
For example, inits::uniform() works fine. I'm working on my branch, but I don't think it's my code that's at fault. I'm just trying to use inits::normal().
Context
- Marian version: v1.10.19; cda55c3 2021-06-01 16:33:16 +0000
- CMake command: cmake .. -DCOMPILE_TESTS=ON -DUSE_SENTENCEPIECE=ON -DCMAKE_BUILD_TYPE=Release
- --build-info all
AVX2_FOUND=true
AVX512_FOUND=false
AVX_FOUND=true
BUILD_ARCH=native
CMAKE_AR=/usr/bin/ar
CMAKE_BUILD_TYPE=Release
CMAKE_COLOR_MAKEFILE=ON
CMAKE_CXX_COMPILER=/usr/bin/c++
CMAKE_CXX_FLAGS=-std=c++11 -pthread -Wl,--no-as-needed -fPIC -Wno-unused-result -march=native -DUSE_SENTENCEPIECE -DCUDA_FOUND -DUSE_NCCL -DMKL_ILP64 -m64
CMAKE_CXX_FLAGS_DEBUG=-O0 -g -rdynamic
CMAKE_CXX_FLAGS_MINSIZEREL=-Os -DNDEBUG
CMAKE_CXX_FLAGS_RELEASE=-O3 -m64 -funroll-loops -g -rdynamic
CMAKE_CXX_FLAGS_RELWITHDEBINFO=-O3 -m64 -funroll-loops -g -rdynamic
CMAKE_C_COMPILER=/usr/bin/cc
CMAKE_C_FLAGS=-pthread -Wl,--no-as-needed -fPIC -Wno-unused-result -march=native -DMKL_ILP64 -m64
CMAKE_C_FLAGS_DEBUG=-O0 -g -rdynamic
CMAKE_C_FLAGS_MINSIZEREL=-Os -DNDEBUG
CMAKE_C_FLAGS_RELEASE=-O3 -m64 -funroll-loops -g -rdynamic
CMAKE_C_FLAGS_RELWITHDEBINFO=-O3 -m64 -funroll-loops -g -rdynamic
CMAKE_EXPORT_COMPILE_COMMANDS=OFF
CMAKE_INSTALL_BINDIR=bin
CMAKE_INSTALL_DATAROOTDIR=share
CMAKE_INSTALL_INCLUDEDIR=include
CMAKE_INSTALL_LIBDIR=lib
CMAKE_INSTALL_LIBEXECDIR=libexec
CMAKE_INSTALL_LOCALSTATEDIR=var
CMAKE_INSTALL_OLDINCLUDEDIR=/usr/include
CMAKE_INSTALL_PREFIX=/usr/local
CMAKE_INSTALL_SBINDIR=sbin
CMAKE_INSTALL_SHAREDSTATEDIR=com
CMAKE_INSTALL_SYSCONFDIR=etc
CMAKE_LINKER=/usr/bin/ld
CMAKE_MAKE_PROGRAM=/usr/bin/make
CMAKE_NM=/usr/bin/nm
CMAKE_OBJCOPY=/usr/bin/objcopy
CMAKE_OBJDUMP=/usr/bin/objdump
CMAKE_RANLIB=/usr/bin/ranlib
CMAKE_SKIP_INSTALL_RPATH=NO
CMAKE_SKIP_RPATH=NO
CMAKE_STRIP=/usr/bin/strip
CMAKE_VERBOSE_MAKEFILE=FALSE
COMPILE_AVX=ON
COMPILE_AVX2=ON
COMPILE_AVX512=ON
COMPILE_CPU=ON
COMPILE_CUDA=ON
COMPILE_EXAMPLES=OFF
COMPILE_KEPLER=OFF
COMPILE_LIBRARY_ONLY=OFF
COMPILE_MAXWELL=OFF
COMPILE_PASCAL=ON
COMPILE_SERVER=OFF
COMPILE_SSE2=ON
COMPILE_SSE3=ON
COMPILE_SSE4_1=ON
COMPILE_SSE4_2=ON
COMPILE_TESTS=ON
COMPILE_TURING=ON
COMPILE_VOLTA=ON
CUDA_64_BIT_DEVICE_CODE=ON
CUDA_ATTACH_VS_BUILD_RULE_TO_CUDA_FILE=ON
CUDA_BUILD_CUBIN=OFF
CUDA_BUILD_EMULATION=OFF
CUDA_CUDART_LIBRARY=/usr/local/cuda-10.2/lib64/libcudart.so
CUDA_CUDA_LIBRARY=/usr/lib/x86_64-linux-gnu/libcuda.so
CUDA_HOST_COMPILATION_CPP=ON
CUDA_HOST_COMPILER=/usr/bin/cc
CUDA_NVCC_EXECUTABLE=/usr/local/cuda-10.2/bin/nvcc
CUDA_NVCC_FLAGS=-DUSE_SENTENCEPIECE-DCUDA_FOUND-DUSE_NCCL--default-streamper-thread-O3-g--use_fast_math-gencode=arch=compute_60,code=sm_60-gencode=arch=compute_61,code=sm_61-arch=sm_70-gencode=arch=compute_70,code=sm_70-gencode=arch=compute_70,code=compute_70-gencode=arch=compute_75,code=sm_75-gencode=arch=compute_75,code=compute_75-ccbin/usr/bin/cc-std=c++11-Xcompiler -fPIC-Xcompiler -Wno-unused-result-Xcompiler -Wno-deprecated-Xcompiler -Wno-pragmas-Xcompiler -Wno-unused-value-Xcompiler -Werror
CUDA_PROPAGATE_HOST_FLAGS=OFF
CUDA_SDK_ROOT_DIR=CUDA_SDK_ROOT_DIR-NOTFOUND
CUDA_SEPARABLE_COMPILATION=OFF
CUDA_TOOLKIT_INCLUDE=/usr/local/cuda-10.2/include
CUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-10.2
CUDA_TOOLKIT_TARGET_DIR=/usr/local/cuda-10.2
CUDA_USE_STATIC_CUDA_RUNTIME=ON
CUDA_VERBOSE_BUILD=OFF
CUDA_VERSION=10.2
CUDA_cublas_LIBRARY=/fs/zisa0/mbehnke/anaconda3/envs/shrink/lib/libcublas.so
CUDA_cudart_static_LIBRARY=/usr/local/cuda-10.2/lib64/libcudart_static.a
CUDA_cufft_LIBRARY=/usr/local/cuda-10.2/lib64/libcufft.so
CUDA_cupti_LIBRARY=CUDA_cupti_LIBRARY-NOTFOUND
CUDA_curand_LIBRARY=/usr/local/cuda-10.2/lib64/libcurand.so
CUDA_cusolver_LIBRARY=/usr/local/cuda-10.2/lib64/libcusolver.so
CUDA_cusparse_LIBRARY=/usr/local/cuda-10.2/lib64/libcusparse.so
CUDA_nppc_LIBRARY=/usr/local/cuda-10.2/lib64/libnppc.so
CUDA_nppi_LIBRARY=CUDA_nppi_LIBRARY-NOTFOUND
CUDA_npps_LIBRARY=/usr/local/cuda-10.2/lib64/libnpps.so
CUDA_rt_LIBRARY=/usr/lib/x86_64-linux-gnu/librt.so
DOXYGEN_DOT_EXECUTABLE=/usr/bin/dot
DOXYGEN_EXECUTABLE=/usr/bin/doxygen
GENERATE_MARIAN_INSTALL_TARGETS=OFF
GIT_EXECUTABLE=/usr/bin/git
INTEL_ROOT=/opt/intel
INTGEMM_DONT_BUILD_TESTS=ON
MKL_CORE_LIBRARY=/opt/intel/mkl/lib/intel64/libmkl_core.a
MKL_INCLUDE_DIR=/opt/intel/mkl/include
MKL_INCLUDE_DIRS=/opt/intel/mkl/include
MKL_INTERFACE_LIBRARY=/opt/intel/mkl/lib/intel64/libmkl_intel_ilp64.a
MKL_LIBRARIES=-Wl,--start-group/opt/intel/mkl/lib/intel64/libmkl_intel_ilp64.a/opt/intel/mkl/lib/intel64/libmkl_sequential.a/opt/intel/mkl/lib/intel64/libmkl_core.a-Wl,--end-group
MKL_ROOT=/opt/intel/mkl
MKL_SEQUENTIAL_LAYER_LIBRARY=/opt/intel/mkl/lib/intel64/libmkl_sequential.a
SPM_BUILD_TEST=OFF
SPM_COVERAGE=OFF
SPM_ENABLE_NFKC_COMPILE=OFF
SPM_ENABLE_SHARED=OFF
SPM_ENABLE_TCMALLOC=ON
SPM_ENABLE_TENSORFLOW_SHARED=OFF
SPM_NO_THREADLOCAL=OFF
SPM_TCMALLOC_STATIC=OFF
SPM_USE_BUILTIN_PROTOBUF=ON
SQLITE_ENABLE_ASSERT_HANDLER=OFF
SQLITE_ENABLE_COLUMN_METADATA=ON
SQLITE_USE_LEGACY_STRUCT=OFF
SSE2_FOUND=true
SSE3_FOUND=true
SSE4_1_FOUND=true
SSE4_2_FOUND=true
SSSE3_FOUND=true
TCMALLOC_LIB=/usr/lib/libtcmalloc_minimal.so
Tcmalloc_INCLUDE_DIR=/usr/include
Tcmalloc_LIBRARY=/usr/lib/libtcmalloc_minimal.so
USE_APPLE_ACCELERATE=OFF
USE_CCACHE=OFF
USE_CUDNN=OFF
USE_DOXYGEN=ON
USE_FBGEMM=OFF
USE_MKL=ON
USE_MPI=OFF
USE_NCCL=ON
USE_OPENMP=OFF
USE_SENTENCEPIECE=ON
USE_STATIC_LIBS=OFF
- Log file: I can add if necessary.
It might still be your code. Curand and CUDA errors in general tend to occur after other code has invalidated memory. If there is a chance that you are accessing GPU memory in a bad way in your own code, this might just be a symptom of that.
Can you check the same thing on master, to rule out your code as the source?
Ah, you said it also fails on the CPU. That's more suspicious. Is the error message the same?
curand wants to generate normal samples in multiples of 2; error 105 is CURAND_STATUS_LENGTH_NOT_MULTIPLE, and a {1, 1} constant has a single element. We also use curand on the CPU when compiled with CUDA on. CPU-only builds work because they use the STL random generator, which doesn't require an even count.
OK, thanks. That's annoying. I'll take a look at what I can do.
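One possible workaround for the even-count restriction, as a minimal sketch rather than the actual Marian fix: pad an odd request by one element, generate into scratch space, and copy back only what was asked for. Everything here is hypothetical: `generateNormalEvenOnly` is a plain C++ stand-in that only mimics the restriction (it does not call curand), and `generateNormalAnySize` is an illustrative wrapper name, not a Marian or curand function.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical stand-in for a generator with the even-count restriction.
// It rejects odd n, the way curandGenerateNormal reports error 105.
bool generateNormalEvenOnly(std::mt19937& rng, float* out, std::size_t n,
                            float mean, float stddev) {
  if (n % 2 != 0)
    return false;  // mimics CURAND_STATUS_LENGTH_NOT_MULTIPLE
  std::normal_distribution<float> dist(mean, stddev);
  for (std::size_t i = 0; i < n; ++i)
    out[i] = dist(rng);
  return true;
}

// Hypothetical wrapper that accepts any n by padding odd requests.
bool generateNormalAnySize(std::mt19937& rng, float* out, std::size_t n,
                           float mean, float stddev) {
  assert(out != nullptr || n == 0);
  if (n % 2 == 0)
    return generateNormalEvenOnly(rng, out, n, mean, stddev);

  // Odd n: generate n + 1 values into scratch space, keep the first n.
  std::vector<float> scratch(n + 1);
  if (!generateNormalEvenOnly(rng, scratch.data(), scratch.size(), mean, stddev))
    return false;
  std::copy(scratch.begin(), scratch.begin() + n, out);
  return true;
}
```

With this shape, a single-element tensor such as the {1, 1} constant above would go through the padded path. On a real GPU tensor the scratch buffer would have to live in device memory, which this host-only sketch does not attempt to show.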