`cub::DeviceHistogramEven` 3/4 multichannel `float` unit test fails with ICC
This failure does not reproduce with GCC, so I'm disabling the test for now.
[17:53:21]:wash@voyager:/home/wash/development/nvidia/cuda_linux_p4/sw/gpgpu/thrust:0:$ ci/local/build.bash -i gpuci/cccl:cuda11.3.1-devel-ubuntu20.04-icclatest cub.cpp17.test.device_histogram
cuda11.3.1-devel-ubuntu20.04-icclatest: Pulling from gpuci/cccl
Digest: sha256:e20e996de6f79a75754789746ad0e3535ddc82b20706fde67db489f56ca5cefc
Status: Image is up to date for gpuci/cccl:cuda11.3.1-devel-ubuntu20.04-icclatest
docker.io/gpuci/cccl:cuda11.3.1-devel-ubuntu20.04-icclatest
:: initializing oneAPI environment ...
build.bash: BASH_VERSION = 5.0.17(1)-release
:: compiler -- latest
:: debugger -- latest
:: dev-utilities -- latest
:: tbb -- latest
:: oneAPI environment initialized ::
>>>> Determine system topology...
Logical CPUs: 12 [threads]
Physical CPUs: 6 [cores]
Total Mem: 62.57 [GBs]
Max Threads Per Core: 2 [threads/core]
Min Memory Per Threads: 4 [GBs/thread]
CPU Bound Threads: 12 [threads]
Mem Bound Threads: 15 [threads]
Parallel Level: 12 [threads]
Mem Per Thread: 5.214 [GBs/thread]
>>>> Get environment...
TBBROOT=/opt/intel/oneapi/tbb/2021.2.0/env/..
NVIDIA_VISIBLE_DEVICES=all
TOTAL_MEM=62.57
ONEAPI_ROOT=/opt/intel/oneapi
SETVARS_VARS_PATH=/opt/intel/oneapi/tbb/latest/env/vars.sh
HOSTNAME=4986c3a79899
ACL_BOARD_VENDOR_PATH=/opt/Intel/OpenCLFPGA/oneAPI/Boards
NVIDIA_REQUIRE_CUDA=cuda>=11.3 brand=tesla,driver>=418,driver<419 brand=tesla,driver>=440,driver<441 driver>=450
COVERAGE_PLAN=Minimal
APT_KEY_DONT_WARN_ON_DANGEROUS_USAGE=1
SDK_TYPE=cuda
NCCL_VERSION=2.9.9
CMAKE_BUILD_TYPE=Release
PWD=/cccl/thrust/build
NVIDIA_DRIVER_CAPABILITIES=compute,utility
LOGICAL_CPUS=12
MANPATH=/opt/intel/oneapi/debugger/10.1.1/documentation/man::/opt/intel/oneapi/compiler/2021.2.0/documentation/en/man/common:
MIN_MEMORY_PER_THREAD=4
CXX=/opt/intel/oneapi/compiler/2021.2.0/linux/bin/intel64/icpc
CPU_BOUND_THREADS=12
TZ=US/Pacific
HOME=/cccl/thrust
MEM_BOUND_THREADS=15
CUDA_VERSION=11.3.1
SETVARS_COMPLETED=1
CMAKE_PREFIX_PATH=/opt/intel/oneapi/tbb/2021.2.0/env/..:
CUDACXX=/usr/local/cuda/bin/nvcc
SDK_VER=11.3.1-devel
WORKSPACE=/cccl/thrust
INFOPATH=/opt/intel/oneapi/debugger/10.1.1/documentation/info/
TERM=xterm
LIBRARY_PATH=/opt/intel/oneapi/tbb/2021.2.0/env/../lib/intel64/gcc4.8:/opt/intel/oneapi/compiler/2021.2.0/linux/compiler/lib/intel64_lin:/opt/intel/oneapi/compiler/2021.2.0/linux/lib:/usr/local/cuda/lib64/stubs
CMAKE_FLAGS=-DCMAKE_BUILD_TYPE=Release -DCMAKE_CUDA_COMPILER='/usr/local/cuda/bin/nvcc' -DCMAKE_CUDA_FLAGS=-allow-unsupported-compiler -DCMAKE_CXX_COMPILER='/opt/intel/oneapi/compiler/2021.2.0/linux/bin/intel64/icpc' -G Ninja -DTHRUST_ENABLE_MULTICONFIG=ON -DTHRUST_MULTICONFIG_ENABLE_DIALECT_LATEST=ON -DTHRUST_MULTICONFIG_ENABLE_SYSTEM_CPP=ON -DTHRUST_MULTICONFIG_ENABLE_SYSTEM_TBB=OFF -DTHRUST_MULTICONFIG_ENABLE_SYSTEM_OMP=OFF -DTHRUST_MULTICONFIG_ENABLE_SYSTEM_CUDA=ON -DTHRUST_MULTICONFIG_WORKLOAD=SMALL -DTHRUST_INCLUDE_CUB_CMAKE=ON -DCUB_ENABLE_THOROUGH_TESTING=OFF -DCUB_ENABLE_BENCHMARK_TESTING=OFF -DCUB_ENABLE_MINIMAL_TESTING=ON -DTHRUST_AUTO_DETECT_COMPUTE_ARCHS=ON
SHLVL=2
BUILD_TYPE=gpu
OCL_ICD_FILENAMES=libintelocl_emu.so:libalteracl.so:/opt/intel/oneapi/compiler/2021.2.0/linux/lib/x64/libintelocl.so
PARALLEL_LEVEL=12
MEM_PER_THREAD=5.214
OS_TYPE=ubuntu
INTELFPGAOCLSDKROOT=/opt/intel/oneapi/compiler/2021.2.0/linux/lib/oclfpga
LD_LIBRARY_PATH=/opt/intel/oneapi/tbb/2021.2.0/env/../lib/intel64/gcc4.8:/opt/intel/oneapi/debugger/10.1.1/dep/lib:/opt/intel/oneapi/debugger/10.1.1/libipt/intel64/lib:/opt/intel/oneapi/debugger/10.1.1/gdb/intel64/lib:/opt/intel/oneapi/compiler/2021.2.0/linux/lib:/opt/intel/oneapi/compiler/2021.2.0/linux/lib/x64:/opt/intel/oneapi/compiler/2021.2.0/linux/lib/emu:/opt/intel/oneapi/compiler/2021.2.0/linux/lib/oclfpga/host/linux64/lib:/opt/intel/oneapi/compiler/2021.2.0/linux/lib/oclfpga/linux64/lib:/opt/intel/oneapi/compiler/2021.2.0/linux/compiler/lib/intel64_lin:/opt/intel/oneapi/compiler/2021.2.0/linux/compiler/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
OS_VER=20.04
CMAKE_BUILD_FLAGS=-- -k0 cub.cpp17.test.device_histogram
MAX_THREADS_PER_CORE=2
PATH=/usr/local/cuda/bin:/opt/intel/oneapi/dev-utilities/2021.2.0/bin:/opt/intel/oneapi/debugger/10.1.1/gdb/intel64/bin:/opt/intel/oneapi/compiler/2021.2.0/linux/lib/oclfpga/llvm/aocl-bin:/opt/intel/oneapi/compiler/2021.2.0/linux/lib/oclfpga/bin:/opt/intel/oneapi/compiler/2021.2.0/linux/bin/intel64:/opt/intel/oneapi/compiler/2021.2.0/linux/bin:/opt/intel/oneapi/compiler/2021.2.0/linux/ioc/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
CC=/opt/intel/oneapi/compiler/2021.2.0/linux/bin/intel64/icc
INTEL_PYTHONHOME=/opt/intel/oneapi/debugger/10.1.1/dep
CTEST_FLAGS=--output-on-failure -R ^cub.cpp17.test.device_histogram$
CPATH=/opt/intel/oneapi/tbb/2021.2.0/env/../include:/opt/intel/oneapi/dev-utilities/2021.2.0/include:/opt/intel/oneapi/compiler/2021.2.0/linux/include
DEBIAN_FRONTEND=noninteractive
CXX_TYPE=icc
PHYSICAL_CPUS=6
OLDPWD=/cccl/thrust
CXX_VER=latest
CMAKE_LIBRARY_PATH=/opt/intel/oneapi/tbb/2021.2.0/env/../lib/intel64/gcc4.8:/opt/intel/oneapi/compiler/2021.2.0/linux/compiler/lib/intel64_lin:/opt/intel/oneapi/compiler/2021.2.0/linux/lib:/usr/local/cuda/lib64/stubs
_=/usr/bin/env
>>>> Check versions...
icpc (ICC) 2021.2.0 20210228
Copyright (C) 1985-2021 Intel Corporation. All rights reserved.
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Mon_May__3_19:15:13_PDT_2021
Cuda compilation tools, release 11.3, V11.3.109
Build cuda_11.3.r11.3/compiler.29920130_0
Tue Jun 29 17:53:44 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GT 710      On   | 00000000:04:00.0 N/A |                  N/A |
| 40%   41C    P8    N/A /  N/A |      1MiB /  2002MiB |     N/A      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  RTX A6000           On   | 00000000:17:00.0 Off |                  Off |
| 30%   48C    P8    31W / 300W |      1MiB / 48685MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  Quadro GV100        On   | 00000000:65:00.0  On |                  Off |
| 33%   46C    P0    27W / 250W |      0MiB / 32505MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
>>>> Configure Thrust and CUB...
cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_CUDA_COMPILER='/usr/local/cuda/bin/nvcc' -DCMAKE_CUDA_FLAGS=-allow-unsupported-compiler -DCMAKE_CXX_COMPILER='/opt/intel/oneapi/compiler/2021.2.0/linux/bin/intel64/icpc' -G Ninja -DTHRUST_ENABLE_MULTICONFIG=ON -DTHRUST_MULTICONFIG_ENABLE_DIALECT_LATEST=ON -DTHRUST_MULTICONFIG_ENABLE_SYSTEM_CPP=ON -DTHRUST_MULTICONFIG_ENABLE_SYSTEM_TBB=OFF -DTHRUST_MULTICONFIG_ENABLE_SYSTEM_OMP=OFF -DTHRUST_MULTICONFIG_ENABLE_SYSTEM_CUDA=ON -DTHRUST_MULTICONFIG_WORKLOAD=SMALL -DTHRUST_INCLUDE_CUB_CMAKE=ON -DCUB_ENABLE_THOROUGH_TESTING=OFF -DCUB_ENABLE_BENCHMARK_TESTING=OFF -DCUB_ENABLE_MINIMAL_TESTING=ON -DTHRUST_AUTO_DETECT_COMPUTE_ARCHS=ON
-- The CXX compiler identification is Intel 20.2.2.20210228
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /opt/intel/oneapi/compiler/2021.2.0/linux/bin/intel64/icpc - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found CUB: /cccl/thrust/dependencies/cub/cub/cmake/cub-config.cmake (found version "1.14.0.0")
-- Found Thrust: /cccl/thrust/thrust/cmake/thrust-config.cmake (found version "1.14.0.0")
-- Performing Test CXX_FLAG__Werror
-- Performing Test CXX_FLAG__Werror - Success
-- Performing Test CXX_FLAG__Wall
-- Performing Test CXX_FLAG__Wall - Success
-- Performing Test CXX_FLAG__Wextra
-- Performing Test CXX_FLAG__Wextra - Success
-- Performing Test CXX_FLAG__Winit_self
-- Performing Test CXX_FLAG__Winit_self - Success
-- Performing Test CXX_FLAG__Woverloaded_virtual
-- Performing Test CXX_FLAG__Woverloaded_virtual - Success
-- Performing Test CXX_FLAG__Wcast_qual
-- Performing Test CXX_FLAG__Wcast_qual - Success
-- Performing Test CXX_FLAG__Wpointer_arith
-- Performing Test CXX_FLAG__Wpointer_arith - Success
-- Performing Test CXX_FLAG__Wunused_local_typedef
-- Performing Test CXX_FLAG__Wunused_local_typedef - Failed
-- Performing Test CXX_FLAG__Wvla
-- Performing Test CXX_FLAG__Wvla - Success
-- Performing Test CXX_FLAG__Wgnu
-- Performing Test CXX_FLAG__Wgnu - Failed
-- Performing Test CXX_FLAG__Wno_gnu_zero_variadic_macro_arguments
-- Performing Test CXX_FLAG__Wno_gnu_zero_variadic_macro_arguments - Failed
-- Performing Test CXX_FLAG__Wno_unused_function
-- Performing Test CXX_FLAG__Wno_unused_function - Success
-- Performing Test CXX_FLAG__diag_disable_11074
-- Performing Test CXX_FLAG__diag_disable_11074 - Success
-- Performing Test CXX_FLAG__diag_disable_11076
-- Performing Test CXX_FLAG__diag_disable_11076 - Success
-- The CUDA compiler identification is NVIDIA 11.3.109
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Thrust: Automatically detected compute architectures: sm_35 sm_70 sm_86
-- Thrust: Explicitly enabled compute architectures: sm_35 sm_70 sm_86
-- Testing for supported language standards...
-- Testing CXX11 Support: TRUE
-- Testing CXX14 Support: TRUE
-- Testing CXX17 Support: TRUE
-- Testing CUDA11 Support: TRUE
-- Testing CUDA14 Support: TRUE
-- Testing CUDA17 Support: TRUE
-- Enabling Thrust configuration: cpp.cuda.cpp17
-- 1 unique thrust.host.device.dialect configurations generated
-- CPP system found? TRUE
-- CUDA system found? TRUE
-- TBB system found? FALSE
-- OMP system found? FALSE
-- CUB: Explicitly enabled compute architectures: sm_35 sm_70 sm_86
-- Performing Test CXX_FLAG__Wno_deprecated_declarations
-- Performing Test CXX_FLAG__Wno_deprecated_declarations - Success
-- Found Thrust: /cccl/thrust/thrust/cmake/thrust-config.cmake (found suitable exact version "1.14.0.0")
-- Enabling CUB configuration: cpp17
-- 1 unique cub.dialect configurations generated
-- Configuring done
-- Generating done
-- Build files have been written to: /cccl/thrust/build
Configure Time: 0m7.170s
>>>> Build Thrust and CUB...
cmake --build . -- -k0 cub.cpp17.test.device_histogram -j 12
[0/2] Re-checking globbed directories...
ninja: no work to do.
Build Time: 0m0.369s
>>>> Test Thrust and CUB...
ctest --output-on-failure -R ^cub.cpp17.test.device_histogram$
Test project /cccl/thrust/build
Start 298: cub.cpp17.test.device_histogram
[... verbose CUB test output omitted ...]
CUB cub::DeviceHistogramEven (pointer) 2073600 pixels (1080 height, 1920 width, 30720-byte row stride), 8294400 4-byte f samples (entropy reduction 0), i counters, 3/4 channels, max sample 1
Channel 0: 256 bins [0, 1)
Channel 1: 128 bins [0.25, 0.75)
Channel 2: 64 bins [0.375, 0.625)
Invoking DeviceHistogramInitKernel<<<1, 256, 0, 0>>>()
Invoking histogram_sweep_kernel<<<{336, 1, 1}, 384, 0, 0>>>(), 5 pixels per thread, 4 SM occupancy
INCORRECT: [39]: 8139 != 8140
Channel 0 FAIL
INCORRECT: [108]: 8038 != 8039
Channel 1 FAIL
Channel 2 PASS
(../dependencies/cub/test/test_device_histogram.cu: 719)
0% tests passed, 1 tests failed out of 1
Total Test time (real) = 406.30 sec
The following tests FAILED:
298 - cub.cpp17.test.device_histogram (Failed)
Errors while running CTest
Test Time: 6m46.309s