cudf icon indicating copy to clipboard operation
cudf copied to clipboard

[BUG] memcheck and racecheck errors in avro reader with `codec="deflate"`

Open wence- opened this issue 1 year ago • 1 comments

Describe the bug

import cudf
import fastavro
import io

total_rows = num_rows = rows_per_block = 2048
total_bytes_per_block = rows_per_block * 7

records = [{"0": f"{i:0>6}"} for i in range(total_rows)]
schema = {
    "name": "root",
    "type": "record",
    "fields": [
        {"name": "0", "type": "string"},
    ],
}

buffer = io.BytesIO()
fastavro.writer(buffer, schema, records, sync_interval=total_bytes_per_block, codec="deflate")
buffer.seek(0)

actual_df = cudf.read_avro(buffer, skiprows=0, num_rows=num_rows)

Extracted from test_avro_reader_fastavro_integration.py::test_avro_reader_multiblock.

Neither

compute-sanitizer --tool=memcheck python bug.py

nor

compute-sanitizer --tool=racecheck python bug.py

are clean.

Exemplar stack traces:

memcheck
========= COMPUTE-SANITIZER
========= Invalid __global__ read of size 1 bytes
=========     at 0x2080 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:807:cudf::io::process_symbols(cudf::io::inflate_state_s *, int)
=========     by thread (32,0,0) in block (0,0,0)
=========     Address 0x7f6078604cb3 is out of bounds
=========     and is 2,356 bytes after the nearest allocation at 0x7f6078601600 of size 11,648 bytes
=========     Device Frame:/home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:1109:void cudf::io::inflate_kernel<(int)128>(cudf::device_span<const cudf::device_span<const unsigned char, (unsigned long)18446744073709551615>, (unsigned long)18446744073709551615>, cudf::device_span<const cudf::device_span<unsigned char, (unsigned long)18446744073709551615>, (unsigned long)18446744073709551615>, cudf::device_span<cudf::io::compression_result, (unsigned long)18446744073709551615>, cudf::io::gzip_header_included) [0x6050]
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame: [0x332470]
=========                in /usr/lib/x86_64-linux-gnu/libcuda.so.1
=========     Host Frame: [0x14fb4]
=========                in /home/coder/.conda/envs/rapids/lib/libcudart.so.12
=========     Host Frame:cudaLaunchKernel [0x70aae]
=========                in /home/coder/.conda/envs/rapids/lib/libcudart.so.12
=========     Host Frame:/home/coder/.conda/envs/rapids/targets/x86_64-linux/include/cuda_runtime.h:216:cudaError cudaLaunchKernel<char>(char const*, dim3, dim3, void**, unsigned long, CUstream_st*) [0x12a5605]
=========                in /home/coder/cudf/cpp/build/release/libcudf.so
=========     Host Frame:/tmp/tmpxft_0003da43_00000000-6_gpuinflate.compute_90.cudafe1.stub.c:1:__device_stub__ZN4cudf2io14inflate_kernelILi128EEEvNS_11device_spanIKNS2_IKhLm18446744073709551615EEELm18446744073709551615EEENS2_IKNS2_IhLm18446744073709551615EEELm18446744073709551615EEENS2_INS0_18compression_resultELm18446744073709551615EEENS0_20gzip_header_includedE(cudf::device_span<cudf::device_span<unsigned char const, 18446744073709551615ul> const, 18446744073709551615ul>&, cudf::device_span<cudf::device_span<unsigned char, 18446744073709551615ul> const, 18446744073709551615ul>&, cudf::device_span<cudf::io::compression_result, 18446744073709551615ul>&, cudf::io::gzip_header_included) [0x12a4de6]
=========                in /home/coder/cudf/cpp/build/release/libcudf.so
=========     Host Frame:/tmp/tmpxft_0003da43_00000000-6_gpuinflate.compute_90.cudafe1.stub.c:4:void cudf::io::__wrapper__device_stub_inflate_kernel<128>(cudf::device_span<cudf::device_span<unsigned char const, 18446744073709551615ul> const, 18446744073709551615ul>&, cudf::device_span<cudf::device_span<unsigned char, 18446744073709551615ul> const, 18446744073709551615ul>&, cudf::device_span<cudf::io::compression_result, 18446744073709551615ul>&, cudf::io::gzip_header_included&) [0x12a4e1e]
=========                in /home/coder/cudf/cpp/build/release/libcudf.so
=========     Host Frame:/home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:1145:void cudf::io::inflate_kernel<128>(cudf::device_span<cudf::device_span<unsigned char const, 18446744073709551615ul> const, 18446744073709551615ul>, cudf::device_span<cudf::device_span<unsigned char, 18446744073709551615ul> const, 18446744073709551615ul>, cudf::device_span<cudf::io::compression_result, 18446744073709551615ul>, cudf::io::gzip_header_included) [0x12a5598]
=========                in /home/coder/cudf/cpp/build/release/libcudf.so
=========     Host Frame:/home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:1214:cudf::io::gpuinflate(cudf::device_span<cudf::device_span<unsigned char const, 18446744073709551615ul> const, 18446744073709551615ul>, cudf::device_span<cudf::device_span<unsigned char, 18446744073709551615ul> const, 18446744073709551615ul>, cudf::device_span<cudf::io::compression_result, 18446744073709551615ul>, cudf::io::gzip_header_included, rmm::cuda_stream_view) [0x12a49ef]
=========                in /home/coder/cudf/cpp/build/release/libcudf.so
=========     Host Frame:/home/coder/cudf/cpp/src/io/avro/reader_impl.cu:227:cudf::io::detail::avro::decompress_data(cudf::io::datasource&, cudf::io::detail::avro::metadata&, rmm::device_buffer const&, rmm::cuda_stream_view) [0x123db3c]
=========                in /home/coder/cudf/cpp/build/release/libcudf.so
=========     Host Frame:/home/coder/cudf/cpp/src/io/avro/reader_impl.cu:528:cudf::io::detail::avro::read_avro(std::unique_ptr<cudf::io::datasource, std::default_delete<cudf::io::datasource> >&&, cudf::io::avro_reader_options const&, rmm::cuda_stream_view, rmm::mr::device_memory_resource*) [0x123fa1f]
=========                in /home/coder/cudf/cpp/build/release/libcudf.so
=========     Host Frame:cudf::io::read_avro(cudf::io::avro_reader_options const&, rmm::mr::device_memory_resource*) [0x13019ee]
=========                in /home/coder/cudf/cpp/build/release/libcudf.so
=========     Host Frame: [0x2ba3c]
=========                in /home/coder/.conda/envs/rapids/lib/python3.10/site-packages/cudf/_lib/avro.cpython-310-x86_64-linux-gnu.so
=========     Host Frame: [0x2d29f]
=========                in /home/coder/.conda/envs/rapids/lib/python3.10/site-packages/cudf/_lib/avro.cpython-310-x86_64-linux-gnu.so
=========     Host Frame:/usr/local/src/conda/python-3.10.13/Python/ceval.c:4181:_PyEval_EvalFrameDefault [0x139022]
=========                in /home/coder/.conda/envs/rapids/bin/python
=========     Host Frame:/usr/local/src/conda/python-3.10.13/Objects/call.c:342:_PyFunction_Vectorcall [0x1448cc]
=========                in /home/coder/.conda/envs/rapids/bin/python
=========     Host Frame:/usr/local/src/conda/python-3.10.13/Python/ceval.c:4231:_PyEval_EvalFrameDefault [0x1357dc]
=========                in /home/coder/.conda/envs/rapids/bin/python
=========     Host Frame:/usr/local/src/conda/python-3.10.13/Python/ceval.c:5067:_PyEval_Vector [0x1d7870]
=========                in /home/coder/.conda/envs/rapids/bin/python
=========     Host Frame:/usr/local/src/conda/python-3.10.13/Python/ceval.c:1135:PyEval_EvalCode [0x1d77b7]
=========                in /home/coder/.conda/envs/rapids/bin/python
=========     Host Frame:/usr/local/src/conda/python-3.10.13/Python/pythonrun.c:1292:run_eval_code_obj [0x207d1a]
=========                in /home/coder/.conda/envs/rapids/bin/python
=========     Host Frame:/usr/local/src/conda/python-3.10.13/Python/pythonrun.c:1313:run_mod [0x203123]
=========                in /home/coder/.conda/envs/rapids/bin/python
=========     Host Frame:/usr/local/src/conda/python-3.10.13/Python/pythonrun.c:1208:pyrun_file.cold [0x9a4d1]
=========                in /home/coder/.conda/envs/rapids/bin/python
=========     Host Frame:/usr/local/src/conda/python-3.10.13/Python/pythonrun.c:456:_PyRun_SimpleFileObject [0x1fd60e]
=========                in /home/coder/.conda/envs/rapids/bin/python
=========     Host Frame:/usr/local/src/conda/python-3.10.13/Python/pythonrun.c:90:_PyRun_AnyFileObject [0x1fd1a4]
=========                in /home/coder/.conda/envs/rapids/bin/python
=========     Host Frame:/usr/local/src/conda/python-3.10.13/Modules/main.c:670:Py_RunMain [0x1fa39b]
=========                in /home/coder/.conda/envs/rapids/bin/python
=========     Host Frame:/usr/local/src/conda/python-3.10.13/Modules/main.c:1091:Py_BytesMain [0x1cae17]
=========                in /home/coder/.conda/envs/rapids/bin/python
=========     Host Frame: [0x29d90]
=========                in /usr/lib/x86_64-linux-gnu/libc.so.6
=========     Host Frame:__libc_start_main [0x29e40]
=========                in /usr/lib/x86_64-linux-gnu/libc.so.6
=========     Host Frame: [0x1cad11]
=========                in /home/coder/.conda/envs/rapids/bin/python
========= 
racecheck
========= COMPUTE-SANITIZER
========= Error: Race reported between Read access at 0xe00 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:789:cudf::io::process_symbols(cudf::io::inflate_state_s *, int)
=========     and Write access at 0x1930 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:543:cudf::io::decode_symbols(cudf::io::inflate_state_s *) [16132 hazards]
=========     and Write access at 0x5660 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:661:cudf::io::decode_symbols(cudf::io::inflate_state_s *) [16156 hazards]
========= 
========= Error: Race reported between Write access at 0xd90 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:957:cudf::io::prefetch_warp(volatile cudf::io::inflate_state_s *, int)
=========     and Read access at 0x33c0 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:590:cudf::io::decode_symbols(cudf::io::inflate_state_s *) [1144 hazards]
=========     and Read access at 0x5250 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:642:cudf::io::decode_symbols(cudf::io::inflate_state_s *) [6592 hazards]
========= 
========= Error: Race reported between Read access at 0x810 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:954:cudf::io::prefetch_warp(volatile cudf::io::inflate_state_s *, int)
=========     and Write access at 0x59c0 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:665:cudf::io::decode_symbols(cudf::io::inflate_state_s *) [1032 hazards]
========= 
========= Error: Race reported between Read access at 0xa70 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:784:cudf::io::process_symbols(cudf::io::inflate_state_s *, int)
=========     and Write access at 0x5930 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:663:cudf::io::decode_symbols(cudf::io::inflate_state_s *) [1028 hazards]
=========     and Write access at 0x5f90 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:671:cudf::io::decode_symbols(cudf::io::inflate_state_s *) [4 hazards]
========= 
========= Error: Race reported between Write access at 0x11c0 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:793:cudf::io::process_symbols(cudf::io::inflate_state_s *, int)
=========     and Read access at 0xf90 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:523:cudf::io::decode_symbols(cudf::io::inflate_state_s *) [500 hazards]
=========     and Read access at 0x5dd0 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:670:cudf::io::decode_symbols(cudf::io::inflate_state_s *) [4 hazards]
========= 
========= Error: Race reported between Write access at 0xf60 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:962:cudf::io::prefetch_warp(volatile cudf::io::inflate_state_s *, int)
=========     and Read access at 0xdb0 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:522:cudf::io::decode_symbols(cudf::io::inflate_state_s *) [272 hazards]
========= 
========= Error: Race reported between Write access at 0x5d70 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:1104:void cudf::io::inflate_kernel<(int)128>(cudf::device_span<const cudf::device_span<const unsigned char, (unsigned long)18446744073709551615>, (unsigned long)18446744073709551615>, cudf::device_span<const cudf::device_span<unsigned char, (unsigned long)18446744073709551615>, (unsigned long)18446744073709551615>, cudf::device_span<cudf::io::compression_result, (unsigned long)18446744073709551615>, cudf::io::gzip_header_included)
=========     and Read access at 0x5d0 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:951:cudf::io::prefetch_warp(volatile cudf::io::inflate_state_s *, int) [8 hazards]
========= 
========= Warning: Race reported between Read access at 0x3b0 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:775:cudf::io::process_symbols(cudf::io::inflate_state_s *, int)
=========     and Write access at 0x3000 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:826:cudf::io::process_symbols(cudf::io::inflate_state_s *, int) [8 hazards]
========= 
========= Warning: Race reported between Read access at 0x31a0 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:1068:void cudf::io::inflate_kernel<(int)128>(cudf::device_span<const cudf::device_span<const unsigned char, (unsigned long)18446744073709551615>, (unsigned long)18446744073709551615>, cudf::device_span<const cudf::device_span<unsigned char, (unsigned long)18446744073709551615>, (unsigned long)18446744073709551615>, cudf::device_span<cudf::io::compression_result, (unsigned long)18446744073709551615>, cudf::io::gzip_header_included)
=========     and Write access at 0x4900 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:1081:void cudf::io::inflate_kernel<(int)128>(cudf::device_span<const cudf::device_span<const unsigned char, (unsigned long)18446744073709551615>, (unsigned long)18446744073709551615>, cudf::device_span<const cudf::device_span<unsigned char, (unsigned long)18446744073709551615>, (unsigned long)18446744073709551615>, cudf::device_span<cudf::io::compression_result, (unsigned long)18446744073709551615>, cudf::io::gzip_header_included) [4 hazards]
========= 
========= Error: Race reported between Read access at 0xe00 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:789:cudf::io::process_symbols(cudf::io::inflate_state_s *, int)
=========     and Write access at 0x1930 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:543:cudf::io::decode_symbols(cudf::io::inflate_state_s *) [16132 hazards]
=========     and Write access at 0x5660 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:661:cudf::io::decode_symbols(cudf::io::inflate_state_s *) [16156 hazards]
========= 
========= Error: Race reported between Write access at 0xd90 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:957:cudf::io::prefetch_warp(volatile cudf::io::inflate_state_s *, int)
=========     and Read access at 0x33c0 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:590:cudf::io::decode_symbols(cudf::io::inflate_state_s *) [1144 hazards]
=========     and Read access at 0x5250 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:642:cudf::io::decode_symbols(cudf::io::inflate_state_s *) [6592 hazards]
========= 
========= Error: Race reported between Read access at 0x810 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:954:cudf::io::prefetch_warp(volatile cudf::io::inflate_state_s *, int)
=========     and Write access at 0x59c0 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:665:cudf::io::decode_symbols(cudf::io::inflate_state_s *) [1032 hazards]
========= 
========= Error: Race reported between Read access at 0xa70 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:784:cudf::io::process_symbols(cudf::io::inflate_state_s *, int)
=========     and Write access at 0x5930 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:663:cudf::io::decode_symbols(cudf::io::inflate_state_s *) [1028 hazards]
=========     and Write access at 0x5f90 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:671:cudf::io::decode_symbols(cudf::io::inflate_state_s *) [4 hazards]
========= 
========= Error: Race reported between Write access at 0x11c0 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:793:cudf::io::process_symbols(cudf::io::inflate_state_s *, int)
=========     and Read access at 0xf90 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:523:cudf::io::decode_symbols(cudf::io::inflate_state_s *) [500 hazards]
=========     and Read access at 0x5dd0 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:670:cudf::io::decode_symbols(cudf::io::inflate_state_s *) [4 hazards]
========= 
========= Error: Race reported between Write access at 0xf60 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:962:cudf::io::prefetch_warp(volatile cudf::io::inflate_state_s *, int)
=========     and Read access at 0xdb0 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:522:cudf::io::decode_symbols(cudf::io::inflate_state_s *) [272 hazards]
========= 
========= Error: Race reported between Write access at 0x5d70 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:1104:void cudf::io::inflate_kernel<(int)128>(cudf::device_span<const cudf::device_span<const unsigned char, (unsigned long)18446744073709551615>, (unsigned long)18446744073709551615>, cudf::device_span<const cudf::device_span<unsigned char, (unsigned long)18446744073709551615>, (unsigned long)18446744073709551615>, cudf::device_span<cudf::io::compression_result, (unsigned long)18446744073709551615>, cudf::io::gzip_header_included)
=========     and Read access at 0x5d0 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:951:cudf::io::prefetch_warp(volatile cudf::io::inflate_state_s *, int) [8 hazards]
========= 
========= Warning: Race reported between Read access at 0x3b0 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:775:cudf::io::process_symbols(cudf::io::inflate_state_s *, int)
=========     and Write access at 0x3000 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:826:cudf::io::process_symbols(cudf::io::inflate_state_s *, int) [8 hazards]
========= 
========= Warning: Race reported between Read access at 0x31a0 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:1068:void cudf::io::inflate_kernel<(int)128>(cudf::device_span<const cudf::device_span<const unsigned char, (unsigned long)18446744073709551615>, (unsigned long)18446744073709551615>, cudf::device_span<const cudf::device_span<unsigned char, (unsigned long)18446744073709551615>, (unsigned long)18446744073709551615>, cudf::device_span<cudf::io::compression_result, (unsigned long)18446744073709551615>, cudf::io::gzip_header_included)
=========     and Write access at 0x4900 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:1081:void cudf::io::inflate_kernel<(int)128>(cudf::device_span<const cudf::device_span<const unsigned char, (unsigned long)18446744073709551615>, (unsigned long)18446744073709551615>, cudf::device_span<const cudf::device_span<unsigned char, (unsigned long)18446744073709551615>, (unsigned long)18446744073709551615>, cudf::device_span<cudf::io::compression_result, (unsigned long)18446744073709551615>, cudf::io::gzip_header_included) [4 hazards]
========= 
========= RACECHECK SUMMARY: 18 hazards displayed (14 errors, 4 warnings)

I do not know if the racecheck warnings are as problematic as the memcheck ones, gpuinflate.cu is littered with volatile accesses to the inter-warp communication queue without (AFAICT) any synchronisation, but perhaps there are enough spin-waits that it is "OK"?

wence- avatar Mar 04 '24 11:03 wence-

I can't reproduce the memcheck error right now. I do see the same racecheck issues though.

vyasr avatar May 17 '24 20:05 vyasr

Inclined to close this since the memcheck errors are gone I believe the racecheck warnings to be not an issue.

davidwendt avatar Dec 16 '24 16:12 davidwendt