csv-parser
AddressSanitizer: heap-use-after-free on large dataset
We tried to use csv-parser on a large dataset (8 million lines, 9.9 GB). However, when looping over all lines and executing `row[column_name].get<std::string>()`, we get the following error message:
`
==245==ERROR: AddressSanitizer: heap-use-after-free on address 0x621003c37248 at pc 0x56492659e7ee bp 0x7ffe476e2f20 sp 0x7ffe476e2f10
READ of size 8 at 0x621003c37248 thread T0
#0 0x56492659e7ed in csv::internals::CSVFieldList::operator[](unsigned long) const /mwe/includes/csv_reader.h:7635
#1 0x56492659f298 in csv::CSVRow::get_field(unsigned long) const /mwe/includes/csv_reader.h:7694
#2 0x56492659ea9d in csv::CSVRow::operator[](unsigned long) const /mwe/includes/csv_reader.h:7656
#3 0x56492659ebea in csv::CSVRow::operator[](std::__cxx11::basic_string<char, std::char_traits
0x621003c37248 is located 328 bytes inside of 4096-byte region [0x621003c37100,0x621003c38100)
freed by thread T107 here:
#0 0x7f66a3cb722f in operator delete(void*, unsigned long) ../../../../src/libsanitizer/asan/asan_new_delete.cpp:172
#1 0x5649265be954 in __gnu_cxx::new_allocator<csv::internals::RawCSVField*>::deallocate(csv::internals::RawCSVField**, unsigned long) /usr/include/c++/11/ext/new_allocator.h:145
#2 0x5649265b31d6 in std::allocator<csv::internals::RawCSVField*>::deallocate(csv::internals::RawCSVField**, unsigned long) /usr/include/c++/11/bits/allocator.h:199
#3 0x5649265b31d6 in std::allocator_traits<std::allocator<csv::internals::RawCSVField*> >::deallocate(std::allocator<csv::internals::RawCSVField*>&, csv::internals::RawCSVField**, unsigned long) /usr/include/c++/11/bits/alloc_traits.h:496
#4 0x5649265aa73f in std::_Vector_base<csv::internals::RawCSVField*, std::allocator<csv::internals::RawCSVField*> >::_M_deallocate(csv::internals::RawCSVField**, unsigned long) /usr/include/c++/11/bits/stl_vector.h:354
#5 0x5649265b0692 in void std::vector<csv::internals::RawCSVField*, std::allocator<csv::internals::RawCSVField*> >::_M_realloc_insert<csv::internals::RawCSVField* const&>(__gnu_cxx::__normal_iterator<csv::internals::RawCSVField**, std::vector<csv::internals::RawCSVField*, std::allocator<csv::internals::RawCSVField*> > >, csv::internals::RawCSVField* const&) /usr/include/c++/11/bits/vector.tcc:500
#6 0x5649265a6ef2 in std::vector<csv::internals::RawCSVField*, std::allocator<csv::internals::RawCSVField*> >::push_back(csv::internals::RawCSVField* const&) /usr/include/c++/11/bits/stl_vector.h:1198
#7 0x56492659e97c in csv::internals::CSVFieldList::allocate() /mwe/includes/csv_reader.h:7640
#8 0x5649265a3b65 in void csv::internals::CSVFieldList::emplace_back<unsigned int, unsigned long&>(unsigned int&&, unsigned long&) /mwe/includes/csv_reader.h:5478
#9 0x564926598700 in csv::internals::IBasicCSVParser::push_field() /mwe/includes/csv_reader.h:6972
#10 0x564926598c01 in csv::internals::IBasicCSVParser::parse() /mwe/includes/csv_reader.h:6999
#11 0x5649265c8b49 in csv::internals::StreamParser<std::basic_ifstream<char, std::char_traits
previously allocated by thread T107 here:
#0 0x7f66a3cb61c7 in operator new(unsigned long) ../../../../src/libsanitizer/asan/asan_new_delete.cpp:99
#1 0x5649265c49fd in __gnu_cxx::new_allocator<csv::internals::RawCSVField*>::allocate(unsigned long, void const*) /usr/include/c++/11/ext/new_allocator.h:127
#2 0x5649265bd7a6 in std::allocator<csv::internals::RawCSVField*>::allocate(unsigned long) /usr/include/c++/11/bits/allocator.h:185
#3 0x5649265bd7a6 in std::allocator_traits<std::allocator<csv::internals::RawCSVField*> >::allocate(std::allocator<csv::internals::RawCSVField*>&, unsigned long) /usr/include/c++/11/bits/alloc_traits.h:464
#4 0x5649265b779f in std::_Vector_base<csv::internals::RawCSVField*, std::allocator<csv::internals::RawCSVField*> >::_M_allocate(unsigned long) /usr/include/c++/11/bits/stl_vector.h:346
#5 0x5649265b0514 in void std::vector<csv::internals::RawCSVField*, std::allocator<csv::internals::RawCSVField*> >::_M_realloc_insert<csv::internals::RawCSVField* const&>(__gnu_cxx::__normal_iterator<csv::internals::RawCSVField**, std::vector<csv::internals::RawCSVField*, std::allocator<csv::internals::RawCSVField*> > >, csv::internals::RawCSVField* const&) /usr/include/c++/11/bits/vector.tcc:440
#6 0x5649265a6ef2 in std::vector<csv::internals::RawCSVField*, std::allocator<csv::internals::RawCSVField*> >::push_back(csv::internals::RawCSVField* const&) /usr/include/c++/11/bits/stl_vector.h:1198
#7 0x56492659e97c in csv::internals::CSVFieldList::allocate() /mwe/includes/csv_reader.h:7640
#8 0x5649265a3b65 in void csv::internals::CSVFieldList::emplace_back<unsigned int, unsigned long&>(unsigned int&&, unsigned long&) /mwe/includes/csv_reader.h:5478
#9 0x564926598700 in csv::internals::IBasicCSVParser::push_field() /mwe/includes/csv_reader.h:6972
#10 0x564926598c01 in csv::internals::IBasicCSVParser::parse() /mwe/includes/csv_reader.h:6999
#11 0x5649265c8b49 in csv::internals::StreamParser<std::basic_ifstream<char, std::char_traits
Thread T107 created by T0 here:
#0 0x7f66a3c58685 in __interceptor_pthread_create ../../../../src/libsanitizer/asan/asan_interceptors.cpp:216
#1 0x7f66a3ab2388 in std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State> >, void (*)()) (/lib/x86_64-linux-gnu/libstdc++.so.6+0xdc388)
#2 0x56492659d23d in csv::CSVReader::read_row(csv::CSVRow&) /mwe/includes/csv_reader.h:7536
#3 0x56492659e70a in csv::CSVReader::iterator::operator++() /mwe/includes/csv_reader.h:7605
#4 0x5649265928ad in getColumn(std::filesystem::__cxx11::path const&, std::__cxx11::basic_string<char, std::char_traits
SUMMARY: AddressSanitizer: heap-use-after-free /mwe/includes/csv_reader.h:7635 in csv::internals::CSVFieldList::operator[](unsigned long) const
Shadow bytes around the buggy address:
  0x0c428077edf0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c428077ee00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c428077ee10: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c428077ee20: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c428077ee30: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
=>0x0c428077ee40: fd fd fd fd fd fd fd fd fd[fd]fd fd fd fd fd fd
  0x0c428077ee50: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c428077ee60: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c428077ee70: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c428077ee80: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c428077ee90: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==245==ABORTING
`
The error disappears when we call `std::this_thread::sleep_for(std::chrono::nanoseconds(1));` in the same loop.
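For readers following the trace: thread T107 reallocates a `std::vector<RawCSVField*>` inside `CSVFieldList::allocate()` while thread T0 reads through `CSVFieldList::operator[]`, and a 1 ns sleep hiding the error is consistent with a timing-dependent data race. The sketch below only illustrates that general pattern and one possible way to guard it with a mutex. It is not csv-parser's code and not necessarily how the library addresses the problem; `GuardedBlockList` and `run_demo` are hypothetical names invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a list of fixed-size blocks that a parser
// thread grows while a reader thread indexes into it.
// This is NOT csv-parser's CSVFieldList; it only illustrates the pattern.
class GuardedBlockList {
public:
    explicit GuardedBlockList(std::size_t block_size) : block_size_(block_size) {}

    // Writer side: append a value, allocating a new block when needed.
    void push_back(int value) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (size_ % block_size_ == 0) {
            // push_back on the outer vector may reallocate its buffer;
            // holding the lock keeps readers out while that happens.
            blocks_.push_back(std::make_unique<int[]>(block_size_));
        }
        blocks_[size_ / block_size_][size_ % block_size_] = value;
        ++size_;
    }

    // Reader side: copy the value out under the same lock.
    int at(std::size_t i) const {
        std::lock_guard<std::mutex> lock(mutex_);
        assert(i < size_);
        return blocks_[i / block_size_][i % block_size_];
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return size_;
    }

private:
    std::size_t block_size_;
    std::size_t size_ = 0;
    std::vector<std::unique_ptr<int[]>> blocks_;
    mutable std::mutex mutex_;
};

// Appends 0..count-1 from a writer thread while the calling thread reads
// the elements published so far. Returns the sum of all values read.
long long run_demo(std::size_t count) {
    GuardedBlockList list(64);
    std::thread writer([&list, count] {
        for (std::size_t i = 0; i < count; ++i) {
            list.push_back(static_cast<int>(i));
        }
    });

    long long sum = 0;
    std::size_t next = 0;
    while (next < count) {
        const std::size_t available = list.size();
        for (; next < available; ++next) {
            sum += list.at(next);
        }
    }
    writer.join();
    return sum;
}
```

Calling `run_demo(1000)` from `main` exercises concurrent appends and reads. Removing the locks from `push_back` and `at` would reintroduce the kind of unsynchronized access that AddressSanitizer or ThreadSanitizer can flag.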
For reproducibility, I have put an MWE here: https://drive.google.com/file/d/1M_PJLlhxs8JTmIGEcDNCBAeBqxqmdNBC/view?usp=drive_link
Just extract it and run `docker build . --tag=mwe`, then `docker run -it mwe`, and inside the container run `./runAndBuild.sh`.
Thanks for your report, I'll take a look
Should be fixed in the latest release