
AddressSanitizer: heap-use-after-free on large dataset

Open JonasKellerer opened this issue 1 year ago • 1 comments

We tried to use csv-parser on a large dataset (8 million lines, 9.9 GB). However, when looping over all rows and executing row[column_name].get<std::string>(), we get the following error message:

` ==245==ERROR: AddressSanitizer: heap-use-after-free on address 0x621003c37248 at pc 0x56492659e7ee bp 0x7ffe476e2f20 sp 0x7ffe476e2f10
READ of size 8 at 0x621003c37248 thread T0
    #0 0x56492659e7ed in csv::internals::CSVFieldList::operator[](unsigned long) const /mwe/includes/csv_reader.h:7635
    #1 0x56492659f298 in csv::CSVRow::get_field(unsigned long) const /mwe/includes/csv_reader.h:7694
    #2 0x56492659ea9d in csv::CSVRow::operator[](unsigned long) const /mwe/includes/csv_reader.h:7656
    #3 0x56492659ebea in csv::CSVRow::operator[](std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&) const /mwe/includes/csv_reader.h:7672
    #4 0x5649265927c2 in getColumn(std::filesystem::__cxx11::path const&, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&) /mwe/src/main.cpp:27
    #5 0x564926592eb0 in main /mwe/src/main.cpp:36
    #6 0x7f66a36d0d8f (/lib/x86_64-linux-gnu/libc.so.6+0x29d8f)
    #7 0x7f66a36d0e3f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x29e3f)
    #8 0x564926591dc4 in _start (/mwe/build/csvMWE+0x7dc4)

0x621003c37248 is located 328 bytes inside of 4096-byte region [0x621003c37100,0x621003c38100)
freed by thread T107 here:
    #0 0x7f66a3cb722f in operator delete(void*, unsigned long) ../../../../src/libsanitizer/asan/asan_new_delete.cpp:172
    #1 0x5649265be954 in __gnu_cxx::new_allocatorcsv::internals::RawCSVField*::deallocate(csv::internals::RawCSVField**, unsigned long) /usr/include/c++/11/ext/new_allocator.h:145
    #2 0x5649265b31d6 in std::allocatorcsv::internals::RawCSVField*::deallocate(csv::internals::RawCSVField**, unsigned long) /usr/include/c++/11/bits/allocator.h:199
    #3 0x5649265b31d6 in std::allocator_traits<std::allocatorcsv::internals::RawCSVField* >::deallocate(std::allocatorcsv::internals::RawCSVField*&, csv::internals::RawCSVField**, unsigned long) /usr/include/c++/11/bits/alloc_traits.h:496
    #4 0x5649265aa73f in std::_Vector_base<csv::internals::RawCSVField*, std::allocatorcsv::internals::RawCSVField* >::_M_deallocate(csv::internals::RawCSVField**, unsigned long) /usr/include/c++/11/bits/stl_vector.h:354
    #5 0x5649265b0692 in void std::vector<csv::internals::RawCSVField*, std::allocatorcsv::internals::RawCSVField* >::_M_realloc_insert<csv::internals::RawCSVField* const&>(__gnu_cxx::__normal_iterator<csv::internals::RawCSVField**, std::vector<csv::internals::RawCSVField*, std::allocatorcsv::internals::RawCSVField* > >, csv::internals::RawCSVField* const&) /usr/include/c++/11/bits/vector.tcc:500
    #6 0x5649265a6ef2 in std::vector<csv::internals::RawCSVField*, std::allocatorcsv::internals::RawCSVField* >::push_back(csv::internals::RawCSVField* const&) /usr/include/c++/11/bits/stl_vector.h:1198
    #7 0x56492659e97c in csv::internals::CSVFieldList::allocate() /mwe/includes/csv_reader.h:7640
    #8 0x5649265a3b65 in void csv::internals::CSVFieldList::emplace_back<unsigned int, unsigned long&>(unsigned int&&, unsigned long&) /mwe/includes/csv_reader.h:5478
    #9 0x564926598700 in csv::internals::IBasicCSVParser::push_field() /mwe/includes/csv_reader.h:6972
    #10 0x564926598c01 in csv::internals::IBasicCSVParser::parse() /mwe/includes/csv_reader.h:6999
    #11 0x5649265c8b49 in csv::internals::StreamParser<std::basic_ifstream<char, std::char_traits > >::next(unsigned long) /mwe/includes/csv_reader.h:6175
    #12 0x56492659ceb9 in csv::CSVReader::read_csv(unsigned long) /mwe/includes/csv_reader.h:7496
    #13 0x5649265c9335 in bool std::__invoke_impl<bool, bool (csv::CSVReader::)(unsigned long), csv::CSVReader, unsigned long>(std::__invoke_memfun_deref, bool (csv::CSVReader::&&)(unsigned long), csv::CSVReader&&, unsigned long&&) /usr/include/c++/11/bits/invoke.h:74
    #14 0x5649265c913e in std::__invoke_result<bool (csv::CSVReader::)(unsigned long), csv::CSVReader, unsigned long>::type std::__invoke<bool (csv::CSVReader::)(unsigned long), csv::CSVReader, unsigned long>(bool (csv::CSVReader::&&)(unsigned long), csv::CSVReader&&, unsigned long&&) /usr/include/c++/11/bits/invoke.h:96
    #15 0x5649265c905e in bool std::thread::_Invoker<std::tuple<bool (csv::CSVReader::)(unsigned long), csv::CSVReader, unsigned long> >::_M_invoke<0ul, 1ul, 2ul>(std::_Index_tuple<0ul, 1ul, 2ul>) /usr/include/c++/11/bits/std_thread.h:253
    #16 0x5649265c8ec1 in std::thread::_Invoker<std::tuple<bool (csv::CSVReader::)(unsigned long), csv::CSVReader, unsigned long> >::operator()() /usr/include/c++/11/bits/std_thread.h:260
    #17 0x5649265c8dbd in std::thread::_State_impl<std::thread::_Invoker<std::tuple<bool (csv::CSVReader::)(unsigned long), csv::CSVReader, unsigned long> > >::_M_run() /usr/include/c++/11/bits/std_thread.h:211
    #18 0x7f66a3ab22b2 (/lib/x86_64-linux-gnu/libstdc++.so.6+0xdc2b2)

previously allocated by thread T107 here:
    #0 0x7f66a3cb61c7 in operator new(unsigned long) ../../../../src/libsanitizer/asan/asan_new_delete.cpp:99
    #1 0x5649265c49fd in __gnu_cxx::new_allocatorcsv::internals::RawCSVField*::allocate(unsigned long, void const*) /usr/include/c++/11/ext/new_allocator.h:127
    #2 0x5649265bd7a6 in std::allocatorcsv::internals::RawCSVField*::allocate(unsigned long) /usr/include/c++/11/bits/allocator.h:185
    #3 0x5649265bd7a6 in std::allocator_traits<std::allocatorcsv::internals::RawCSVField* >::allocate(std::allocatorcsv::internals::RawCSVField*&, unsigned long) /usr/include/c++/11/bits/alloc_traits.h:464
    #4 0x5649265b779f in std::_Vector_base<csv::internals::RawCSVField*, std::allocatorcsv::internals::RawCSVField* >::_M_allocate(unsigned long) /usr/include/c++/11/bits/stl_vector.h:346
    #5 0x5649265b0514 in void std::vector<csv::internals::RawCSVField*, std::allocatorcsv::internals::RawCSVField* >::_M_realloc_insert<csv::internals::RawCSVField* const&>(__gnu_cxx::__normal_iterator<csv::internals::RawCSVField**, std::vector<csv::internals::RawCSVField*, std::allocatorcsv::internals::RawCSVField* > >, csv::internals::RawCSVField* const&) /usr/include/c++/11/bits/vector.tcc:440
    #6 0x5649265a6ef2 in std::vector<csv::internals::RawCSVField*, std::allocatorcsv::internals::RawCSVField* >::push_back(csv::internals::RawCSVField* const&) /usr/include/c++/11/bits/stl_vector.h:1198
    #7 0x56492659e97c in csv::internals::CSVFieldList::allocate() /mwe/includes/csv_reader.h:7640
    #8 0x5649265a3b65 in void csv::internals::CSVFieldList::emplace_back<unsigned int, unsigned long&>(unsigned int&&, unsigned long&) /mwe/includes/csv_reader.h:5478
    #9 0x564926598700 in csv::internals::IBasicCSVParser::push_field() /mwe/includes/csv_reader.h:6972
    #10 0x564926598c01 in csv::internals::IBasicCSVParser::parse() /mwe/includes/csv_reader.h:6999
    #11 0x5649265c8b49 in csv::internals::StreamParser<std::basic_ifstream<char, std::char_traits > >::next(unsigned long) /mwe/includes/csv_reader.h:6175
    #12 0x56492659ceb9 in csv::CSVReader::read_csv(unsigned long) /mwe/includes/csv_reader.h:7496
    #13 0x5649265c9335 in bool std::__invoke_impl<bool, bool (csv::CSVReader::)(unsigned long), csv::CSVReader, unsigned long>(std::__invoke_memfun_deref, bool (csv::CSVReader::&&)(unsigned long), csv::CSVReader&&, unsigned long&&) /usr/include/c++/11/bits/invoke.h:74
    #14 0x5649265c913e in std::__invoke_result<bool (csv::CSVReader::)(unsigned long), csv::CSVReader, unsigned long>::type std::__invoke<bool (csv::CSVReader::)(unsigned long), csv::CSVReader, unsigned long>(bool (csv::CSVReader::&&)(unsigned long), csv::CSVReader&&, unsigned long&&) /usr/include/c++/11/bits/invoke.h:96
    #15 0x5649265c905e in bool std::thread::_Invoker<std::tuple<bool (csv::CSVReader::)(unsigned long), csv::CSVReader, unsigned long> >::_M_invoke<0ul, 1ul, 2ul>(std::_Index_tuple<0ul, 1ul, 2ul>) /usr/include/c++/11/bits/std_thread.h:253
    #16 0x5649265c8ec1 in std::thread::_Invoker<std::tuple<bool (csv::CSVReader::)(unsigned long), csv::CSVReader, unsigned long> >::operator()() /usr/include/c++/11/bits/std_thread.h:260
    #17 0x5649265c8dbd in std::thread::_State_impl<std::thread::_Invoker<std::tuple<bool (csv::CSVReader::)(unsigned long), csv::CSVReader, unsigned long> > >::_M_run() /usr/include/c++/11/bits/std_thread.h:211
    #18 0x7f66a3ab22b2 (/lib/x86_64-linux-gnu/libstdc++.so.6+0xdc2b2)

Thread T107 created by T0 here:
    #0 0x7f66a3c58685 in __interceptor_pthread_create ../../../../src/libsanitizer/asan/asan_interceptors.cpp:216
    #1 0x7f66a3ab2388 in std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_deletestd::thread::_State >, void (*)()) (/lib/x86_64-linux-gnu/libstdc++.so.6+0xdc388)
    #2 0x56492659d23d in csv::CSVReader::read_row(csv::CSVRow&) /mwe/includes/csv_reader.h:7536
    #3 0x56492659e70a in csv::CSVReader::iterator::operator++() /mwe/includes/csv_reader.h:7605
    #4 0x5649265928ad in getColumn(std::filesystem::__cxx11::path const&, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&) /mwe/src/main.cpp:25
    #5 0x564926592eb0 in main /mwe/src/main.cpp:36
    #6 0x7f66a36d0d8f (/lib/x86_64-linux-gnu/libc.so.6+0x29d8f)

SUMMARY: AddressSanitizer: heap-use-after-free /mwe/includes/csv_reader.h:7635 in csv::internals::CSVFieldList::operator[](unsigned long) const
Shadow bytes around the buggy address:
  0x0c428077edf0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c428077ee00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c428077ee10: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c428077ee20: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c428077ee30: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
=>0x0c428077ee40: fd fd fd fd fd fd fd fd fd[fd]fd fd fd fd fd fd
  0x0c428077ee50: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c428077ee60: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c428077ee70: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c428077ee80: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c428077ee90: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable: 00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone: fa
  Freed heap region: fd
  Stack left redzone: f1
  Stack mid redzone: f2
  Stack right redzone: f3
  Stack after return: f5
  Stack use after scope: f8
  Global redzone: f9
  Global init order: f6
  Poisoned by user: f7
  Container overflow: fc
  Array cookie: ac
  Intra object redzone: bb
  ASan internal: fe
  Left alloca redzone: ca
  Right alloca redzone: cb
  Shadow gap: cc
==245==ABORTING `

The error no longer occurs when we add std::this_thread::sleep_for(std::chrono::nanoseconds(1)); inside the same loop.
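Reading the trace, the freed region looks like the old storage of a std::vector that thread T107 reallocated in push_back (via CSVFieldList::allocate()) while thread T0 was reading it in CSVFieldList::operator[]. That the error disappears with a short sleep also points to a timing-dependent race, so the sleep probably hides the problem rather than fixing it. Below is a minimal, self-contained sketch of that pattern with a mutex guarding both sides. GuardedBlockList and count_mismatches are hypothetical names for illustration; this is not csv-parser code and may not be how the library resolves the issue.

```cpp
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a list of field blocks (NOT the csv-parser type).
// A writer thread appends block pointers while a reader thread indexes them.
class GuardedBlockList {
public:
    // Writer side: push_back may reallocate and free the old pointer array,
    // so it must not run concurrently with a read of that array.
    void add_block(const int* block) {
        std::lock_guard<std::mutex> lock(mutex_);
        blocks_.push_back(block);
    }

    // Reader side: copy the pointer out while holding the same lock.
    const int* block_at(std::size_t i) const {
        std::lock_guard<std::mutex> lock(mutex_);
        return blocks_.at(i);
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return blocks_.size();
    }

private:
    mutable std::mutex mutex_;
    std::vector<const int*> blocks_;
};

// Runs one writer and one reader concurrently and returns how many values
// the reader saw that did not match what the writer published.
std::size_t count_mismatches(std::size_t n) {
    // Backing storage is sized once up front, so pointers into it stay valid.
    std::vector<int> storage(n);
    for (std::size_t i = 0; i < n; ++i) storage[i] = static_cast<int>(i);

    GuardedBlockList list;
    std::size_t mismatches = 0;  // written only by the reader thread

    std::thread writer([&] {
        for (std::size_t i = 0; i < n; ++i) list.add_block(&storage[i]);
    });

    std::thread reader([&] {
        std::size_t seen = 0;
        while (seen < n) {
            const std::size_t available = list.size();
            for (; seen < available; ++seen) {
                if (*list.block_at(seen) != static_cast<int>(seen)) ++mismatches;
            }
            std::this_thread::yield();  // let the writer make progress
        }
    });

    writer.join();
    reader.join();
    return mismatches;
}
```

Calling count_mismatches(10000) exercises many reallocations of the pointer vector while the reader is active. Removing the locks from add_block and block_at would reintroduce the kind of unsynchronized access that the report above describes.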

For reproducibility, I have put an MWE here: https://drive.google.com/file/d/1M_PJLlhxs8JTmIGEcDNCBAeBqxqmdNBC/view?usp=drive_link

Just extract it and run docker build . --tag=mwe, then docker run -it mwe and inside the container ./runAndBuild.sh.

JonasKellerer, Jul 14 '23 12:07

Thanks for your report, I'll take a look

vincentlaucsb, Mar 30 '24 20:03

Should be fixed in the latest release

vincentlaucsb, Jun 15 '24 20:06