llvm-project
llvm-project copied to clipboard
clang 14/15 libc++: std::vector memory leak in case of iterator exception
The program below shows this memory leak when executed in Valgrind:
[user@host]$ valgrind --leak-check=full ./LeakTest
==725012== Memcheck, a memory error detector
==725012== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==725012== Using Valgrind-3.20.0.GIT and LibVEX; rerun with -h for copyright info
==725012== Command: ./LeakTest
==725012==
Exceeded limit!
==725012==
==725012== HEAP SUMMARY:
==725012== in use at exit: 40 bytes in 2 blocks
==725012== total heap usage: 6 allocs, 4 frees, 1,272 bytes allocated
==725012==
==725012== 24 bytes in 1 blocks are definitely lost in loss record 2 of 2
==725012== at 0x4C3761F: operator new(unsigned long) (vg_replace_malloc.c:432)
==725012== by 0x10CF74: void* std::__1::__libcpp_operator_new[abi:v15002]<unsigned long>(unsigned long) (clang15el8/bin/../include/c++/v1/new:246)
==725012== by 0x10CEFC: std::__1::__libcpp_allocate[abi:v15002](unsigned long, unsigned long) (clang15el8/bin/../include/c++/v1/new:272)
==725012== by 0x10CE67: std::__1::allocator<unsigned int>::allocate[abi:v15002](unsigned long) (clang15el8/bin/../include/c++/v1/__memory/allocator.h:112)
==725012== by 0x10CACC: std::__1::__allocation_result<std::__1::allocator_traits<std::__1::allocator<unsigned int> >::pointer> std::__1::__allocate_at_least[abi:v15002]<std::__1::allocator<unsigned int> >(std::__1::allocator<unsigned int>&, unsigned long) (clang15el8/bin/../include/c++/v1/__memory/allocate_at_least.h:54)
==725012== by 0x10C852: std::__1::vector<unsigned int, std::__1::allocator<unsigned int> >::__vallocate[abi:v15002](unsigned long) (clang15el8/bin/../include/c++/v1/vector:682)
==725012== by 0x10C5E7: std::__1::vector<unsigned int, std::__1::allocator<unsigned int> >::vector<MyIterator>(MyIterator, std::__1::enable_if<__is_cpp17_forward_iterator<MyIterator>::value&&is_constructible<unsigned int, std::__1::iterator_traits<MyIterator>::reference>::value, MyIterator>::type) (clang15el8/bin/../include/c++/v1/vector:1157)
==725012== by 0x10C23C: main (main.cpp:57)
==725012==
==725012== LEAK SUMMARY:
==725012== definitely lost: 24 bytes in 1 blocks
==725012== indirectly lost: 0 bytes in 0 blocks
==725012== possibly lost: 0 bytes in 0 blocks
==725012== still reachable: 16 bytes in 1 blocks
==725012== suppressed: 0 bytes in 0 blocks
==725012== Reachable blocks (those to which a pointer was found) are not shown.
==725012== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==725012==
==725012== For lists of detected and suppressed errors, rerun with: -s
==725012== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
The program defines an iterator that at some point throws an exception in operator*()
. Such behavior is common e.g. for Unicode iterators (UTF-8 and UTF-16) if they encounter illegal byte sequences (e.g. missing parts of surrogate pairs or incorrect leading/continuation bytes). It can be triggered with Boost.Regex when combined with the ICU library.
The iterator constructor of std::vector is used to actually trigger the memory leak.
Combinations where the leak occurs:
- clang 15.0.2 with libc++ (clang++ -Wall -Wextra -std=c++20 -stdlib=libc++ main.cpp -o LeakTest)
- clang 14.0.6 with libc++
Combinations where no leak occurs:
- clang 11.0.1 with libc++
- clang 14.0.6 with libstdc++ (clang++ -Wall -Wextra -std=c++20 -stdlib=libstdc++ main.cpp -o LeakTest)
- gcc 12.2 with libstdc++ (g++ -Wall -Wextra -std=c++20 main.cpp -o LeakTest)
This looks like a regression in recent versions of libc++.
The program to reproduce the issue:
#include <iostream>
#include <vector>
class MyIterator
{
public:
using iterator_category = std::forward_iterator_tag;
using difference_type = std::ptrdiff_t;
using value_type = unsigned int;
using pointer = value_type*;
using reference = value_type&;
using const_reference = const value_type&;
using ContainerType = std::vector<value_type>;
using ContainerTypeConstIterator = typename ContainerType::const_iterator;
const_reference operator*() const
{
static unsigned int s_counter = 0;
if (++s_counter == 2U)
throw std::runtime_error("Exceeded limit!");
return *m_itValue;
}
bool operator==(const MyIterator&) const = default;
MyIterator& operator++()
{
++m_itValue;
return *this;
}
static MyIterator begin(const ContainerType& p_values)
{
return MyIterator(p_values.begin(), p_values.end());
}
static MyIterator end(const ContainerType& p_values)
{
return MyIterator(p_values.end(), p_values.end());
}
private:
MyIterator(const ContainerTypeConstIterator& p_itValue, const ContainerTypeConstIterator& p_itEnd) : m_itValue(p_itValue), m_itEnd(p_itEnd)
{
}
ContainerTypeConstIterator m_itValue;
const ContainerTypeConstIterator m_itEnd;
};
int main()
{
try
{
const std::vector<unsigned int> values{ 0, 1, 2, 3, 4, 5 };
const std::vector<unsigned int> copy(MyIterator::begin(values), MyIterator::end(values));
}
catch (const std::runtime_error& e) {
std::cout << e.what() << std::endl;
}
return 0;
}
CC: @philnik777
Did some more testing. The problem does not occur with clang 12.0.1 and 13.0.1 together with libc++, so it looks like a regression starting with clang 14. It is however independent of the C++ Standard used. It also occurs when using C++17 or C++14 (if operator==() is properly implemented of course).
Here is a sample program that uses Boost.Regex in combination with ICU:
#include <boost/regex/icu.hpp>
int main()
{
try
{
// clang 15 leaks 12 bytes of memory
boost::make_u32regex("^\xed\xa0\x80$"); // single lowest lead surrogate U+D800
}
catch (...) {
}
return EXIT_SUCCESS;
}
It leaks 12 bytes when run. The given string is the UTF-8 encoding of a single lead surrogate which is illegal because it is only one half of a character. Internally operator*()
throws in Boost.Regex.
@llvm/issue-subscribers-clang-codegen
This is actually a libc++ bug. We removed the base class in LLVM14, which caused this bug. I'll look into how to fix the bug in a nice way. Unfortunately we don't have delegating constructors yet in C++03. @var-const do you have interest in adding that to C++03? You mentioned something in a patch. Pinging @ldionne for awareness.
Any idea when a fix for this issue could be available?
I assume there is no chance for a backport to clang 15?
Sorry, I had something else to do and then forgot about it. I've got https://reviews.llvm.org/D138601 to fix the issue now. I hope we can get it into LLVM15, since it's a pretty serious issue.
Thank you for the fix and the backport to clang 15.0.7!
Works like a charm.