llvm-project icon indicating copy to clipboard operation
llvm-project copied to clipboard

clang 14/15 libc++: std::vector memory leak in case of iterator exception

Open pkl97 opened this issue 2 years ago • 1 comments

The program below shows this memory leak when executed in Valgrind:

[user@host]$ valgrind --leak-check=full ./LeakTest
==725012== Memcheck, a memory error detector
==725012== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==725012== Using Valgrind-3.20.0.GIT and LibVEX; rerun with -h for copyright info
==725012== Command: ./LeakTest
==725012==
Exceeded limit!
==725012==
==725012== HEAP SUMMARY:
==725012==     in use at exit: 40 bytes in 2 blocks
==725012==   total heap usage: 6 allocs, 4 frees, 1,272 bytes allocated
==725012==
==725012== 24 bytes in 1 blocks are definitely lost in loss record 2 of 2
==725012==    at 0x4C3761F: operator new(unsigned long) (vg_replace_malloc.c:432)
==725012==    by 0x10CF74: void* std::__1::__libcpp_operator_new[abi:v15002]<unsigned long>(unsigned long) (clang15el8/bin/../include/c++/v1/new:246)
==725012==    by 0x10CEFC: std::__1::__libcpp_allocate[abi:v15002](unsigned long, unsigned long) (clang15el8/bin/../include/c++/v1/new:272)
==725012==    by 0x10CE67: std::__1::allocator<unsigned int>::allocate[abi:v15002](unsigned long) (clang15el8/bin/../include/c++/v1/__memory/allocator.h:112)
==725012==    by 0x10CACC: std::__1::__allocation_result<std::__1::allocator_traits<std::__1::allocator<unsigned int> >::pointer> std::__1::__allocate_at_least[abi:v15002]<std::__1::allocator<unsigned int> >(std::__1::allocator<unsigned int>&, unsigned long) (clang15el8/bin/../include/c++/v1/__memory/allocate_at_least.h:54)
==725012==    by 0x10C852: std::__1::vector<unsigned int, std::__1::allocator<unsigned int> >::__vallocate[abi:v15002](unsigned long) (clang15el8/bin/../include/c++/v1/vector:682)
==725012==    by 0x10C5E7: std::__1::vector<unsigned int, std::__1::allocator<unsigned int> >::vector<MyIterator>(MyIterator, std::__1::enable_if<__is_cpp17_forward_iterator<MyIterator>::value&&is_constructible<unsigned int, std::__1::iterator_traits<MyIterator>::reference>::value, MyIterator>::type) (clang15el8/bin/../include/c++/v1/vector:1157)
==725012==    by 0x10C23C: main (main.cpp:57)
==725012==
==725012== LEAK SUMMARY:
==725012==    definitely lost: 24 bytes in 1 blocks
==725012==    indirectly lost: 0 bytes in 0 blocks
==725012==      possibly lost: 0 bytes in 0 blocks
==725012==    still reachable: 16 bytes in 1 blocks
==725012==         suppressed: 0 bytes in 0 blocks
==725012== Reachable blocks (those to which a pointer was found) are not shown.
==725012== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==725012==
==725012== For lists of detected and suppressed errors, rerun with: -s
==725012== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

The program defines an iterator that at some point throws an exception in operator*(). Such behavior is common e.g. for Unicode iterators (UTF-8 and UTF-16) if they encounter illegal byte sequences (e.g. missing parts of surrogate pairs or incorrect leading/continuation bytes). It can be triggered with Boost.Regex when combined with the ICU library. The iterator constructor of std::vector is used to actually trigger the memory leak.

Combinations where the leak occurs:

  • clang 15.0.2 with libc++ (clang++ -Wall -Wextra -std=c++20 -stdlib=libc++ main.cpp -o LeakTest)
  • clang 14.0.6 with libc++

Combinations where no leak occurs:

  • clang 11.0.1 with libc++
  • clang 14.0.6 with libstdc++ (clang++ -Wall -Wextra -std=c++20 -stdlib=libstdc++ main.cpp -o LeakTest)
  • gcc 12.2 with libstdc++ (g++ -Wall -Wextra -std=c++20 main.cpp -o LeakTest)

This looks like a regression in recent versions of libc++.

The program to reproduce the issue:

#include <iostream>
#include <vector>

class MyIterator
{
public:
    using iterator_category = std::forward_iterator_tag;
    using difference_type = std::ptrdiff_t;
    using value_type = unsigned int;
    using pointer = value_type*;
    using reference = value_type&;
    using const_reference = const value_type&;

    using ContainerType = std::vector<value_type>;
    using ContainerTypeConstIterator = typename ContainerType::const_iterator;

    const_reference operator*() const
    {
        static unsigned int s_counter = 0;
        if (++s_counter == 2U)
            throw std::runtime_error("Exceeded limit!");
        return *m_itValue;
    }

    bool operator==(const MyIterator&) const = default;

    MyIterator& operator++()
    {
        ++m_itValue;
        return *this;
    }

    static MyIterator begin(const ContainerType& p_values)
    {
        return MyIterator(p_values.begin(), p_values.end());
    }

    static MyIterator end(const ContainerType& p_values)
    {
        return MyIterator(p_values.end(), p_values.end());
    }

private:
    MyIterator(const ContainerTypeConstIterator& p_itValue, const ContainerTypeConstIterator& p_itEnd) : m_itValue(p_itValue), m_itEnd(p_itEnd)
    {
    }

    ContainerTypeConstIterator m_itValue;
    const ContainerTypeConstIterator m_itEnd;
};

int main()
{
    try
    {
        const std::vector<unsigned int> values{ 0, 1, 2, 3, 4, 5 };
        const std::vector<unsigned int> copy(MyIterator::begin(values), MyIterator::end(values));
    }
    catch (const std::runtime_error& e) {
        std::cout << e.what() << std::endl;
    }
    return 0;
}

pkl97 avatar Oct 16 '22 14:10 pkl97

CC: @philnik777

JoeLoser avatar Oct 17 '22 02:10 JoeLoser

Did some more testing. The problem does not occur with clang 12.0.1 and 13.0.1 together with libc++, so it looks like a regression starting with clang 14. It is however independent of the C++ Standard used. It also occurs when using C++17 or C++14 (if operator==() is properly implemented of course).

Here is a sample program that uses Boost.Regex in combination with ICU:

#include <boost/regex/icu.hpp>

int main()
{
    try
    {
        // clang 15 leaks 12 bytes of memory
        boost::make_u32regex("^\xed\xa0\x80$"); // single lowest lead surrogate U+D800
    }
    catch (...) {
    }
    return EXIT_SUCCESS;
}

It leaks 12 bytes when run. The given string is the UTF-8 encoding of a single lead surrogate which is illegal because it is only one half of a character. Internally operator*() throws in Boost.Regex.

pkl97 avatar Oct 17 '22 20:10 pkl97

@llvm/issue-subscribers-clang-codegen

llvmbot avatar Oct 17 '22 21:10 llvmbot

This is actually a libc++ bug. We removed the base class in LLVM14, which caused this bug. I'll look into how to fix the bug in a nice way. Unfortunately we don't have delegating constructors yet in C++03. @var-const do you have interest in adding that to C++03? You mentioned something in a patch. Pinging @ldionne for awareness.

philnik777 avatar Oct 17 '22 22:10 philnik777

Any idea when a fix for this issue could be available?

I assume there is no chance for a backport to clang 15?

pkl97 avatar Nov 03 '22 14:11 pkl97

Sorry, I had something else to do and then forgot about it. I've got https://reviews.llvm.org/D138601 to fix the issue now. I hope we can get it into LLVM15, since it's a pretty serious issue.

philnik777 avatar Nov 23 '22 20:11 philnik777

Thank you for the fix and the backport to clang 15.0.7!

Works like a charm.

pkl97 avatar Jan 14 '23 08:01 pkl97