[BUG]: Slow compile when a very long std::array is used with clang-18

Open comicfans opened this issue 5 months ago • 1 comments

Required prerequisites

  • [x] Make sure you've read the documentation. Your issue may be addressed there.
  • [x] Search the issue tracker and Discussions to verify that this hasn't already been reported. +1 or comment there if it has.
  • [ ] Consider asking first in the Gitter chat room or in a Discussion.

What version (or hash if on master) of pybind11 are you using?

3.0.0

Problem description

I've found that pybind11 compiles very slowly when a very long std::array is used.

clang-18 takes 90 seconds to finish, clang-20 takes 11 seconds, and g++-13 takes 13 seconds.

I tried 2.16.3 and 3.0.0; both compile slowly with clang-18. Clang's -ftime-trace output shows that most of the time is spent in codegen for this function:

std::array<int, 102400ul> pybind11::detail::vector_to_array_impl<std::array<int, 102400ul>, std::vector<int, std::allocator<int>>, 0ul, 1ul, 2ul, 3ul, 4ul, 5ul, 6ul, 7ul, 8ul, 9ul, 10ul, 11ul, ....
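
To illustrate why the instantiation above carries one index per array element, here is a minimal standalone sketch of the pack-expansion pattern. It assumes C++14 `std::index_sequence` in place of pybind11's internal helper, and the names `expand_into_array` and `to_array_by_expansion` are hypothetical, not pybind11 API.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for the impl function: the index pack I... holds
// 0, 1, ..., N-1, so the braced list below names every element explicitly.
template <typename ArrayType, typename V, std::size_t... I>
ArrayType expand_into_array(V &&v, std::index_sequence<I...>) {
    // Expands to {{std::move(v[0]), std::move(v[1]), ..., std::move(v[N-1])}}.
    return {{std::move(v[I])...}};
}

// Hypothetical front door: builds the index sequence for a given N.
template <typename ArrayType, std::size_t N, typename V>
ArrayType to_array_by_expansion(V &&v) {
    assert(v.size() == N); // the caller is expected to have checked the size
    return expand_into_array<ArrayType>(std::forward<V>(v),
                                        std::make_index_sequence<N>{});
}
```

With N = 4 the compiler generates a function that initializes four elements one by one; with N = 102400 the same pattern yields a single function with 102400 element initializers, which matches the codegen hot spot reported in the trace.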

I've created a repo that reproduces this issue: https://github.com/comicfans/pybind11_slow_clang/tree/main

The clang-18 compile trace (which can be loaded into chrome://tracing): basic.cpp.json

Reproducible example code

#include <array>
#include <pybind11/pybind11.h>
#include <pybind11/stl.h>
namespace py = pybind11;

struct SlowType {
  std::array<int, 102400> member{};
};

PYBIND11_MODULE(basic, module) {
  py::class_<SlowType>(module, "SlowType")
      .def(py::init<>())
      .def_readwrite("member", &SlowType::member);
}

Is this a regression? Put the last known working version here if it is.

Not a regression

comicfans avatar Jul 24 '25 10:07 comicfans

I suggest the following patch:

diff --git a/include/pybind11/stl.h b/include/pybind11/stl.h
index 01be0b47..ab276722 100644
--- a/include/pybind11/stl.h
+++ b/include/pybind11/stl.h
@@ -21,6 +21,7 @@
 #include <memory>
 #include <ostream>
 #include <set>
+#include <type_traits>
 #include <unordered_map>
 #include <unordered_set>
 #include <valarray>
@@ -378,10 +379,17 @@ ArrayType vector_to_array_impl(V &&v, index_sequence<I...>) {
 
 // Based on https://en.cppreference.com/w/cpp/container/array/to_array
 template <typename ArrayType, size_t N, typename V>
-ArrayType vector_to_array(V &&v) {
+ArrayType vector_to_array(V &&v, std::false_type) {
     return vector_to_array_impl<ArrayType, V>(std::forward<V>(v), make_index_sequence<N>{});
 }
 
+template <typename ArrayType, size_t N, typename V>
+ArrayType vector_to_array(V &&v, std::true_type) {
+    ArrayType ret;
+    std::copy(v.begin(), v.end(), ret.begin());
+    return ret;
+}
+
 template <typename ArrayType, typename Value, bool Resizable, size_t Size = 0>
 struct array_caster {
     using value_conv = make_caster<Value>;
@@ -429,7 +437,12 @@ private:
             }
             temp.emplace_back(cast_op<Value &&>(std::move(conv)));
         }
-        value.reset(new ArrayType(vector_to_array<ArrayType, Size>(std::move(temp))));
+
+        std::conjunction<std::is_trivially_default_constructible<Value>,
+                         std::is_trivially_copy_assignable<Value>>
+            use_copy;
+
+        value.reset(new ArrayType(vector_to_array<ArrayType, Size>(std::move(temp), use_copy)));
         return true;
     }

It resolves my slow-compile problem and passes all of the pytest tests. If the approach is feasible, I can open a PR for it.

With the patch, clang-18, clang-20, and gcc all finish in under 3 seconds.
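
For reference, the dispatch idea in the patch can be sketched standalone as below. This is only an illustration under stated assumptions: it replaces `std::conjunction` (C++17) with a plain `&&` inside `std::integral_constant`, uses C++14 `std::index_sequence` instead of pybind11's internal helper, and every name (`can_bulk_copy`, `to_array_impl`, `to_array_sketch`) is hypothetical rather than pybind11 API.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical trait: true when an element can be default-constructed and
// then overwritten cheaply, so a plain copy loop is safe.
template <typename T>
using can_bulk_copy
    = std::integral_constant<bool,
                             std::is_trivially_default_constructible<T>::value
                                 && std::is_trivially_copy_assignable<T>::value>;

// Fast path: one loop, independent of N at compile time.
template <typename ArrayType, std::size_t N, typename V>
ArrayType to_array_impl(V &&v, std::true_type) {
    assert(v.size() == N);
    ArrayType out{};
    std::copy(v.begin(), v.end(), out.begin());
    return out;
}

// Fallback helper: element-by-element pack expansion, needed for types
// that cannot be default-constructed and assigned afterwards.
template <typename ArrayType, typename V, std::size_t... I>
ArrayType expand(V &&v, std::index_sequence<I...>) {
    return {{std::move(v[I])...}};
}

template <typename ArrayType, std::size_t N, typename V>
ArrayType to_array_impl(V &&v, std::false_type) {
    assert(v.size() == N);
    return expand<ArrayType>(std::forward<V>(v), std::make_index_sequence<N>{});
}

// Front door: picks the overload from the element type at compile time.
template <typename ArrayType, std::size_t N, typename V>
ArrayType to_array_sketch(V &&v) {
    using Value = typename ArrayType::value_type;
    return to_array_impl<ArrayType, N>(std::forward<V>(v), can_bulk_copy<Value>{});
}
```

In this sketch `int` elements take the copy path and `std::string` elements fall back to the expansion path, so only non-trivial element types still pay the per-element instantiation cost.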

comicfans avatar Jul 24 '25 12:07 comicfans