pybind11
[BUG]: Slow compilation with clang-18 when a very long std::array is used
Required prerequisites
- [x] Make sure you've read the documentation. Your issue may be addressed there.
- [x] Search the issue tracker and Discussions to verify that this hasn't already been reported. +1 or comment there if it has.
- [ ] Consider asking first in the Gitter chat room or in a Discussion.
What version (or hash if on master) of pybind11 are you using?
3.0.0
Problem description
I've found that compiling pybind11 bindings is very slow when a very long std::array is used.
clang-18 takes 90 seconds to finish, clang-20 takes 11 seconds, and g++-13 takes 13 seconds.
I tried both 2.16.3 and 3.0.0; both compile slowly with clang-18. Clang's -ftime-trace output shows that most of the time is spent in codegen
for the function
std::array<int, 102400ul> pybind11::detail::vector_to_array_impl<std::array<int, 102400ul>, std::vector<int, std::allocator<int>>, 0ul, 1ul, 2ul, 3ul, 4ul, 5ul, 6ul, 7ul, 8ul, 9ul, 10ul, 11ul, ....
I've created a repo that reproduces the issue: https://github.com/comicfans/pybind11_slow_clang/tree/main
The clang-18 compile trace (which can be loaded into chrome://tracing): basic.cpp.json
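For context, the cost seems to follow from the shape of the conversion: the function named in the trace carries one index per array element as a template argument, so a 102400-element array presumably expands into an initializer with 102400 expressions that the compiler has to generate code for. Below is a minimal sketch of that pattern next to a loop-based copy. The names `to_array_by_pack` and `to_array_by_copy` are hypothetical, and the sketch uses C++14's `std::index_sequence` instead of pybind11's own helpers, so it illustrates the idea rather than reproducing the library's code.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <utility>
#include <vector>

// Strategy A (hypothetical name): brace-initialise the array from a pack
// expansion. The compiler emits one initializer expression per index, so the
// amount of generated code grows with the array length.
template <typename T, std::size_t... I>
std::array<T, sizeof...(I)> to_array_by_pack(std::vector<T> &&v, std::index_sequence<I...>) {
    return {{std::move(v[I])...}};
}

// Strategy B (hypothetical name): value-initialise, then copy in a loop.
// The generated code does not depend on N, but T must be default-constructible
// and copy-assignable.
template <typename T, std::size_t N>
std::array<T, N> to_array_by_copy(const std::vector<T> &v) {
    std::array<T, N> out{};
    std::copy_n(v.begin(), std::min(v.size(), N), out.begin());
    return out;
}
```

With a small N both strategies compile quickly and give the same result; the difference only shows up once N reaches the tens of thousands, as in the report above.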
Reproducible example code
#include <array>
#include <pybind11/pybind11.h>
#include <pybind11/stl.h>

namespace py = pybind11;

struct SlowType {
    std::array<int, 102400> member{};
};

PYBIND11_MODULE(basic, module) {
    py::class_<SlowType>(module, "SlowType")
        .def(py::init<>())
        .def_readwrite("member", &SlowType::member);
}
Is this a regression? Put the last known working version here if it is.
Not a regression
I suggest the following patch:
diff --git a/include/pybind11/stl.h b/include/pybind11/stl.h
index 01be0b47..ab276722 100644
--- a/include/pybind11/stl.h
+++ b/include/pybind11/stl.h
@@ -21,6 +21,7 @@
 #include <memory>
 #include <ostream>
 #include <set>
+#include <type_traits>
 #include <unordered_map>
 #include <unordered_set>
 #include <valarray>
@@ -378,10 +379,17 @@ ArrayType vector_to_array_impl(V &&v, index_sequence<I...>) {
 // Based on https://en.cppreference.com/w/cpp/container/array/to_array
 template <typename ArrayType, size_t N, typename V>
-ArrayType vector_to_array(V &&v) {
+ArrayType vector_to_array(V &&v, std::false_type) {
     return vector_to_array_impl<ArrayType, V>(std::forward<V>(v), make_index_sequence<N>{});
 }
+template <typename ArrayType, size_t N, typename V>
+ArrayType vector_to_array(V &&v, std::true_type) {
+    ArrayType ret;
+    std::copy(v.begin(), v.end(), ret.begin());
+    return ret;
+}
+
 template <typename ArrayType, typename Value, bool Resizable, size_t Size = 0>
 struct array_caster {
     using value_conv = make_caster<Value>;
@@ -429,7 +437,12 @@ private:
             }
             temp.emplace_back(cast_op<Value &&>(std::move(conv)));
         }
-        value.reset(new ArrayType(vector_to_array<ArrayType, Size>(std::move(temp))));
+
+        std::conjunction<std::is_trivially_default_constructible<Value>,
+                         std::is_trivially_copy_assignable<Value>>
+            use_copy;
+
+        value.reset(new ArrayType(vector_to_array<ArrayType, Size>(std::move(temp), use_copy)));
         return true;
     }
It resolves the slow compilation for me and passes all of the pytest tests. If this approach is feasible, I can open a PR for it.
With the patch, clang-18, clang-20, and gcc all finish in under 3 seconds.
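One caveat with the patch as written: `std::conjunction` is a C++17 facility, so it would not compile if the header also has to build as C++11 or C++14. A possible alternative, sketched below under that assumption, computes the same condition with `std::integral_constant` and `&&`. The names `use_copy_path`, `convert`, `convert_impl`, and `vector_to_std_array` are hypothetical stand-ins, not pybind11 identifiers, and the fallback path uses C++14's `std::index_sequence` only to keep the example self-contained.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical trait: selects the loop-based path for element types that are
// trivially default-constructible and trivially copy-assignable.
// Uses only C++11 type traits, no std::conjunction.
template <typename Value>
using use_copy_path
    = std::integral_constant<bool,
                             std::is_trivially_default_constructible<Value>::value
                                 && std::is_trivially_copy_assignable<Value>::value>;

// Fallback: pack expansion, needed for types that cannot be default-constructed
// and then assigned.
template <typename ArrayType, typename V, std::size_t... I>
ArrayType convert_impl(V &&v, std::index_sequence<I...>) {
    return {{std::move(v[I])...}};
}

template <typename ArrayType, std::size_t N, typename V>
ArrayType convert(V &&v, std::false_type) {
    return convert_impl<ArrayType>(std::forward<V>(v), std::make_index_sequence<N>{});
}

// Fast path: a plain copy. The caller must guarantee v.size() == N.
template <typename ArrayType, std::size_t N, typename V>
ArrayType convert(V &&v, std::true_type) {
    ArrayType ret;
    std::copy(v.begin(), v.end(), ret.begin());
    return ret;
}

// Hypothetical entry point that picks the overload via tag dispatch.
template <typename Value, std::size_t N>
std::array<Value, N> vector_to_std_array(std::vector<Value> v) {
    return convert<std::array<Value, N>, N>(std::move(v), use_copy_path<Value>{});
}
```

Here `vector_to_std_array<int, 3>` takes the copy path, while `vector_to_std_array<std::string, 2>` falls back to the pack expansion because `std::string` is not trivially default-constructible.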