llvm-project icon indicating copy to clipboard operation
llvm-project copied to clipboard

[libc++] Introduce one-sided binary search for set_intersection so that it can be completed in sub-linear time for a large class of inputs.

Open ichaer opened this issue 1 year ago • 19 comments

One-sided binary search, aka meta binary search, has been in the public domain for decades, and has the general advantage of being Ω(1) rather than the classic algorithm's Ω(log(n)), with the downside of executing at most 2log(n) comparisons vs the classic algorithm's exact log(n). There are two scenarios in which it really shines: the first one is when operating over non-random iterators, because the classic algorithm requires knowing the container's size upfront, which adds Ω(n) iterator increments to the complexity. The second one is when you're traversing the container in order, trying to fast-forward to the next value: in that case, the classic algorithm would yield Ω(nlog(n)) comparisons and, for non-random iterators, Ω(n^2) iterator increments, whereas the one-sided version will yield O(n) operations on both counts, with a Ω(log(n)) bound on the number of comparisons.

One use case that can often fit both scenarios is a std::set_intersection() on std::set instances.

This change is split into 4 separate commits. First we add tests validating the algorithmic complexity of std::lower_bound(), then we introduce the use of one-sided binary search for non-random iterators, then we add complexity validation tests for std::set_intersection(), and, finally, we introduce explicit use of the one-sided binary search version of std::lower_bound() for the instances of std::set_intersection() where there is no question about it improving time complexity.

ichaer avatar Dec 12 '23 18:12 ichaer

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

github-actions[bot] avatar Dec 12 '23 18:12 github-actions[bot]

@llvm/pr-subscribers-libcxx

Author: None (ichaer)

Changes

One-sided binary search, aka meta binary search, has been in the public domain for decades, and has the general advantage of being Ω(1) rather than the classic algorithm's Ω(log(n)), with the downside of executing at most 2log(n) comparisons vs the classic algorithm's exact log(n). There are two scenarios in which it really shines: the first one is when operating over non-random iterators, because the classic algorithm requires knowing the container's size upfront, which adds Ω(n) iterator increments to the complexity. The second one is when you're traversing the container in order, trying to fast-forward to the next value: in that case, the classic algorithm would yield Ω(nlog(n)) comparisons and, for non-random iterators, Ω(n^2) iterator increments, whereas the one-sided version will yield O(n) operations on both counts, with a Ω(log(n)) bound on the number of comparisons.

One use case that can often fit both scenarios is a std::set_intersection() on std::set instances.

This change is split into 4 separate commits. First we add tests validating the algorithmic complexity of std::lower_bound(), then we introduce the use of one-sided binary search for non-random iterators, then we add complexity validation tests for std::set_intersection(), and, finally, we introduce explicit use of the one-sided binary search version of std::lower_bound() for the instances of std::set_intersection() where there is no question about it improving time complexity.


Patch is 34.25 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/75230.diff

7 Files Affected:

  • (modified) libcxx/include/__algorithm/iterator_operations.h (+47)
  • (modified) libcxx/include/__algorithm/lower_bound.h (+63-6)
  • (modified) libcxx/include/__algorithm/set_intersection.h (+150-4)
  • (modified) libcxx/test/std/algorithms/alg.sorting/alg.binary.search/lower.bound/lower_bound.pass.cpp (+14-5)
  • (modified) libcxx/test/std/algorithms/alg.sorting/alg.binary.search/lower.bound/lower_bound_comp.pass.cpp (+23-5)
  • (modified) libcxx/test/std/algorithms/alg.sorting/alg.set.operations/set.intersection/ranges_set_intersection.pass.cpp (+234-6)
  • (modified) libcxx/test/support/test_iterators.h (+42-13)
diff --git a/libcxx/include/__algorithm/iterator_operations.h b/libcxx/include/__algorithm/iterator_operations.h
index e6176da4f5606..d73573747087e 100644
--- a/libcxx/include/__algorithm/iterator_operations.h
+++ b/libcxx/include/__algorithm/iterator_operations.h
@@ -87,6 +87,53 @@ struct _IterOps<_ClassicAlgPolicy> {
     std::advance(__iter, __count);
   }
 
+  // advance with sentinel, a la std::ranges::advance
+  // it's unclear whether _Iter has a difference_type and whether that's signed, so we play it safe:
+  // use the incoming type for returning and steer clear of negative overflows
+  template <class _Iter, class _Distance>
+  _LIBCPP_HIDE_FROM_ABI constexpr static _Distance advance(_Iter& __iter, _Distance __count, const _Iter& __sentinel) {
+    return _IterOps::__advance(__iter, __count, __sentinel, typename iterator_traits<_Iter>::iterator_category());
+  }
+
+  // advance with sentinel, a la std::ranges::advance -- InputIterator specialization
+  template <class _InputIter, class _Distance>
+  _LIBCPP_HIDE_FROM_ABI constexpr static _Distance
+  __advance(_InputIter& __iter, _Distance __count, const _InputIter& __sentinel, input_iterator_tag) {
+    _Distance __dist{};
+    for (; __dist < __count && __iter != __sentinel; ++__dist)
+      ++__iter;
+    return __count - __dist;
+  }
+
+  // advance with sentinel, a la std::ranges::advance -- BidirectionalIterator specialization
+  template <class _BiDirIter, class _Distance>
+  _LIBCPP_HIDE_FROM_ABI constexpr static _Distance
+  __advance(_BiDirIter& __iter, _Distance __count, const _BiDirIter& __sentinel, bidirectional_iterator_tag) {
+    _Distance __dist{};
+    if (__count >= 0)
+      for (; __dist < __count && __iter != __sentinel; ++__dist)
+        ++__iter;
+    else
+      for (__count = -__count; __dist < __count && __iter != __sentinel; ++__dist)
+        --__iter;
+    return __count - __dist;
+  }
+
+  // advance with sentinel, a la std::ranges::advance -- RandomIterator specialization
+  template <class _RandIter, class _Distance>
+  _LIBCPP_HIDE_FROM_ABI constexpr static _Distance
+  __advance(_RandIter& __iter, _Distance __count, const _RandIter& __sentinel, random_access_iterator_tag) {
+    auto __dist = _IterOps::distance(__iter, __sentinel);
+    _LIBCPP_ASSERT_UNCATEGORIZED(
+        __count == 0 || (__dist < 0) == (__count < 0), "__sentinel must precede __iter when __count<0");
+    if (__count < 0)
+      __dist = __dist > __count ? __dist : __count;
+    else
+      __dist = __dist < __count ? __dist : __count;
+    __iter += __dist;
+    return __count - __dist;
+  }
+
   // distance
   template <class _Iter>
   _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX14
diff --git a/libcxx/include/__algorithm/lower_bound.h b/libcxx/include/__algorithm/lower_bound.h
index 91c3bdaafd0cf..b432829667fa9 100644
--- a/libcxx/include/__algorithm/lower_bound.h
+++ b/libcxx/include/__algorithm/lower_bound.h
@@ -27,11 +27,13 @@
 
 _LIBCPP_BEGIN_NAMESPACE_STD
 
-template <class _AlgPolicy, class _Iter, class _Sent, class _Type, class _Proj, class _Comp>
-_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20
-_Iter __lower_bound(_Iter __first, _Sent __last, const _Type& __value, _Comp& __comp, _Proj& __proj) {
-  auto __len = _IterOps<_AlgPolicy>::distance(__first, __last);
-
+template <class _AlgPolicy, class _Iter, class _Type, class _Proj, class _Comp>
+_LIBCPP_NODISCARD_EXT _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 _Iter __lower_bound_bisecting(
+    _Iter __first,
+    const _Type& __value,
+    typename iterator_traits<_Iter>::difference_type __len,
+    _Comp& __comp,
+    _Proj& __proj) {
   while (__len != 0) {
     auto __l2 = std::__half_positive(__len);
     _Iter __m = __first;
@@ -46,13 +48,68 @@ _Iter __lower_bound(_Iter __first, _Sent __last, const _Type& __value, _Comp& __
   return __first;
 }
 
+// One-sided binary search, aka meta binary search, has been in the public domain for decades, and has the general
+// advantage of being Ω(1) rather than the classic algorithm's Ω(log(n)), with the downside of executing at most
+// 2*(log(n)-1) comparisons vs the classic algorithm's exact log(n). There are two scenarios in which it really shines:
+// the first one is when operating over non-random iterators, because the classic algorithm requires knowing the
+// container's size upfront, which adds Ω(n) iterator increments to the complexity. The second one is when you're
+// traversing the container in order, trying to fast-forward to the next value: in that case, the classic algorithm
+// would yield Ω(n*log(n)) comparisons and, for non-random iterators, Ω(n^2) iterator increments, whereas the one-sided
+// version will yield O(n) operations on both counts, with a Ω(log(n)) bound on the number of comparisons.
+template <class _AlgPolicy, class _Iter, class _Sent, class _Type, class _Proj, class _Comp>
+_LIBCPP_NODISCARD_EXT _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 _Iter
+__lower_bound_onesided(_Iter __first, _Sent __last, const _Type& __value, _Comp& __comp, _Proj& __proj) {
+  // static_assert(std::is_base_of<std::forward_iterator_tag, typename _IterOps<_AlgPolicy>::template
+  // __iterator_category<_Iter>>::value,
+  //       "lower_bound() is a multipass algorithm and requires forward iterator or better");
+
+  using _Distance = typename iterator_traits<_Iter>::difference_type;
+  for (_Distance __step = 1; __first != __last; __step <<= 1) {
+    auto __it   = __first;
+    auto __dist = __step - _IterOps<_AlgPolicy>::advance(__it, __step, __last);
+    // once we reach the last range where needle can be we must start
+    // looking inwards, bisecting that range
+    if (__it == __last || !std::__invoke(__comp, std::__invoke(__proj, *__it), __value)) {
+      return std::__lower_bound_bisecting<_AlgPolicy>(__first, __value, __dist, __comp, __proj);
+    }
+    // range not found, move forward!
+    __first = std::move(__it);
+  }
+  return __first;
+}
+
+template <class _AlgPolicy, class _InputIter, class _Sent, class _Type, class _Proj, class _Comp>
+_LIBCPP_NODISCARD_EXT inline _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 _InputIter __lower_bound(
+    _InputIter __first, _Sent __last, const _Type& __value, _Comp& __comp, _Proj& __proj, std::input_iterator_tag) {
+  return std::__lower_bound_onesided<_AlgPolicy>(__first, __last, __value, __comp, __proj);
+}
+
+template <class _AlgPolicy, class _RandIter, class _Sent, class _Type, class _Proj, class _Comp>
+_LIBCPP_NODISCARD_EXT inline _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 _RandIter __lower_bound(
+    _RandIter __first,
+    _Sent __last,
+    const _Type& __value,
+    _Comp& __comp,
+    _Proj& __proj,
+    std::random_access_iterator_tag) {
+  const auto __dist = _IterOps<_AlgPolicy>::distance(__first, __last);
+  return std::__lower_bound_bisecting<_AlgPolicy>(__first, __value, __dist, __comp, __proj);
+}
+
+template <class _AlgPolicy, class _Iter, class _Sent, class _Type, class _Proj, class _Comp>
+_LIBCPP_NODISCARD_EXT inline _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 _Iter
+__lower_bound(_Iter __first, _Sent __last, const _Type& __value, _Comp&& __comp, _Proj&& __proj) {
+  return std::__lower_bound<_AlgPolicy>(
+      __first, __last, __value, __comp, __proj, typename _IterOps<_AlgPolicy>::template __iterator_category<_Iter>());
+}
+
 template <class _ForwardIterator, class _Tp, class _Compare>
 _LIBCPP_NODISCARD_EXT inline _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20
 _ForwardIterator lower_bound(_ForwardIterator __first, _ForwardIterator __last, const _Tp& __value, _Compare __comp) {
   static_assert(__is_callable<_Compare, decltype(*__first), const _Tp&>::value,
                 "The comparator has to be callable");
   auto __proj = std::__identity();
-  return std::__lower_bound<_ClassicAlgPolicy>(__first, __last, __value, __comp, __proj);
+  return std::__lower_bound<_ClassicAlgPolicy>(__first, __last, __value, std::move(__comp), std::move(__proj));
 }
 
 template <class _ForwardIterator, class _Tp>
diff --git a/libcxx/include/__algorithm/set_intersection.h b/libcxx/include/__algorithm/set_intersection.h
index f2603fe1365ac..556738022f485 100644
--- a/libcxx/include/__algorithm/set_intersection.h
+++ b/libcxx/include/__algorithm/set_intersection.h
@@ -12,9 +12,13 @@
 #include <__algorithm/comp.h>
 #include <__algorithm/comp_ref_type.h>
 #include <__algorithm/iterator_operations.h>
+#include <__algorithm/lower_bound.h>
 #include <__config>
+#include <__functional/identity.h>
 #include <__iterator/iterator_traits.h>
 #include <__iterator/next.h>
+#include <__type_traits/is_same.h>
+#include <__utility/exchange.h>
 #include <__utility/move.h>
 
 #if !defined(_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER)
@@ -36,9 +40,122 @@ struct __set_intersection_result {
 };
 
 template <class _AlgPolicy, class _Compare, class _InIter1, class _Sent1, class _InIter2, class _Sent2, class _OutIter>
-_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 __set_intersection_result<_InIter1, _InIter2, _OutIter>
-__set_intersection(
-    _InIter1 __first1, _Sent1 __last1, _InIter2 __first2, _Sent2 __last2, _OutIter __result, _Compare&& __comp) {
+struct _LIBCPP_NODISCARD_EXT __set_intersector {
+  _InIter1& __first1_;
+  const _Sent1& __last1_;
+  _InIter2& __first2_;
+  const _Sent2& __last2_;
+  _OutIter& __result_;
+  _Compare& __comp_;
+  static constexpr auto __proj_ = std::__identity();
+  bool __prev_advanced_         = true;
+
+  _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 __set_intersector(
+      _InIter1& __first1, _Sent1& __last1, _InIter2& __first2, _Sent2& __last2, _OutIter& __result, _Compare& __comp)
+      : __first1_(__first1),
+        __last1_(__last1),
+        __first2_(__first2),
+        __last2_(__last2),
+        __result_(__result),
+        __comp_(__comp) {}
+
+  _LIBCPP_NODISCARD_EXT _LIBCPP_HIDE_FROM_ABI
+      _LIBCPP_CONSTEXPR_SINCE_CXX20 __set_intersection_result<_InIter1, _InIter2, _OutIter>
+      operator()() && {
+    while (__first2_ != __last2_) {
+      __advance1_and_maybe_add_result();
+      if (__first1_ == __last1_)
+        break;
+      __advance2_and_maybe_add_result();
+    }
+    return __set_intersection_result<_InIter1, _InIter2, _OutIter>(
+        _IterOps<_AlgPolicy>::next(std::move(__first1_), std::move(__last1_)),
+        _IterOps<_AlgPolicy>::next(std::move(__first2_), std::move(__last2_)),
+        std::move(__result_));
+  }
+
+private:
+  // advance __iter to the first element in the range where !__comp_(__iter, __value)
+  // add result if this is the second consecutive call without advancing
+  // this method only works if you alternate calls between __advance1_and_maybe_add_result() and
+  // __advance2_and_maybe_add_result()
+  template <class _Iter, class _Sent, class _Value>
+  _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 void
+  __advance_and_maybe_add_result(_Iter& __iter, const _Sent& __sentinel, const _Value& __value) {
+    // use one-sided lower bound for improved algorithmic complexity bounds
+    const auto __tmp =
+        std::exchange(__iter, std::__lower_bound_onesided<_AlgPolicy>(__iter, __sentinel, __value, __comp_, __proj_));
+    __add_output_unless(__tmp != __iter);
+  }
+
+  // advance __first1_ to the first element in the range where !__comp_(*__first1_, *__first2_)
+  // add result if neither __first1_ nor __first2_ advanced in the last attempt (meaning they are equal)
+  _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 void __advance1_and_maybe_add_result() {
+    __advance_and_maybe_add_result(__first1_, __last1_, *__first2_);
+  }
+
+  // advance __first2_ to the first element in the range where !__comp_(*__first2_, *__first1_)
+  // add result if neither __first1_ nor __first2_ advanced in the last attempt (meaning they are equal)
+  _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 void __advance2_and_maybe_add_result() {
+    __advance_and_maybe_add_result(__first2_, __last2_, *__first1_);
+  }
+
+  _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 void __add_output_unless(bool __advanced) {
+    if (__advanced | __prev_advanced_) {
+      __prev_advanced_ = __advanced;
+    } else {
+      *__result_ = *__first1_;
+      ++__result_;
+      ++__first1_;
+      ++__first2_;
+      __prev_advanced_ = true;
+    }
+  }
+};
+
+// with forward iterators we can use binary search to skip over entries
+template <class _AlgPolicy,
+          class _Compare,
+          class _InForwardIter1,
+          class _Sent1,
+          class _InForwardIter2,
+          class _Sent2,
+          class _OutIter>
+_LIBCPP_NODISCARD_EXT _LIBCPP_HIDE_FROM_ABI
+    _LIBCPP_CONSTEXPR_SINCE_CXX20 __set_intersection_result<_InForwardIter1, _InForwardIter2, _OutIter>
+    __set_intersection(
+        _InForwardIter1 __first1,
+        _Sent1 __last1,
+        _InForwardIter2 __first2,
+        _Sent2 __last2,
+        _OutIter __result,
+        _Compare&& __comp,
+        std::forward_iterator_tag,
+        std::forward_iterator_tag) {
+  std::__set_intersector<_AlgPolicy, _Compare, _InForwardIter1, _Sent1, _InForwardIter2, _Sent2, _OutIter>
+      __intersector(__first1, __last1, __first2, __last2, __result, __comp);
+  return std::move(__intersector)();
+}
+
+// input iterators are not suitable for multipass algorithms, so we stick to the classic single-pass version
+template <class _AlgPolicy,
+          class _Compare,
+          class _InInputIter1,
+          class _Sent1,
+          class _InInputIter2,
+          class _Sent2,
+          class _OutIter>
+_LIBCPP_NODISCARD_EXT _LIBCPP_HIDE_FROM_ABI
+    _LIBCPP_CONSTEXPR_SINCE_CXX20 __set_intersection_result<_InInputIter1, _InInputIter2, _OutIter>
+    __set_intersection(
+        _InInputIter1 __first1,
+        _Sent1 __last1,
+        _InInputIter2 __first2,
+        _Sent2 __last2,
+        _OutIter __result,
+        _Compare&& __comp,
+        std::input_iterator_tag,
+        std::input_iterator_tag) {
   while (__first1 != __last1 && __first2 != __last2) {
     if (__comp(*__first1, *__first2))
       ++__first1;
@@ -52,12 +169,41 @@ __set_intersection(
     }
   }
 
-  return __set_intersection_result<_InIter1, _InIter2, _OutIter>(
+  return std::__set_intersection_result<_InInputIter1, _InInputIter2, _OutIter>(
       _IterOps<_AlgPolicy>::next(std::move(__first1), std::move(__last1)),
       _IterOps<_AlgPolicy>::next(std::move(__first2), std::move(__last2)),
       std::move(__result));
 }
 
+template <class _AlgPolicy, class _Iter>
+class __set_intersection_iter_category {
+  template <class _It>
+  using __cat = typename std::_IterOps<_AlgPolicy>::template __iterator_category<_It>;
+  template <class _It>
+  static auto test(__cat<_It>*) -> __cat<_It>;
+  template <class>
+  static std::input_iterator_tag test(...);
+
+public:
+  using __type = decltype(test<_Iter>(nullptr));
+};
+
+template <class _AlgPolicy, class _Compare, class _InIter1, class _Sent1, class _InIter2, class _Sent2, class _OutIter>
+_LIBCPP_NODISCARD_EXT _LIBCPP_HIDE_FROM_ABI
+    _LIBCPP_CONSTEXPR_SINCE_CXX20 __set_intersection_result<_InIter1, _InIter2, _OutIter>
+    __set_intersection(
+        _InIter1 __first1, _Sent1 __last1, _InIter2 __first2, _Sent2 __last2, _OutIter __result, _Compare&& __comp) {
+  return std::__set_intersection<_AlgPolicy>(
+      std::move(__first1),
+      std::move(__last1),
+      std::move(__first2),
+      std::move(__last2),
+      std::move(__result),
+      std::forward<_Compare>(__comp),
+      typename std::__set_intersection_iter_category<_AlgPolicy, _InIter1>::__type(),
+      typename std::__set_intersection_iter_category<_AlgPolicy, _InIter2>::__type());
+}
+
 template <class _InputIterator1, class _InputIterator2, class _OutputIterator, class _Compare>
 inline _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 _OutputIterator set_intersection(
     _InputIterator1 __first1,
diff --git a/libcxx/test/std/algorithms/alg.sorting/alg.binary.search/lower.bound/lower_bound.pass.cpp b/libcxx/test/std/algorithms/alg.sorting/alg.binary.search/lower.bound/lower_bound.pass.cpp
index a2d8ab632303c..5c11962d13777 100644
--- a/libcxx/test/std/algorithms/alg.sorting/alg.binary.search/lower.bound/lower_bound.pass.cpp
+++ b/libcxx/test/std/algorithms/alg.sorting/alg.binary.search/lower.bound/lower_bound.pass.cpp
@@ -39,11 +39,20 @@ template <class Iter, class T>
 void
 test(Iter first, Iter last, const T& value)
 {
-    Iter i = std::lower_bound(first, last, value);
-    for (Iter j = first; j != i; ++j)
-        assert(*j < value);
-    for (Iter j = i; j != last; ++j)
-        assert(!(*j < value));
+  std::size_t strides{};
+  std::size_t displacement{};
+  stride_counting_iterator f(first, &strides, &displacement);
+  stride_counting_iterator l(last, &strides, &displacement);
+
+  auto i = std::lower_bound(f, l, value);
+  for (auto j = f; j != i; ++j)
+    assert(*j < value);
+  for (auto j = i; j != l; ++j)
+    assert(!(*j < value));
+
+  auto len = std::distance(first, last);
+  assert(strides <= 2.5 * len + 1);
+  assert(displacement <= 2.5 * len + 1);
 }
 
 template <class Iter>
diff --git a/libcxx/test/std/algorithms/alg.sorting/alg.binary.search/lower.bound/lower_bound_comp.pass.cpp b/libcxx/test/std/algorithms/alg.sorting/alg.binary.search/lower.bound/lower_bound_comp.pass.cpp
index b9133028d9ade..05fd43eada461 100644
--- a/libcxx/test/std/algorithms/alg.sorting/alg.binary.search/lower.bound/lower_bound_comp.pass.cpp
+++ b/libcxx/test/std/algorithms/alg.sorting/alg.binary.search/lower.bound/lower_bound_comp.pass.cpp
@@ -17,6 +17,7 @@
 #include <vector>
 #include <cassert>
 #include <cstddef>
+#include <cmath>
 
 #include "test_macros.h"
 #include "test_iterators.h"
@@ -38,11 +39,28 @@ template <class Iter, class T>
 void
 test(Iter first, Iter last, const T& value)
 {
-    Iter i = std::lower_bound(first, last, value, std::greater<int>());
-    for (Iter j = first; j != i; ++j)
-        assert(std::greater<int>()(*j, value));
-    for (Iter j = i; j != last; ++j)
-        assert(!std::greater<int>()(*j, value));
+  std::size_t strides{};
+  std::size_t displacement{};
+  stride_counting_iterator f(first, &strides, &displacement);
+  stride_counting_iterator l(last, &strides, &displacement);
+
+  std::size_t comparisons{};
+  auto cmp = [&comparisons](int rhs, int lhs) {
+    ++comparisons;
+    return std::greater<int>()(rhs, lhs);
+  };
+
+  auto i = std::lower_bound(f, l, value, cmp);
+
+  for (auto j = f; j != i; ++j)
+    assert(std::greater<int>()(*j, value));
+  for (auto j = i; j != l; ++j)
+    assert(!std::greater<int>()(*j, value));
+
+  auto len = std::distance(first, last);
+  assert(strides <= 2.5 * len + 1);
+  assert(displacement <= 2.5 * len + 1);
+  assert(comparisons <= 2 * ceil(log(len + 1) + 2));
 }
 
 template <class Iter>
diff --git a/libcxx/test/std/algorithms/alg.sorting/alg.set.operations/set.intersection/ranges_set_intersection.pass.cpp b/libcxx/test/std/algorithms/alg.sorting/alg.set.operations/set.intersection/ranges_set_intersection.pass.cpp
index 0ee89e0131a07..30cedd19038d7 100644
--- a/libcxx/test/std/algorithms/alg.sorting/alg.set.operations/set.intersection/ranges_set_intersection.pass.cpp
+++ b/libcxx/test/std/algorithms/alg.sorting/alg.set.operations/set.intersection/ranges_set_intersection.pass.cpp
@@ -28,6 +28,9 @@
 #include <algorithm>
 #include <array>
 #include <concepts>
+#include <cstddef>
+#include <iterator>
+#include <type_traits>
 
 #include "almost_satisfies_types.h"
 #include "MoveOnly.h"
@@ -93,14 +96,17 @@ static_assert(!HasSetIntersectionRange<UncheckedRange<MoveOnly*>, UncheckedRange
 
 using std::ranges::set_intersection_result;
 
+// TODO: std::ranges::set_intersection calls std::ranges::copy
+// std::ranges::copy(contiguous_iterator<int*>, sentinel_wrapper<contiguous_iterator<int*>>, contiguous_iterator<int*>) doesn't seem to work.
+// It seems that std::ranges::copy calls std::copy, which unwraps contiguous_iterator<int*> into int*,
+// and then it failed because there is no == between int* and sentinel_wrapper<contiguous_iterator<int*>>
+template <typename Iter>
+using SentinelWorkaround = std::conditional_t<std::contiguous_iterator<Iter>, Iter, sentinel_wrapper<Iter>>;
+
 template <class In1, class In2, class Out, std::size_t N1, std::size_t N2, std::size_t N3>
 constexpr void testSetIntersectionImpl(std::array<int, N1> in1, std::array<int, N2> in2, std::array<int, N3> expected) {
-  ...
[truncated]

llvmbot avatar Dec 12 '23 18:12 llvmbot

:white_check_mark: With the latest revision this PR passed the C/C++ code formatter.

github-actions[bot] avatar Dec 12 '23 18:12 github-actions[bot]

Thanks for the patch. If I understand it right this might violate the complexity requirements in http://eel.is/c++draft/lower.bound#4. Is that correct?

mordante avatar Dec 12 '23 18:12 mordante

Hi Mark, thanks for your reply!

Strictly speaking, yes, that is true: the number of comparisons may violate the requirement of there being at most log(last - first) + O(1) projections and comparisons. However, that is only being done for non-random access iterators, where we are also executing 2*(last-first) iterator mutations, which are not part of the explicit algorithm guarantees, and are the reason why std::set and std::map have their own lower_bound -- std::lower_bound is really hard to use on non-random iterators because of this additional cost.

Would this be a blocker for accepting the contribution?

ichaer avatar Dec 12 '23 18:12 ichaer

Hi Mark, thanks for your reply!

Strictly speaking, yes, that is true: the number of comparisons may violate the requirement of there being at most log(last - first) + O(1) projections and comparisons. However, that is only being done for non-random access iterators, where we are also executing 2*(last-first) iterator mutations, which are not part of the explicit algorithm guarantees, and are the reason why std::set and std::map have their own lower_bound -- std::lower_bound is really hard to use on non-random iterators because of this additional cost.

Would this be a blocker for accepting the contribution?

Yes. We have to stay standards-conforming - libc++ is an implementation of the C++ standard after all, so even if the trade-off seems like a good idea we can't make the call in this case. If this improves the performance for trivial calls like an integer and std::less or friends as the comparator, that should be allowable, since that's not user-visible. Anything as generic as you currently have won't be possible.

philnik777 avatar Dec 12 '23 19:12 philnik777

OK, that's fair. How about I just stop exposing one-sided lower bound through std::lower_bound, and just use it for std::set_intersection, where all complexity guarantees are preserved, would that be OK?

ichaer avatar Dec 12 '23 19:12 ichaer

OK, that's fair. How about I just stop exposing one-sided lower bound through std::lower_bound, and just use it for std::set_intersection, where all complexity guarantees are preserved, would that be OK?

If all the complexity guarantees are met, that would be OK. We always like to see some benchmarks to have some data that the optimization actually improves performance.

philnik777 avatar Dec 12 '23 19:12 philnik777

SGTM! I'll amend my submission, thanks!

ichaer avatar Dec 12 '23 20:12 ichaer

SGTM! I'll amend my submission, thanks!

Great.

As @philnik777 said we need to be careful about observable behaviour. I love to see performance improvements :-)

mordante avatar Dec 12 '23 21:12 mordante

Cool, thanks :). Just for context, this code is actually being used in production out there. I work for Splunk, and we've implemented this outside of the standard library to speed up a component that does a number of set intersections, overlaying containers on top of each other (it's more complex, but that's the relevant part in a nutshell). The huge majority of the sets involved contain a tiny subset of what the target contains, so this makes a huge difference, and we thought we should contribute upstream.

Obviously I won't ask you to take me on my word :). Code is truth, I'll spend a bit of time understanding existing benchmarks in the LLVM repo to include something relevant in the pull request.

ichaer avatar Dec 12 '23 21:12 ichaer

I don't think you're wrong that this improves performance, but being able to point to hard data is always great. With algorithms it's almost always really easy to write micro benchmarks, so there we really like to have them.

philnik777 avatar Dec 13 '23 18:12 philnik777

Sounds good to me! I'll be out on holiday for a couple of weeks and should be able to update this PR in early January. Thanks for your feedback so far, I appreciate it :).

ichaer avatar Dec 13 '23 22:12 ichaer

Sorry for all the noise, I'll work through the CI failures in a private PR... I thought I had a good handle on the tests being executed, but clearly I was mistaken :sweat_smile:

ichaer avatar Feb 02 '24 19:02 ichaer

@philnik777, I'm convinced that the one test failure I got is unrelated, and, from the looks of it, a likely flaky test (std/thread/thread.semaphore/timed.pass.cpp, in the mingw-dll-i686 job). Could you please take a look at the changes now?

ichaer avatar Feb 13 '24 08:02 ichaer

@philnik777, I'd love to hear your thoughts about the change :)

ichaer avatar Feb 21 '24 10:02 ichaer

@philnik777, any more feedback?

ichaer avatar Feb 29 '24 15:02 ichaer

@philnik777, what do you think? Any more feedback?

ichaer avatar Mar 19 '24 10:03 ichaer

Hey, sorry for the huge delay on this. I'll review this next week. I'll reach out to you privately to schedule some time to go over the PR, I think this kind of involved change will benefit from some discussion and explanation.

ldionne avatar Apr 19 '24 14:04 ldionne

I've deleted my old comment about inconsistent benchmark results to avoid misleading others with my bad conjectures. The real reason benchmark results were inconsistent was the fact that we use randomly-generated data as input, and, embarrassingly, it hadn't occurred to me to seed the random number generators involved with hardcoded values before producing results for a before-and-after comparison. Given a large enough sample, and if we were looking only at the results in aggregate, we'd expect that not to make a difference, but samples were not large enough (especially for the larger runs), and I was comparing test cases independently, so comparison results were surprising at times.

What I've done to produce the comparison results shared in the PR comment was use a hardcoded value of 0 as the seed for getRandomEngine() in libcxx/benchmarks/GenerateInput.h. I'm attaching the json benchmark results here in case anyone is interested in looking at the full thing without having to run it themselves.

onesided-be6c5c848d.json main-b926f75e89.json

ichaer avatar Apr 30 '24 12:04 ichaer