ikd-Tree icon indicating copy to clipboard operation
ikd-Tree copied to clipboard

serious issue about nth_element behavior

Open DingFeng9313 opened this issue 2 years ago • 5 comments

I am doing some rewrite of this ikdtree to adapt my situation. And I find you used nth_element while building a tree. According to my experiments there might be a bug.

Suppose at some point division-axis is 0, which is x-axis, let's use 1-D data to explain:

Suppose we have a vector [0,1,2,2,2,2,2,2,3,4] which is points' x-axis value after nth_element() function call, y-axis value is totally different, say [0,1,2,3,4,5,6,7,8,9]。nth_element() divide these points into different group but not determinately which point is on left or right node. point (2,2) might be on the left or right.

Thus while searching, or delete or add by point. it not guaranteed to find this point.

DingFeng9313 avatar Jun 13 '22 06:06 DingFeng9313

    // split in the middle
    DistanceType split_val = (bbox[cutfeat].low + bbox[cutfeat].high) / 2;
    ElementType min_elem, max_elem;
    computeMinMax(obj, ind, count, cutfeat, min_elem, max_elem);

    if (split_val < min_elem)
      cutval = min_elem;
    else if (split_val > max_elem)
      cutval = max_elem;
    else
      cutval = split_val;

    IndexType lim1, lim2;
    planeSplit(obj, ind, count, cutfeat, cutval, lim1, lim2);

I check the code in nanoflann library. Their splitting method is directly using mid value (not middle point) as dividing pivot. I guess they can't guarantee the tree is balanced. But still their searching is quite efficient and accurate.

DingFeng9313 avatar Jun 13 '22 06:06 DingFeng9313

I was able to address this in a re-implementation I've been working on by continuing to use std::nth_element as-is, but afterwards:

  • Retrieving the median point at index of "mid"
  • Using std::partition on points after the middle element, with a predicate that returns true if their element on the split axis equal the median element
  • Using the iterator returned from std::partition minus 1 to determine the data assigned to the node (and left and right son assignment)

I believe this forces all items matching the partition element to be moved into the left son, and doesn't add too much additional overhead.

russellsch avatar Dec 29 '22 01:12 russellsch

@russellsch Hey Russellsch, I am also suffering inaccurate kdtree search result and think this is because of nth_element as DingFeng9313 mentioned above. Can you attach the modified code block here? I looked into your forked repository and found it is not committed yet.

engcang avatar Mar 14 '23 22:03 engcang

@engcang yes, sorry my fork doesn't have any changes - the changes I mentioned were on a from-scratch rewrite to try and get some additional speed gains via SIMD (using Eigen), templating (for uint types instead of float), and add variable number of dimensions - not based on this repo.

I wasn't seeing the performance gains I was hoping for so didn't complete it - ended up using nanoflan with a wrapper.

That said I CAN paste you my equivalent of this repos's BuildTree function (consider it CC0 licensed, use it however you like). Hope that can help you and others! (I added some additional comments and two "ADDED FIX" comments to show where my 3 suggestions above are implemented)

//Note: using PointType = Eigen::Matrix<TScalar, dims, 1>;
//Note: using PointVecType       = std::vector<PointType>;    // Note: Vector is used so points can be sorted more easily w/ std::sort
//Note: using PointEigenMatType1 = Eigen::Map<Eigen::Matrix<TScalar, dims, Eigen::Dynamic>>;

/// @brief Builds a perfectly balanced tree (or sub-tree) based on a vector of Points that are able to be sorted in-place
/// @param points Reference to vector of points that will be sorted as the tree is built
/// @param left_point_index Index of the left-most point in the vector to build sub-tree for
/// @param right_point_index Index of the right-most point in the vector to build sub-tree for
template<typename TScalar, uint8_t dims>
inline std::unique_ptr<typename IKDTree<TScalar, dims>::IKDTreeNode> IKDTree<TScalar, dims>::buildSubtree_(
        PointVecType& points, size_t left_point_index, size_t right_point_index) {
    assert(left_point_index <= right_point_index && "left_point_index > right_point_index");
    assert(!points.empty() && "points should never be empty");
    if(points.empty()) {    // TODO: Move this guarantee into build to get out of a hot loop?
        return nullptr;
    }

    auto node_ptr = std::make_unique<IKDTreeNode>();

    // Create single element nodes and bail out quickly
    if(left_point_index == right_point_index) {
        node_ptr->point = points[left_point_index];
        pullUpUpdate_(*(node_ptr.get()));
        return node_ptr;
    }

    //////////// ADDED FIX (Retrieving the median point at index of "mid")
    const size_t mid_point_index = (left_point_index + right_point_index) / 2UL;
    /// Find distance between right and left indices - size of points of interest
    const size_t index_distance = right_point_index - left_point_index + 1UL;

    /// Map for treating vector of points as an Eigen map
    const PointEigenMatType1 points_mat(&points[0][0], dims, points.size());

    const size_t mid_point_index = (left_point_index + right_point_index) / 2UL;

    // Find point whose each value is min/max of all points within range of left and right indices
    // This represents the min and max corners of a box that contains the points
    const PointType min_point = points_mat.middleCols(left_point_index, index_distance).rowwise().minCoeff();
    const PointType max_point = points_mat.middleCols(left_point_index, index_distance).rowwise().maxCoeff();

    // Find the largest axis of the box that contains the points - this is axis with maximal covariance and is used as the division axis of this node
    typename PointType::Index largest_axis;
    (max_point - min_point).maxCoeff(&largest_axis);
    assert(largest_axis >= 0);
    node_ptr->axis = static_cast<uint8_t>(largest_axis);

    // Find median and partially sort such that points are on correct side of the median within the range of left and right indices
    // Note: that elements that are equal to the split may be left or right of the nth element!
    std::nth_element(points.begin() + left_point_index, points.begin() + mid_point_index, points.begin() + right_point_index + 1UL,
            [largest_axis](const PointType& p1, const PointType& p2) { return p1[largest_axis] < p2[largest_axis]; });
    const PointType median_point = *(points.begin() + mid_point_index);

    //////////// ADDED FIX (std::partition and using returned iterator)
    // To address elements equal to the median being on both sides of the mid-point, partitioning is used on the second half to push any
    // elements matching the nth element to the left of the iterator (which points to first element to be assigned to right son)
    auto split_plus_one_it            = std::partition(points.begin() + mid_point_index + 1UL, points.begin() + right_point_index + 1UL,
                       [largest_axis, median_point](const PointType& p1) { return p1(largest_axis) == median_point(largest_axis); });
    const size_t split_plus_one_index = split_plus_one_it - points.begin();
    const size_t split_index          = split_plus_one_index - 1UL;

    node_ptr->point = points[split_index];

    if(left_point_index != split_index) {    // If mid-point is the same as left index, no need to build a left son
                                             // Check is needed so mid_point_index of 0 doesn't wrap around
        size_t left_son_rightmost_index = 0UL;
        if(split_index == 0UL) {
            left_son_rightmost_index = 0UL;
        } else {
            left_son_rightmost_index = split_index - 1UL;
        }
        node_ptr->left_son_ptr             = buildSubtree_(points, left_point_index, left_son_rightmost_index);
        node_ptr->left_son_ptr->father_ptr = node_ptr.get();
    }

    if(right_point_index != split_index) {    // If mid-point is the same as right index, no need to build a right son
        node_ptr->right_son_ptr             = buildSubtree_(points, split_index + 1UL, right_point_index);
        node_ptr->right_son_ptr->father_ptr = node_ptr.get();
    }

    // Lazy label init - by default all new nodes are initialized with pointDeleted, treeDeleted, and pushDown as false
    // Pullup - build out tree_size, invalid_num, node_min_max (range), bounding radius squared (used for searching)
    pullUpUpdate_(*(node_ptr.get()));

    return node_ptr;
}

russellsch avatar Mar 17 '23 18:03 russellsch

@russellsch Hey Russellsch, Thanks for sharing your code block. I tried it and inaccurate results of kdtree search didn't get solved, unfortunately. But super thank you for your kind reply :)

(I am trying to set a specific property of points within radius as here: https://github.com/engcang/ikd-Tree/blob/coverage_check/ikd_Tree.cpp#L662-L671)

engcang avatar Mar 23 '23 09:03 engcang