hnswlib
hnswlib copied to clipboard
mutuallyConnectNewElement and updatePoint might cause isolated-island vertex
Hi, I'm modifying HierarchicalNSW so that everything is stored in a high-perf db. And I notice some possible bug.
// If cur_c is already present in the neighboring connections of `selectedNeighbors[idx]` then no need to modify any connections or run the heuristics.
if (!is_cur_c_present) {
if (sz_link_list_other < Mcurmax) {
data[sz_link_list_other] = cur_c;
setListCount(ll_other, sz_link_list_other + 1);
} else {
// finding the "weakest" element to replace it with the new one
dist_t d_max = fstdistfunc_(getDataByInternalId(cur_c), getDataByInternalId(selectedNeighbors[idx]),
dist_func_param_);
// Heuristic:
std::priority_queue<std::pair<dist_t, tableint>, std::vector<std::pair<dist_t, tableint>>, CompareByFirst> candidates;
candidates.emplace(d_max, cur_c);
for (size_t j = 0; j < sz_link_list_other; j++) {
candidates.emplace(
fstdistfunc_(getDataByInternalId(data[j]), getDataByInternalId(selectedNeighbors[idx]),
dist_func_param_), data[j]);
}
getNeighborsByHeuristic2(candidates, Mcurmax);
In mutuallyConnectNewElement, around the second getNeighborsByHeuristic2, we are adding cur_c to the neighbor list of selectedNeighbors
(for convenience, we call this SELECTED_A ). By doing so, we might rm existing neighbor (for convenience, we call this VERTEX_B ) from SELECTED_A's neighborlist.
If VERTEX_B is only connected to SELECTED_A, this rm process will make VERTEX_B unavailable by entry-point.
updatePoint has the same problem.