Bug: Index.get() returns inconsistent values for non-existent key
Describe the bug
When Index.get() is called with a key that doesn't exist in the index, it sometimes returns a vector of unrelated values (in the example below, the vector stored under another key) instead of consistently returning None as specified in the official documentation.
Steps to reproduce
Code to Reproduce
# Python 3.12
from usearch.index import Index
import numpy as nd
index = Index(ndim=10)
index.add(1, nd.array([0.5]*10))
index.add(2, nd.array([0.4]*10))
print(index.contains(1), index.get(1))
print(index.contains(2), index.get(2))
print(index.contains(3), index.get(3))
print(index.contains(102030), index.get(102030))
Output:
True [0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5]
True [0.3984375 0.3984375 0.3984375 0.3984375 0.3984375 0.3984375 0.3984375
0.3984375 0.3984375 0.3984375]
False [0.3984375 0.3984375 0.3984375 0.3984375 0.3984375 0.3984375 0.3984375
0.3984375 0.3984375 0.3984375]
False [0.3984375 0.3984375 0.3984375 0.3984375 0.3984375 0.3984375 0.3984375
0.3984375 0.3984375 0.3984375]
The last two lines should be False None
Expected behavior
As the documentation notes, get() should return None when a single key is requested and it's not present.
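Until the binding is fixed, callers can guard the lookup with contains(), which reports the missing keys correctly in the output above. A minimal sketch: safe_get is a hypothetical helper name, not part of USearch, and it assumes only the contains() and get() methods used in the reproduction. The _StubIndex class is a stand-in that imitates the reported behavior so the snippet runs without USearch installed.

```python
def safe_get(index, key):
    """Return index.get(key) only if the key is present, else None.

    Hypothetical helper, not part of USearch. It relies only on the
    contains() and get() methods shown in the reproduction.
    """
    if not index.contains(key):
        return None
    return index.get(key)


class _StubIndex:
    """Stand-in that imitates the reported bug: get() on a missing key
    hands back the last stored vector instead of None."""

    def __init__(self):
        self._vectors = {}
        self._last = None

    def add(self, key, vector):
        self._vectors[key] = list(vector)
        self._last = self._vectors[key]

    def contains(self, key):
        return key in self._vectors

    def get(self, key):
        return self._vectors.get(key, self._last)


stub = _StubIndex()
stub.add(1, [0.5] * 10)
stub.add(2, [0.4] * 10)
print(stub.get(3) is None)        # False: the stub leaks a stale vector
print(safe_get(stub, 3) is None)  # True: the guard reports the miss
```

The guard costs one extra lookup per call and is not atomic if another thread modifies the index between contains() and get().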
USearch version
2.15.1
Operating System
macOS Sonoma
Hardware architecture
Arm
Which interface are you using?
Python bindings
Contact Details
No response
Are you open to being tagged as a contributor?
- [X] I am open to being mentioned in the project .git history as a contributor
Is there an existing issue for this?
- [X] I have searched the existing issues
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
Interestingly, accessing the key -1 returns values that change over time, while too-large keys seem to return the last embedding again (as in the reporter's case):
x = usearch_index.Index(metric='IP', dtype=uindex.ScalarKind(12), ndim=3)
embs = np.float16(np.random.normal(scale=1e-4, size=[128, 3]))
x.add(np.arange(128), embs)
for i in (-2, -1, 0, 127, 128, 256):
    print(i)
    print(x.get(i))
    print(x.get(i))
    if i < 128:
        print(embs[i])
Output:
-2
[5.555e-05 4.506e-05 6.932e-05]
[5.555e-05 4.506e-05 6.932e-05]
[ 2.307e-05 5.108e-05 -3.988e-05]
-1
[2.307e-05 5.108e-05 3.988e-05]
[2.307e-05 5.108e-05 3.988e-05]
[ 7.74e-05 8.04e-05 -9.01e-05]
0
[-6.193e-05 7.033e-06 1.405e-04]
[-6.193e-05 7.033e-06 1.405e-04]
[-6.193e-05 7.033e-06 1.405e-04]
127
[ 7.74e-05 8.04e-05 -9.01e-05]
[ 7.74e-05 8.04e-05 -9.01e-05]
[ 7.74e-05 8.04e-05 -9.01e-05]
128
[7.74e-05 8.04e-05 9.01e-05]
[7.74e-05 8.04e-05 9.01e-05]
256
[7.74e-05 8.04e-05 9.01e-05]
[7.74e-05 8.04e-05 9.01e-05]
Root Cause Analysis
I've traced this bug to the C++ Python binding implementation in python/lib.cpp. The issue occurs in the get_typed_vectors_for_keys function when multi=False.
The Problem
When multi=False, the code allocates a numpy array and calls index.get() without checking if the key exists:
// python/lib.cpp lines 931-939
} else {
    py::array_t<external_at> result_py({keys_count, static_cast<Py_ssize_t>(index.scalar_words())});
    auto result_py2d = result_py.template mutable_unchecked<2>();
    for (Py_ssize_t task_idx = 0; task_idx != keys_count; ++task_idx) {
        dense_key_t key = *reinterpret_cast<dense_key_t const*>(keys_data + task_idx * keys_info.strides[0]);
        index.get(key, (internal_at*)&result_py2d(task_idx, 0), 1); // ← Return value ignored!
    }
    return result_py;
}
The underlying C++ index.get() method returns false when a key doesn't exist (see index_dense.hpp lines 2194-2209), but this return value is completely ignored. This leaves uninitialized memory in the numpy array.
Compare this to the multi=True case (lines 913-930) which correctly handles non-existent keys:
if (!vectors_count) {
    results[task_idx] = py::none();
    continue;
}
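The mechanism described above can be modeled in a few lines of plain Python. This is only an illustrative sketch, not the binding's code: lookup_into, get_unchecked and get_checked are hypothetical names, a dict stands in for the index, and a reused list stands in for memory that is not cleared between calls (real uninitialized memory may hold anything).

```python
def lookup_into(store, key, out):
    """Copy the vector for key into out and return True, or leave out
    untouched and return False (models a bool-returning lookup)."""
    vector = store.get(key)
    if vector is None:
        return False
    out[:] = vector
    return True


def get_unchecked(store, key, scratch):
    # Mirrors the multi=False branch: the return value is ignored,
    # so a miss hands back whatever the buffer already held.
    lookup_into(store, key, scratch)
    return list(scratch)


def get_checked(store, key, scratch):
    # Checks the return value and reports a miss as None.
    if not lookup_into(store, key, scratch):
        return None
    return list(scratch)


store = {1: [0.5, 0.5], 2: [0.4, 0.4]}
scratch = [0.0, 0.0]  # stands in for memory reused between calls
get_unchecked(store, 2, scratch)
print(get_unchecked(store, 3, scratch))  # [0.4, 0.4]: stale data from key 2
print(get_checked(store, 3, scratch))    # None
```

In this model a miss returns the previous key's vector, which resembles the "last embedding again" output reported for too-large keys, though the actual contents of the real buffer depend on the allocator.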
Proposed Fix
The multi=False branch should check the return value of index.get() and handle non-existent keys appropriately:
} else {
    // For single-key case, return the vector or None directly
    if (keys_count == 1) {
        dense_key_t key = *reinterpret_cast<dense_key_t const*>(keys_data);
        py::array_t<external_at> result_py({1, static_cast<Py_ssize_t>(index.scalar_words())});
        auto result_py2d = result_py.template mutable_unchecked<2>();
        bool found = index.get(key, (internal_at*)&result_py2d(0, 0), 1);
        if (!found) {
            return py::none();
        }
        return result_py[0]; // Return the single vector, not a 2D array
    }
    // For multiple keys, return a tuple with None for missing keys
    py::tuple results(keys_count);
    for (Py_ssize_t task_idx = 0; task_idx != keys_count; ++task_idx) {
        dense_key_t key = *reinterpret_cast<dense_key_t const*>(keys_data + task_idx * keys_info.strides[0]);
        py::array_t<external_at> result_py({1, static_cast<Py_ssize_t>(index.scalar_words())});
        auto result_py2d = result_py.template mutable_unchecked<2>();
        bool found = index.get(key, (internal_at*)&result_py2d(0, 0), 1);
        if (!found) {
            results[task_idx] = py::none();
        } else {
            results[task_idx] = result_py[0];
        }
    }
    return results;
}
This fix ensures:
- Single non-existent keys return None (not uninitialized memory)
- Multiple key queries return a tuple with None for non-existent keys
- Behavior is consistent between multi=True and multi=False modes
- Matches the documented API contract
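The return shapes listed above can be written down as a small executable model, which may also serve as a starting point for a regression test. This is a sketch under assumptions: get_vectors is a hypothetical name, a dict stands in for the index, and the scalar-versus-sequence split stands in for the keys_count == 1 check in the proposed C++ code.

```python
def get_vectors(store, keys):
    """Model of the proposed return shapes.

    A single key yields its vector or None; a sequence of keys yields
    a tuple with None in the position of every missing key.
    """
    if isinstance(keys, int):
        vector = store.get(keys)
        return None if vector is None else list(vector)
    return tuple(
        None if key not in store else list(store[key]) for key in keys
    )


store = {1: [0.5] * 3, 2: [0.4] * 3}
print(get_vectors(store, 3))          # None
print(get_vectors(store, [1, 3, 2]))  # ([0.5, 0.5, 0.5], None, [0.4, 0.4, 0.4])
```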
Why This Is Critical
This bug can lead to:
- Security issues: Uninitialized memory might contain sensitive data
- Unpredictable behavior: Values change between calls (as shown with key -1)
- Silent data corruption: Applications may process garbage data thinking it's valid
The fix is straightforward - just check the return value that's already being provided by the underlying C++ method.
Nice suggestions, @titusz! Can you please open a PR? I’ll merge a bit later today 🤗