milvus
[Bug]: Search expr on a scalar field returns all results when the value is out of bounds
Is there an existing issue for this?
- [X] I have searched the existing issues
Environment
- Milvus version:
- Deployment mode(standalone or cluster):
- MQ type(rocksmq, pulsar or kafka):
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:
Current Behavior
When I create a collection with a scalar field, such as an int8 field, and filter a search on that field with an expr like int8 > 128 (the int8 maximum is 127), the API returns all results.
Expected Behavior
I think it should return an empty result or an out-of-bounds error; either would be reasonable.
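To make the expectation concrete, here is a minimal pure-Python sketch of the filter semantics I would expect over the values 0..99 that the repro script inserts. No Milvus is involved, and the helper `expected_matches` is hypothetical, used only for illustration.

```python
# Hypothetical helper, not part of pymilvus: models the expected semantics of
# `int8_field >= bound` with plain Python integers, which never overflow.
def expected_matches(values, bound):
    return [v for v in values if v >= bound]


int8_values = list(range(100))  # same int8 values the repro script inserts

in_range = expected_matches(int8_values, 0)
out_of_range = expected_matches(int8_values, 128)

print(len(in_range))      # 100: every inserted value is >= 0
print(len(out_of_range))  # 0: no int8 value can be >= 128
```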
Steps To Reproduce
The code is:
import random

import numpy as np
from pymilvus import (
    connections,
    FieldSchema, CollectionSchema, DataType,
    Collection,
    utility
)

# Const names
_COLLECTION_NAME = 'demo_test'
_ID_FIELD_NAME = 'id_field'
_INT8_FIELD_NAME = 'int8_field'
_VECTOR_FIELD_NAME = 'float_vector_field'

# Vector parameters
_DIM = 128
_INDEX_FILE_SIZE = 32  # max file size of stored index

# Index parameters
_METRIC_TYPE = 'L2'
_INDEX_TYPE = 'AUTOINDEX'
_NLIST = 1024
_NPROBE = 16
_TOPK = 10


def create_connection():
    print("\nCreate connection...")
    connections.connect("default", uri="xxxxx",
                        user="xxx", password="xxxx", secure=True)
    print(connections.list_connections())


def has_collection(name):
    return utility.has_collection(name)


# Drop a collection in Milvus
def drop_collection(name):
    collection = Collection(name)
    collection.drop()
    print("\nDrop collection: {}".format(name))


# Create a collection with an int64 primary key, an int8 scalar field and a float vector field
def create_collection(name, id_field, int8_field, vector_field):
    field1 = FieldSchema(name=id_field, dtype=DataType.INT64, description="int64", is_primary=True)
    field2 = FieldSchema(name=int8_field, dtype=DataType.INT8, description="int8")
    field3 = FieldSchema(name=vector_field, dtype=DataType.FLOAT_VECTOR, description="float vector", dim=_DIM,
                         is_primary=False)
    schema = CollectionSchema(fields=[field1, field2, field3], description="collection description")
    collection = Collection(name=name, data=None, schema=schema, properties={"collection.ttl.seconds": 15})
    print("\ncollection created:", name)
    return collection


def set_properties(collection):
    collection.set_properties(properties={"collection.ttl.seconds": 1800})


# List all collections in Milvus
def list_collections():
    print("\nlist collections:")
    print(utility.list_collections())


# Insert `num` rows: ids 0..num-1, the same values as int8, and random vectors
def insert(collection, num, dim):
    data = [
        [i for i in range(num)],
        [np.int8(i) for i in range(num)],
        [[random.random() for _ in range(dim)] for _ in range(num)],
    ]
    collection.insert(data)
    return data[2]


def get_entity_num(collection):
    print("\nThe number of entity:")
    print(collection.num_entities)


def create_index(collection, field_name):
    index_param = {
        "index_type": _INDEX_TYPE,
        "params": {"nlist": _NLIST},
        "metric_type": _METRIC_TYPE}
    collection.create_index(field_name, index_param)
    print("\nCreated index:\n{}".format(collection.index().params))


def load_collection(collection):
    collection.load()


def search(collection, vector_field, id_field, expr, search_vectors):
    search_param = {
        "data": search_vectors,
        "anns_field": vector_field,
        "param": {"metric_type": _METRIC_TYPE, "params": {"nprobe": _NPROBE}},
        "limit": _TOPK,
        "expr": expr}
    results = collection.search(**search_param)
    for i, result in enumerate(results):
        print("\nSearch result for {}th vector: ".format(i))
        for j, res in enumerate(result):
            print("Top {}: {}".format(j, res))


if __name__ == '__main__':
    # create a connection
    create_connection()

    # drop collection if the collection exists
    if has_collection(_COLLECTION_NAME):
        drop_collection(_COLLECTION_NAME)

    # create collection
    collection = create_collection(_COLLECTION_NAME, _ID_FIELD_NAME, _INT8_FIELD_NAME, _VECTOR_FIELD_NAME)
    collection = Collection(_COLLECTION_NAME)

    # alter ttl properties of collection level
    set_properties(collection)

    # show collections
    list_collections()

    # insert 100 vectors with 128 dimensions
    vectors = insert(collection, 100, _DIM)
    collection.flush()

    # get the number of entities
    get_entity_num(collection)

    # create index
    create_index(collection, _VECTOR_FIELD_NAME)

    # load data to memory
    load_collection(collection)

    # search with an in-range bound: matches all rows, as expected
    search(collection, _VECTOR_FIELD_NAME, _ID_FIELD_NAME, expr="int8_field >= 0", search_vectors=vectors[:3])

    # search with a bound outside the int8 range (max 127): should match nothing, but returns results
    search(collection, _VECTOR_FIELD_NAME, _ID_FIELD_NAME, expr="int8_field >= 128", search_vectors=vectors[:3])
Milvus Log
No response
Anything else?
No response
/assign @NicoYuan1986
Could you please reproduce it in house?
/unassign
@yanliang567 Reproduced. milvus: master-20230423-b7cb34b9 pymilvus: 2.4.0.dev12
I think there's something wrong with the 'int8' datatype.
I guess this is due to int8 overflow.
We need to check the value range when parsing the expr.
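The overflow guess can be illustrated outside Milvus. The sketch below uses Python's `ctypes.c_int8`, which does no overflow checking, to show 128 wrapping to -128, and then a hypothetical `check_int_literal` helper for the kind of range check meant here. The names `INT_RANGES`, `wrap_to_int8` and `check_int_literal` are assumptions made for illustration; this is not the actual Milvus parser code.

```python
import ctypes

# Signed integer ranges by type name (hypothetical table for this sketch).
INT_RANGES = {
    "int8": (-2**7, 2**7 - 1),
    "int16": (-2**15, 2**15 - 1),
    "int32": (-2**31, 2**31 - 1),
    "int64": (-2**63, 2**63 - 1),
}


def wrap_to_int8(value):
    """Mimic an unchecked cast to int8 (ctypes does no overflow checking)."""
    return ctypes.c_int8(value).value


def check_int_literal(dtype, value):
    """Return value unchanged, or raise ValueError if dtype cannot hold it."""
    low, high = INT_RANGES[dtype]
    if not low <= value <= high:
        raise ValueError(f"{value} is out of range for {dtype} [{low}, {high}]")
    return value


# Without a check, the literal wraps, so `int8_field >= 128` would
# effectively be evaluated as `int8_field >= -128` and match every row.
wrapped = wrap_to_int8(128)
print(wrapped)  # -128

# With a check, the out-of-range literal is rejected before evaluation.
try:
    check_int_literal("int8", 128)
except ValueError as err:
    print("rejected:", err)
```

Raising an error is only one option. The parser could instead compare in a wider integer type, so that an out-of-range bound yields an empty result; the report treats either outcome as reasonable.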
/assign @longjiquan
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.
Long is working on fixing it.
The unary expr fix is not correct: the conversion breaks bool data, and it may lead to the overflow path.
Already fixed, please retry this.
/assign @yelusion2
/unassign