milvus [Bug]: search expr on scalar field returns all results when value is out of bounds

Is there an existing issue for this?

[X] I have searched the existing issues

Environment

- Milvus version:
- Deployment mode(standalone or cluster):
- MQ type(rocksmq, pulsar or kafka):    
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

When I create a collection with a scalar field (for example, an INT8 field) and use that field in a search filter expression such as `int8_field > 128`, the API returns all results, as if no filter were applied.

Expected Behavior

I think returning an empty result, or an out-of-bounds error, would be reasonable.

Steps To Reproduce

The reproduction code:

import random
import numpy as np


from pymilvus import (
    connections,
    FieldSchema, CollectionSchema, DataType,
    Collection,
    utility
)

# Const names
_COLLECTION_NAME = 'demo_test'
_ID_FIELD_NAME = 'id_field'
_INT8_FIELD_NAME = 'int8_field'
_VECTOR_FIELD_NAME = 'float_vector_field'

# Vector parameters
_DIM = 128
_INDEX_FILE_SIZE = 32  # max file size of stored index

# Index parameters
_METRIC_TYPE = 'L2'
_INDEX_TYPE = 'AUTOINDEX'
_NLIST = 1024
_NPROBE = 16
_TOPK = 10

def create_connection():
    print(f"\nCreate connection...")
    connections.connect("default", uri="xxxxx",
                        user="xxx", password="xxxx", secure=True)
    print(connections.list_connections())

def has_collection(name):
    return utility.has_collection(name)

# Drop a collection in Milvus
def drop_collection(name):
    collection = Collection(name)
    collection.drop()
    print("\nDrop collection: {}".format(name))

# Create a collection with an INT64 primary key, an INT8 scalar field and a float vector field
def create_collection(name, id_field, int8_field, vector_field):
    field1 = FieldSchema(name=id_field, dtype=DataType.INT64, description="int64", is_primary=True)
    field2 = FieldSchema(name=int8_field, dtype=DataType.INT8, description="int8" )
    field3 = FieldSchema(name=vector_field, dtype=DataType.FLOAT_VECTOR, description="float vector", dim=_DIM,
                         is_primary=False)
    schema = CollectionSchema(fields=[field1, field2, field3], description="collection description")
    collection = Collection(name=name, data=None, schema=schema, properties={"collection.ttl.seconds": 15})
    print("\ncollection created:", name)
    return collection

def set_properties(collection):
    collection.set_properties(properties={"collection.ttl.seconds": 1800})


# List all collections in Milvus
def list_collections():
    print("\nlist collections:")
    print(utility.list_collections())

def insert(collection, num, dim):
    data = [
        [i for i in range(num)],
        [np.int8(i) for i in range(num)],
        [[random.random() for _ in range(dim)] for _ in range(num)],
    ]
    collection.insert(data)
    return data[2]

def get_entity_num(collection):
    print("\nThe number of entity:")
    print(collection.num_entities)

def create_index(collection, field_name):
    index_param = {
        "index_type": _INDEX_TYPE,
        "params": {"nlist": _NLIST},
        "metric_type": _METRIC_TYPE}
    collection.create_index(field_name, index_param)
    print("\nCreated index:\n{}".format(collection.index().params))

def load_collection(collection):
    collection.load()


def search(collection, vector_field, id_field, expr, search_vectors):
    search_param = {
        "data": search_vectors,
        "anns_field": vector_field,
        "param": {"metric_type": _METRIC_TYPE, "params": {"nprobe": _NPROBE}},
        "limit": _TOPK,
        "expr": expr}
    results = collection.search(**search_param)
    for i, result in enumerate(results):
        print("\nSearch result for {}th vector: ".format(i))
        for j, res in enumerate(result):
            print("Top {}: {}".format(j, res))

if __name__ == '__main__':
    # create a connection
    create_connection()

    # drop collection if the collection exists
    if has_collection(_COLLECTION_NAME):
        drop_collection(_COLLECTION_NAME)

    # create collection
    collection = create_collection(_COLLECTION_NAME, _ID_FIELD_NAME, _INT8_FIELD_NAME, _VECTOR_FIELD_NAME)

    collection = Collection(_COLLECTION_NAME)
    # alter ttl properties of collection level
    set_properties(collection)

    # show collections
    list_collections()

    # insert 100 vectors with 128 dimensions (int8_field values 0..99)
    vectors = insert(collection, 100, _DIM)

    collection.flush()
    # get the number of entities
    get_entity_num(collection)

    # create index
    create_index(collection, _VECTOR_FIELD_NAME)

    # load data to memory
    load_collection(collection)

    # search with an in-range filter: all 100 entities satisfy it
    search(collection, _VECTOR_FIELD_NAME, _ID_FIELD_NAME, expr="int8_field >= 0", search_vectors=vectors[:3])

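    # search with an out-of-range filter (INT8 max is 127): should match nothing, but results come back unfiltered
    search(collection, _VECTOR_FIELD_NAME, _ID_FIELD_NAME, expr="int8_field >= 128", search_vectors=vectors[:3])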

Milvus Log

No response

Anything else?

No response

Apr 24 '23 03:04 yelusion2

/assign @NicoYuan1986 could you please reproduce it in house /unassign

Apr 24 '23 04:04 yanliang567

@yanliang567 Reproduced. milvus: master-20230423-b7cb34b9 pymilvus: 2.4.0.dev12

I think there's something wrong with datatype 'int8'

Apr 24 '23 11:04 NicoYuan1986

I guess this is due to int8 overflow.

Apr 24 '23 18:04 xiaofan-luan
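The overflow hypothesis can be illustrated without a Milvus server. The sketch below assumes (the thread does not confirm this) that the filter constant is narrowed to int8 with two's-complement wraparound before the comparison. Under that assumption `128` becomes `-128`, and `int8_field >= -128` holds for every stored value. `wrap_to_int8` is a hypothetical helper, not a Milvus function.

```python
def wrap_to_int8(value: int) -> int:
    """Hypothetical helper: keep the low 8 bits and reinterpret them as a signed int8."""
    return ((value + 128) % 256) - 128


# int8_field values inserted by the reproduction script
stored = list(range(100))

bound = 128
narrowed = wrap_to_int8(bound)  # assumed narrowing: 128 wraps around to -128

correct = [v for v in stored if v >= bound]    # compare as wide integers
buggy = [v for v in stored if v >= narrowed]   # compare after narrowing to int8

print(narrowed)      # -128
print(len(correct))  # 0
print(len(buggy))    # 100
```

Under this assumption, the out-of-range filter degenerates into an always-true condition, which matches the reported behavior.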

We need to check the value range when parsing.

Apr 24 '23 18:04 xiaofan-luan
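One way to do the range check when parsing is to decide comparisons against out-of-range constants up front instead of narrowing the constant. The sketch below is hypothetical: `check_int8_compare` is not a Milvus API, and the thread does not describe how the actual fix works. It returns `True` or `False` when the comparison has the same result for every int8 value, and `None` when the constant fits in int8 and can be evaluated normally.

```python
from typing import Optional

INT8_MIN, INT8_MAX = -128, 127


def check_int8_compare(op: str, value: int) -> Optional[bool]:
    """Hypothetical parser check for `int8_field <op> value`.

    Returns True/False if the result is the same for every int8 value,
    or None if `value` fits in int8 and must be evaluated normally.
    """
    if INT8_MIN <= value <= INT8_MAX:
        return None  # safe to narrow the constant to int8
    above = value > INT8_MAX  # constant lies above (True) or below (False) every int8 value
    if op in (">", ">="):
        return not above  # no int8 exceeds a too-large bound; every int8 exceeds a too-small one
    if op in ("<", "<="):
        return above
    if op == "==":
        return False  # no int8 value can equal an out-of-range constant
    if op == "!=":
        return True
    raise ValueError(f"unsupported operator: {op}")


print(check_int8_compare(">=", 128))  # False: the filter matches nothing
print(check_int8_compare(">=", 0))    # None: evaluate normally
```

Folding `int8_field >= 128` to `False` yields the empty result the reporter expected; a parser could instead raise an out-of-bounds error at that point, the other option suggested in the issue.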

/assign @longjiquan

Apr 24 '23 18:04 xiaofan-luan

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

Jun 09 '23 09:06 stale[bot]

Long is working on fixing it.

Jun 17 '23 07:06 xiaofan-luan

The unary expr fix is not correct: the conversion breaks bool data, and it may lead to the overflow path.

Jun 20 '23 04:06 yah01

Already fixed, please retry. /assign @yelusion2

Jul 05 '23 02:07 longjiquan

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

Aug 04 '23 09:08 stale[bot]

/unassign

Aug 07 '23 03:08 longjiquan

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

Sep 06 '23 23:09 stale[bot]
