[Bug]: Create DISKANN index with COSINE metric will hang after flush

Open binbinlv opened this issue 1 year ago • 3 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Environment

- Milvus version: master latest (master-20230423-b7cb34b9)
- Deployment mode(standalone or cluster): standalone
- MQ type(rocksmq, pulsar or kafka): rocksmq
- SDK version(e.g. pymilvus v2.0.0rc2): pymilvus 2.4.0.dev12
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

Creating a DISKANN index with the COSINE metric hangs when the call follows a flush.

Expected Behavior

The index is created successfully.

Steps To Reproduce

import random

import numpy as np
from pymilvus import Collection, CollectionSchema, DataType, FieldSchema, connections

# Connect to the default local Milvus instance.
connections.connect()

dim = 128
int64_field = FieldSchema(name="int64", dtype=DataType.INT64, is_primary=True)
float_field = FieldSchema(name="float", dtype=DataType.FLOAT)
bool_field = FieldSchema(name="bool", dtype=DataType.BOOL)
# Defined in the original report but not added to the schema below.
string_field = FieldSchema(name="string", dtype=DataType.VARCHAR, max_length=65535)
float_vector = FieldSchema(name="float_vector", dtype=DataType.FLOAT_VECTOR, dim=dim)
schema = CollectionSchema(fields=[int64_field, float_field, bool_field, float_vector])

collection = Collection("test_search_collection_binbin_tmp_111", schema=schema)

# Insert 5000 rows of column-ordered data matching the schema.
nb = 5000
vectors = [[random.random() for _ in range(dim)] for _ in range(nb)]
res = collection.insert([[i for i in range(nb)], [np.float32(i) for i in range(nb)], [np.bool_(i) for i in range(nb)], vectors])

# Flush first: the hang only appears when create_index follows a flush.
collection.flush()
index_param = {"index_type": "DISKANN", "metric_type": "COSINE", "params": {}}
collection.create_index("float_vector", index_param, index_name="index_name_1")

Milvus Log

23645.log

Anything else?

No response

binbinlv avatar Apr 24 '23 03:04 binbinlv

  1. With flush: DISKANN with L2 succeeds:
>>> collection.flush()
>>> index_param = {"index_type": "DISKANN", "metric_type": "L2", "params": {}}
>>> collection.create_index("float_vector", index_param, index_name="index_name_1")
Status(code=0, message=)
>>>
  2. Without flush: DISKANN with COSINE succeeds:
>>> index_param = {"index_type": "DISKANN", "metric_type": "COSINE", "params": {}}
>>> collection.create_index("float_vector", index_param, index_name="index_name_1")
Status(code=0, message=)
>>>
  3. With flush: DISKANN with COSINE hangs:
>>> collection.flush()
>>> index_param = {"index_type": "DISKANN", "metric_type": "COSINE", "params": {}}
>>> collection.create_index("float_vector", index_param, index_name="index_name_1")
...
no response here
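
Because the call in the failing case never returns, a reproduction script can block indefinitely. Below is a minimal sketch of one way to bound the wait using only the standard library. `call_with_timeout` is a hypothetical helper, not part of pymilvus, and the lambdas stand in for the real `collection.create_index(...)` call. As far as I know, `create_index` also accepts a `timeout` argument, which may be the simpler option if it behaves as expected in your version.

```python
import threading
import time


def call_with_timeout(fn, timeout_s):
    """Run fn() in a daemon thread and wait at most timeout_s seconds.

    Returns (True, result) if fn finished in time, (False, None) otherwise.
    The worker is a daemon thread, so a hung call does not block interpreter exit.
    """
    box = {}

    def runner():
        try:
            box["result"] = fn()
        except Exception as exc:  # hand the error back to the caller's thread
            box["error"] = exc

    worker = threading.Thread(target=runner, daemon=True)
    worker.start()
    worker.join(timeout_s)
    if worker.is_alive():
        return False, None
    if "error" in box:
        raise box["error"]
    return True, box["result"]


# Stand-ins: one call that returns promptly and one that "hangs".
finished_ok = call_with_timeout(lambda: "done", 1.0)
finished_hung = call_with_timeout(lambda: time.sleep(2), 0.05)
```

Note that the helper only stops waiting; it does not cancel the request on the server side.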

binbinlv avatar Apr 24 '23 03:04 binbinlv

The Milvus log says DISKANN does not support "COSINE", but the design doc says it is supported. @cydrain, am I right?

I20230424 03:50:27.760720   101 factory.cc:20] [KNOWHERE][Create][milvus] create knowhere index DISKANN
E20230424 03:50:27.761775   101 diskann.cc:240] [KNOWHERE][CheckMetric][milvus] DiskANN currently only supports floating point data for Max Inner Product Search(IP) and minimum Euclidean distance(L2).
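
Since the log lists only IP and L2 as accepted metrics, one possible workaround on affected builds is to L2-normalize the vectors on the client and build the index with `"metric_type": "IP"`, because the inner product of two unit vectors equals their cosine similarity. The sketch below only demonstrates that identity in plain Python; the helper names are made up for illustration, and this is not a description of how Knowhere implements COSINE.

```python
import math


def inner_product(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(vec):
    """Scale vec to unit L2 norm; a zero vector has no direction, so reject it."""
    norm = math.sqrt(inner_product(vec, vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine of the angle between a and b."""
    return inner_product(a, b) / (
        math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    )


a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]

# IP on normalized vectors should match cosine similarity on the raw vectors.
ip_on_unit = inner_product(l2_normalize(a), l2_normalize(b))
cos_on_raw = cosine_similarity(a, b)
```

For this to work in a search, the query vectors would need the same normalization as the inserted ones.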

binbinlv avatar Apr 24 '23 03:04 binbinlv

DISKANN does not support COSINE correctly yet; we are working on it.

cydrain avatar Apr 24 '23 11:04 cydrain

Hi @binbinlv, Knowhere now supports COSINE for DISKANN. Please retest this issue.

cydrain avatar Jun 06 '23 06:06 cydrain

OK, working on it.

binbinlv avatar Jun 08 '23 09:06 binbinlv

/assign

binbinlv avatar Jun 08 '23 09:06 binbinlv

Verified as fixed with pymilvus 2.4.0.dev69 and Milvus master-20230608-d531b177.

Results:

>>> collection.flush()
>>> index_param = {"index_type": "DISKANN", "metric_type": "COSINE", "params": {}}
>>> collection.create_index("float_vector", index_param, index_name="index_name_1")
Status(code=0, message=)

binbinlv avatar Jun 08 '23 10:06 binbinlv