milvus
[Bug]: Creating a DISKANN index with the COSINE metric hangs after flush
Is there an existing issue for this?
- [X] I have searched the existing issues
Environment
- Milvus version: master latest (master-20230423-b7cb34b9)
- Deployment mode(standalone or cluster):standalone
- MQ type(rocksmq, pulsar or kafka): rocksmq
- SDK version(e.g. pymilvus v2.0.0rc2): pymilvus 2.4.0.dev12
- OS(Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:
Current Behavior
Creating a DISKANN index with the COSINE metric hangs when it is done after a flush.
Expected Behavior
The index is created successfully.
Steps To Reproduce
import random

import numpy as np
from pymilvus import Collection, CollectionSchema, DataType, FieldSchema, connections

# Connect to the local standalone Milvus instance with default settings.
connections.connect()

dim = 128
nb = 5000

# Schema: an INT64 primary key, two scalar fields, and a float vector field.
int64_field = FieldSchema(name="int64", dtype=DataType.INT64, is_primary=True)
float_field = FieldSchema(name="float", dtype=DataType.FLOAT)
bool_field = FieldSchema(name="bool", dtype=DataType.BOOL)
# string_field is defined but not added to the schema below.
string_field = FieldSchema(name="string", dtype=DataType.VARCHAR, max_length=65535)
float_vector = FieldSchema(name="float_vector", dtype=DataType.FLOAT_VECTOR, dim=dim)
schema = CollectionSchema(fields=[int64_field, float_field, bool_field, float_vector])
collection = Collection("test_search_collection_binbin_tmp_111", schema=schema)

# Insert nb rows of data, column by column in schema order.
vectors = [[random.random() for _ in range(dim)] for _ in range(nb)]
res = collection.insert([list(range(nb)), [np.float32(i) for i in range(nb)], [np.bool_(i) for i in range(nb)], vectors])

# Flush first: the hang only occurs when the index is created after a flush.
collection.flush()
index_param = {"index_type": "DISKANN", "metric_type": "COSINE", "params": {}}
collection.create_index("float_vector", index_param, index_name="index_name_1")
Milvus Log
Anything else?
No response
- with flush: DISKANN with L2 is successful
>>> collection.flush()
>>> index_param = {"index_type": "DISKANN", "metric_type": "L2", "params": {}}
>>> collection.create_index("float_vector", index_param, index_name="index_name_1")
Status(code=0, message=)
>>>
- without flush: DISKANN with COSINE is successful:
>>> index_param = {"index_type": "DISKANN", "metric_type": "COSINE", "params": {}}
>>> collection.create_index("float_vector", index_param, index_name="index_name_1")
Status(code=0, message=)
>>>
- with flush: DISKANN with COSINE fails (the call hangs):
>>> collection.flush()
>>> index_param = {"index_type": "DISKANN", "metric_type": "COSINE", "params": {}}
>>> collection.create_index("float_vector", index_param, index_name="index_name_1")
...
no response here
The Milvus log says that DISKANN does not support "COSINE", but the design doc says it is supported. @cydrain, am I right?
I20230424 03:50:27.760720 101 factory.cc:20] [KNOWHERE][Create][milvus] create knowhere index DISKANN
E20230424 03:50:27.761775 101 diskann.cc:240] [KNOWHERE][CheckMetric][milvus] DiskANN currently only supports floating point data for Max Inner Product Search(IP) and minimum Euclidean distance(L2).
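The error line above suggests that, in this build, knowhere's DiskANN accepts only IP and L2. As a minimal sketch, a client-side guard could fail fast instead of waiting on a hung `create_index` call. The helper name `check_diskann_metric` and the `DISKANN_METRICS` set are hypothetical, taken from the log message rather than from any Milvus or pymilvus API:

```python
# Metrics DiskANN accepted in this build, according to the knowhere error log.
# This set is an assumption for illustration and may differ in other builds.
DISKANN_METRICS = frozenset({"L2", "IP"})


def check_diskann_metric(index_param: dict) -> None:
    """Raise ValueError if a DISKANN index requests a metric outside the set."""
    if index_param.get("index_type") != "DISKANN":
        return  # other index types are not checked here
    metric = index_param.get("metric_type")
    if metric not in DISKANN_METRICS:
        raise ValueError(
            f"DISKANN does not accept metric_type={metric!r}; "
            f"expected one of {sorted(DISKANN_METRICS)}"
        )
```

Calling `check_diskann_metric(index_param)` right before `collection.create_index(...)` would turn the silent hang into an immediate error on the client.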
DISKANN does not support COSINE correctly yet; we are working on it.
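Until COSINE is supported, one possible workaround (not something proposed in this thread) relies on the fact that cosine similarity equals the inner product of L2-normalized vectors. A minimal sketch, where `l2_normalize` and `dot` are hypothetical helpers and the vectors are plain Python lists as in the reproduction script:

```python
import math


def l2_normalize(vec):
    """Return vec scaled to unit L2 norm; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


# Normalize every vector before insert, then index with IP instead of COSINE.
# Query vectors would need the same normalization at search time.
index_param = {"index_type": "DISKANN", "metric_type": "IP", "params": {}}
```

With this approach the ranking produced by IP on the normalized data matches the cosine ranking on the original data, at the cost of storing normalized vectors rather than the raw ones.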
Hi @binbinlv, knowhere now supports COSINE for DiskANN. Please retest this issue.
OK, working on it.
/assign
Verified as fixed with pymilvus 2.4.0.dev69 and milvus master-20230608-d531b177.
Results:
>>> collection.flush()
>>> index_param = {"index_type": "DISKANN", "metric_type": "COSINE", "params": {}}
>>> collection.create_index("float_vector", index_param, index_name="index_name_1")
Status(code=0, message=)