milvus
[Enhancement]: Loading speed optimization in the serverless mode
Is there an existing issue for this?
- [X] I have searched the existing issues
What would you like to be added?
Currently, the loading process in serverless mode can be rather slow, which makes search and query latency very high and unacceptable.
Why is this needed?
No response
Anything else?
No response
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.
I meet the same issue: loading a collection is very slow. Did you solve this problem?
What problem are you facing? I don't think you are facing the same issue we are discussing here. If loading a collection is slow, you usually need to check:
- how many segments you have; too many segments will make the load slow (see the sketch below for a quick way to inspect this)
- whether adding more querynodes helps; more querynodes improve load parallelism and speed
- the tuning parameters, especially if you have a large pod size; we encourage you to share logs and pprof output and explain your use case so we can help
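For the segment check, here is a minimal sketch assuming pymilvus 2.x; the endpoint and collection name are placeholders matching the test script later in this thread. Note that get_query_segment_info only reports segments that are already loaded on querynodes, so run it after the load finishes.

# Minimal sketch, assuming pymilvus 2.x: inspect how many segments a loaded
# collection is being served from. Many small segments usually mean slow loads.
from pymilvus import connections, utility

connections.connect(host="xxx", port="19530")   # placeholder endpoint

collection_name = "loadtest"                    # placeholder collection
segments = utility.get_query_segment_info(collection_name)
print("number of loaded segments:", len(segments))
for seg in segments:
    print(seg)   # per-segment details: row count, memory size, serving querynode

# For deeper digging, Milvus also exposes Go pprof handlers (typically under
# /debug/pprof on the metrics port, 9091); verify the port for your deployment.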
@xiaofan-luan Report from my production environment: Milvus version: v2.3.1, mode: DISTRIBUTED (all nodes in one pod), Loaded Collections: 552, All Collections: 552, Entities: 32,039, embedding size: 3072, object store: OSS.
Now even loading an empty collection costs a lot of time. Here is my test script:
import random
import time

from pymilvus import (
    connections,
    FieldSchema, CollectionSchema, DataType,
    Collection,
    utility,
)

_HOST = 'xxx'
_PORT = '19530'

if __name__ == '__main__':
    connections.connect(host=_HOST, port=_PORT)

    dim = 512
    collection_name = "loadtest"
    if utility.has_collection(collection_name):
        utility.drop_collection(collection_name)

    field1 = FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=False)
    field2 = FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=dim)
    schema = CollectionSchema(fields=[field1, field2])
    collection = Collection(name=collection_name, schema=schema)
    print("\ncollection created:", collection_name)

    index_param = {
        "index_type": "IVF_FLAT",
        "params": {"nlist": 256},
        "metric_type": "L2",
    }
    collection.create_index("embedding", index_param)

    num = 10000
    data = [
        [i for i in range(num)],
        [[random.random() for _ in range(dim)] for _ in range(num)],
    ]
    # Insert is commented out, so the collection is empty when loaded.
    '''
    collection.insert(data)
    collection.flush()
    print("Insert", num, "vectors")
    print("Collection row count:", collection.num_entities)
    '''

    start = time.time()
    collection.load()
    end = time.time()
    print("Load collection, time cost:", (end - start) * 1000, "ms")
Script output:
collection created: loadtest
Load collection, time cost: 39166.789293289185 ms
Try 2.3.15 and see if it improves. Also, load is a one-time DDL operation, so 30s is not slow at all. Even though the collection is small and the load performance can certainly be improved, when you work with a large dataset a load is expected to take more than 10s.
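To illustrate that the load is a one-time cost, here is a rough sketch (pymilvus 2.x, reusing the loadtest collection and the placeholder endpoint from the script above): load is paid once, and subsequent searches go straight to the already-loaded segments. Since the insert block is commented out the collection is empty and the searches return nothing, but the round trips still show that no reload happens.

# Rough sketch: pay the load once, then time a few searches on the loaded collection.
import random
import time

from pymilvus import Collection, connections

connections.connect(host="xxx", port="19530")   # placeholder endpoint
collection = Collection("loadtest")             # placeholder collection from the script above

start = time.time()
collection.load()                               # one-time cost
print("load:", (time.time() - start) * 1000, "ms")

dim = 512
search_params = {"metric_type": "L2", "params": {"nprobe": 16}}
for i in range(3):
    query = [[random.random() for _ in range(dim)]]
    start = time.time()
    collection.search(query, anns_field="embedding", param=search_params, limit=10)
    print("search", i, ":", (time.time() - start) * 1000, "ms")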
@xiaofan-luan But when I test a Docker Compose STANDALONE Milvus, the same load script takes only about 3s, which is roughly 10 times faster than the DISTRIBUTED Milvus.
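For an apples-to-apples comparison, a sketch like the one below (pymilvus 2.x; the host names are placeholders for your standalone and distributed endpoints) times the same cold load against both deployments through named connection aliases.

# Sketch: time a cold load of the same collection on two deployments.
import time

from pymilvus import Collection, connections

# Placeholder endpoints -- substitute your standalone and distributed hosts.
endpoints = {
    "standalone": {"host": "standalone-host", "port": "19530"},
    "distributed": {"host": "distributed-host", "port": "19530"},
}

for alias, conn in endpoints.items():
    connections.connect(alias=alias, host=conn["host"], port=conn["port"])
    collection = Collection("loadtest", using=alias)
    try:
        collection.release()    # force a cold load; ignore if not loaded yet
    except Exception:
        pass
    start = time.time()
    collection.load()
    print(alias, "load:", (time.time() - start) * 1000, "ms")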
The implementation logic of standalone and distributed is exactly the same, and they should behave the same for most use cases. Please provide logs if you need help investigating the details.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.