A few questions on Milvus when running the large LAION 100M dataset
I am using VectorDBBench to exercise and analyse Milvus's capabilities before it handles our load at scale.
We are using 1 server with 4 NVIDIA L40S GPUs, and I have assigned 2 to the queryNode and 2 to the indexNode.
When I ran the no-filter search performance test on the LAION 100M dataset with index type DISKANN and K=100, the entire setup just hung for hours in the optimize state, and there are no further logs to show what is happening in this state.
A few questions:
- What does the optimize state actually do in this case?
- Enlighten me here: I am assuming the GPUs play no role in the optimize state.
- How long should we expect this test to take to complete? Any rough ideas?
- Do the query and index GPUs only play a role when indexing or querying is actually happening, and how do I check the GPU usage? (I tried nvidia-smi and observed no usage; a small polling sketch follows this list.)
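On the GPU-usage question, here is a minimal sketch of how one might poll nvidia-smi while a benchmark phase runs; it assumes nvidia-smi is available on the host or inside the query/index node containers and is not part of VectorDBBench itself:

```python
# Hypothetical helper: periodically sample GPU utilization and memory usage via nvidia-smi.
import subprocess
import time

def sample_gpu_utilization(interval_s: float = 5.0, samples: int = 12) -> None:
    query = [
        "nvidia-smi",
        "--query-gpu=index,utilization.gpu,memory.used,memory.total",
        "--format=csv,noheader,nounits",
    ]
    for _ in range(samples):
        # One line per GPU: index, utilization %, memory used (MiB), memory total (MiB)
        out = subprocess.run(query, capture_output=True, text=True, check=True)
        print(out.stdout.strip())
        time.sleep(interval_s)

if __name__ == "__main__":
    sample_gpu_utilization()
```

With a CPU-only index such as DiskANN, the reported GPU utilization would be expected to stay at 0 throughout the build.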
Last status
##### Data set #####
2024-08-30 13:05:46,369 | INFO: [1/1] start case: {'label': <CaseLabel.Performance: 2>, 'dataset': {'data': {'name': 'LAION', 'size': 100000000, 'dim': 768, 'metric_type': <MetricType.L2: 'L2'>}}, 'db': 'Milvus-r1u1'}, drop_old=True (interface.py:164) (2145320)
##### Current state #####
2024-08-31 05:00:37,072 | INFO: (SpawnProcess-1:1) Finish loading all dataset into VectorDB, dur=48396.15328549099 (serial_runner.py:61) (2584643)
2024-08-31 05:00:38,764 | INFO: Milvus post insert before optimize (milvus.py:101) (982732)
My Helm values.yaml file looks like this:
```yaml
indexNode:
  resources:
    requests:
      nvidia.com/gpu: "2"
    limits:
      nvidia.com/gpu: "2"

queryNode:
  resources:
    requests:
      nvidia.com/gpu: "2"
    limits:
      nvidia.com/gpu: "2"

mmap:
  # Set memory mapping property for the whole cluster
  mmapEnabled: true
  # Set the memory-mapped directory path; if you leave mmapDirPath unspecified,
  # the memory-mapped files will be stored in {localStorage.path}/mmap by default.
  mmapDirPath: /mnt/vector/clustersetup_files/

minio:
  enabled: false

externalS3:
  enabled: true
  host: "xx..xx.xxx.xx"
  port: "xx"
  accessKey: "mykey"
  secretKey: "myskey"
  useSSL: false
  bucketName: "milvusdb"
  rootPath: ""
  useIAM: false
  cloudProvider: "aws"
  iamEndpoint: ""
```
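As a quick sanity check that a cluster deployed with these values is reachable, one can connect with pymilvus and list what is there (the endpoint below is a placeholder, assuming the default Milvus port):

```python
# Minimal connectivity check against the deployed cluster (placeholder endpoint).
from pymilvus import connections, utility

connections.connect(host="localhost", port="19530")
print(utility.get_server_version())  # Milvus server version string
print(utility.list_collections())    # collections created by VectorDBBench, if any
```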
GPU won't work in your case unless you are using a GPU index, and for your case I guess GPU memory won't be enough anyway.
We are working on a mode that builds the index with GPU and searches with CPU, but it's not there yet.
Bulk insert and indexing of 100M data usually takes a couple of hours. Each search could take on the order of a hundred milliseconds, so the total time depends on how many queries you want to run.
@agandra30
> What does the optimize state actually do in this case?

For Milvus, optimize refers primarily to compaction: it manually lets Milvus consolidate the various fragmented segments into larger ones, which improves query performance.

> How long should we expect this test to take to complete? Any rough ideas?

This depends on the performance of your machine. Note that you have chosen DiskANN, one of the CPU index types that do not utilize any GPU resources; therefore, performance relies solely on CPU capabilities. Specifically, if you are using Milvus in standalone mode, it depends on the number of CPUs available. If you are using Milvus in cluster mode, it depends on the number of CPUs allocated to the index node.
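Very roughly, the optimize phase corresponds to steps like the following in pymilvus terms (a sketch only, not VectorDBBench's actual code; host and collection name are placeholders):

```python
# Sketch of what "optimize" typically maps to: flush, compaction, then waiting
# for the index to finish building on the compacted segments.
from pymilvus import Collection, connections, utility

connections.connect(host="localhost", port="19530")
coll = Collection("laion_100m")   # placeholder collection name

coll.flush()                      # seal any growing segments
coll.compact()                    # merge small, fragmented segments into larger ones
coll.wait_for_compaction_completed()

# DiskANN index building runs on the index node CPUs, not on the GPUs.
utility.wait_for_index_building_complete("laion_100m")
print(utility.index_building_progress("laion_100m"))  # e.g. {'total_rows': ..., 'indexed_rows': ...}
```

Polling index_building_progress is one way to see whether a long optimize wait is still making progress.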
Thank you @xiaofan-luan and @alwayslove2013 for your replies, I appreciate your support. Is there any way I can use the 100M dataset for the "filtering search 1% and 99%" test case? It is restricted to the 10M dataset only; how can I use that option in the UI for the same LAION 100M dataset?
I used the same LAION dataset as a custom dataset, but it only lets me run the no-filter search performance test, not the 1% and 99% filtering tests which I am looking to perform. Is there any way you can guide us?
In the 1% and 99% filtering test cases, does the search happen serially or concurrently?
How can I check that the mmap configuration given above is actually being used in my deployment instead of local disk space?
> GPU won't work in your case unless you are using a GPU index, and for your case I guess GPU memory won't be enough anyway.
> We are working on a mode that builds the index with GPU and searches with CPU, but it's not there yet.
Thanks @xiaofan-luan for the reply. When you say the memory won't be enough, do you mean that 48 GB (46068 MiB) per GPU, i.e. 48 GB x 4 = 192 GB, is not sufficient to process a 100-million-vector dataset? I am assuming that is because we only use the GPUs for processing, not for storage, correct?
Is GPU_CAGRA recommended instead of HNSW?
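For reference, building a GPU index such as GPU_CAGRA with pymilvus looks roughly like this (collection name, vector field name, and parameter values are illustrative assumptions, not tuned recommendations):

```python
# Illustrative GPU_CAGRA index creation; GPU indexes are served from GPU memory,
# which is why total GPU memory matters for a 100M x 768-dim dataset.
from pymilvus import Collection, connections

connections.connect(host="localhost", port="19530")
coll = Collection("laion_100m")   # placeholder collection name

index_params = {
    "index_type": "GPU_CAGRA",
    "metric_type": "L2",
    "params": {
        "intermediate_graph_degree": 64,  # graph degree before pruning
        "graph_degree": 32,               # final graph degree
    },
}
coll.create_index(field_name="emb", index_params=index_params)  # "emb" is a placeholder field name
```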
@agandra30

> Is there any way I can use the 100M dataset for the "filtering search 1% and 99%" test case?

Currently, VectorDBBench does not support filtering cases with LAION 100M.
The main reason is that the 100M dataset is quite large (~300 GB: 100M x 768-dim float32 vectors is roughly 307 GB), and the cost of computing the ground truth is relatively high. Therefore, we have not prepared ground truth for the filtering cases.