qdrant-client
Please recommend option values for uploading 100 million vectors to Qdrant.
I am currently working on a natural language processing project. Qdrant lets you insert embedding vectors with various option values, but I keep getting a "broken pipe" error when inserting 100 million embedding vectors. I use the upload_collection function with parallel set to 4. Could you recommend values for this function's options? I would also like a rough estimate of how long it takes to insert 100 million vectors. I look forward to your reply.
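No one can give an exact duration without measuring, because throughput depends on the server, the network, batch size and indexing settings. One hedged way to get a rough number is to time the upload of a sample (for instance the first million vectors) and extrapolate linearly. The helper names below are hypothetical:

```python
def estimate_total_seconds(sample_vectors: int, sample_seconds: float, total_vectors: int) -> float:
    """Linearly extrapolate the total upload time from a timed sample upload."""
    if sample_vectors <= 0 or sample_seconds <= 0:
        raise ValueError("the sample must contain vectors and take a positive amount of time")
    throughput = sample_vectors / sample_seconds  # vectors per second observed in the sample
    return total_vectors / throughput


def format_duration(seconds: float) -> str:
    """Render a number of seconds as 'Hh MMm SSs'."""
    hours, rem = divmod(int(round(seconds)), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"
```

This assumes throughput stays constant for the whole run, which may not hold once the collection grows and indexing starts, so treat the result as a rough order of magnitude.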
[vector info]
- Number of embedding vectors: 100 million
- embedding vector dimension : 768
[hardware spec]
- CPU cores: 10
- CPU: Intel(R) Core(TM) i9-10900X
- RAM capacity: 251 GiB (free: 106 GiB)
[upload_collection function Option value]
- payload : None
- parallel : 4
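A sketch of one way to structure such an upload, not a tested recommendation: split the 100 million vectors into application-level chunks, upload each chunk with upload_collection, and report progress so that a "broken pipe" partway through does not force a restart from zero. It assumes a recent qdrant-client whose upload_collection accepts collection_name, vectors, ids, payload, batch_size, parallel and max_retries, and it assumes (as I understand Qdrant's upsert behaviour) that re-sending a point with an existing id overwrites it rather than duplicating it. iter_chunks, upload_in_chunks, chunk_size and on_progress are hypothetical names, and batch_size=256 is a placeholder to tune, not a recommended value.

```python
from itertools import islice
from typing import Any, Callable, Iterable, Iterator, List, Optional, Sequence


def iter_chunks(items: Iterable[Any], size: int) -> Iterator[List[Any]]:
    """Yield consecutive lists of at most `size` items without loading everything at once."""
    if size <= 0:
        raise ValueError("size must be positive")
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def upload_in_chunks(
    client: Any,  # assumed: a qdrant_client.QdrantClient instance
    collection_name: str,
    vectors: Iterable[Sequence[float]],
    chunk_size: int = 1_000_000,
    start_id: int = 0,
    batch_size: int = 256,  # placeholder: points per request, tune empirically
    parallel: int = 4,  # worker processes, as in the question
    max_retries: int = 3,  # assumed: accepted by recent qdrant-client versions
    on_progress: Optional[Callable[[int], None]] = None,
) -> int:
    """Upload `vectors` one chunk at a time and return the next unused point id."""
    next_id = start_id
    for chunk in iter_chunks(vectors, chunk_size):
        client.upload_collection(
            collection_name=collection_name,
            vectors=chunk,
            # explicit ids: assuming upsert semantics, a rerun overwrites instead of duplicating
            ids=list(range(next_id, next_id + len(chunk))),
            payload=None,
            batch_size=batch_size,
            parallel=parallel,
            max_retries=max_retries,
        )
        next_id += len(chunk)
        if on_progress is not None:
            on_progress(next_id)  # e.g. persist this so a failed run can resume from here
    return next_id
```

Passing a generator for vectors (for example, one that reads rows from a file on disk) keeps only one chunk in client memory at a time, and a failed run can be resumed by skipping the vectors already reported through on_progress and passing that count as start_id.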
[collection info]
- distance: Cosine
- shard_number : 5
- memmap_threshold : 1,000,000,000
- indexing_threshold : 100,000,000
Hi. Did you settle on final values? Correct me if I am wrong, but wouldn't setting a very high memmap_threshold cause problems? Until the 1,000,000,000 KB threshold is reached, Qdrant will keep all the vectors in memory, which leads to very high RAM usage.
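The concern above can be checked with back-of-the-envelope arithmetic. This is a sketch assuming float32 vectors (4 bytes per component) and treating the threshold's KB as 1024 bytes; it counts raw vector data only, ignoring the HNSW index, payloads and storage overhead, so actual memory use would likely be higher. The function names are hypothetical:

```python
GIB = 1024 ** 3  # bytes per GiB


def raw_vector_bytes(num_vectors: int, dim: int, bytes_per_component: int = 4) -> int:
    """Size of the raw vector data alone (float32 by default)."""
    return num_vectors * dim * bytes_per_component


def kb_to_gib(kilobytes: int) -> float:
    """Convert a threshold given in KB (1 KB = 1024 bytes here) to GiB."""
    return kilobytes * 1024 / GIB
```

For the numbers in the question, raw_vector_bytes(100_000_000, 768) is 307,200,000,000 bytes, about 286.1 GiB, which is more than the 251 GiB of installed RAM. A memmap_threshold of 1,000,000,000 KB is about 953.7 GiB, far above that total, so with this setting the vectors would plausibly never be memory-mapped.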