Please recommend option values for uploading 100 million vectors to Qdrant.

Open hyunmokky opened this issue 1 year ago • 1 comments

I am currently working on a natural language processing project. Qdrant lets me store embedding vectors with a range of option values, but I keep getting a "broken pipe" error when I try to upload 100 million embedding vectors. I upload them with the upload_collection function, with parallel set to 4. Could you recommend suitable option values for this function? I would also appreciate a rough estimate of how long uploading 100 million vectors should take. I look forward to your reply.

[vector info]

  1. Number of embedding vectors: 100 million
  2. embedding vector dimension : 768

[hardware spec]

  1. CPU cores: 10
  2. CPU: Intel(R) Core(TM) i9-10900X
  3. RAM capacity: 251 GiB (free: 106 GiB)

[upload_collection function Option value]

  1. payload = None
  2. parallel : 4

[collection info]

  1. distance: Cosine
  2. shard_number : 5
  3. memmap_threshold : 1,000,000,000
  4. indexing_threshold : 100,000,000
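
For anyone reproducing this setup, here is a minimal sketch of splitting the upload into smaller upload_collection calls, so that a failed chunk can be retried without restarting the whole job. It is not a confirmed fix for the "broken pipe" error. The helper names (chunk_ranges, upload_in_chunks, load_vectors), the chunk size, and the batch_size value are assumptions for illustration; only upload_collection and its collection_name, vectors, payload, ids, batch_size, and parallel parameters come from qdrant-client.

```python
from typing import Callable, Iterator, List, Sequence, Tuple


def chunk_ranges(total: int, chunk_size: int) -> Iterator[Tuple[int, int]]:
    """Yield half-open [start, stop) index ranges that cover 0..total."""
    for start in range(0, total, chunk_size):
        yield start, min(start + chunk_size, total)


def upload_in_chunks(
    client,  # a qdrant_client.QdrantClient, e.g. QdrantClient(url="http://localhost:6333")
    collection_name: str,
    load_vectors: Callable[[int, int], Sequence[List[float]]],  # hypothetical loader
    total: int,
    chunk_size: int = 1_000_000,  # assumed value, tune to available RAM
) -> None:
    """Upload vectors chunk by chunk instead of in one 100M-vector call."""
    for start, stop in chunk_ranges(total, chunk_size):
        # Only one chunk of vectors is held in client memory at a time.
        vectors = load_vectors(start, stop)
        client.upload_collection(
            collection_name=collection_name,
            vectors=vectors,
            payload=None,            # same as in the issue
            ids=range(start, stop),  # explicit ids so chunks never overwrite each other
            batch_size=256,          # assumed value, not a recommendation from the thread
            parallel=4,              # same as in the issue
        )
```

Because the ids are derived from the chunk offsets, an interrupted run can resume from the first chunk that failed instead of from zero.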

hyunmokky avatar Mar 09 '23 08:03 hyunmokky

Hi. Did you decide on the final values? Correct me if I am wrong, but won't setting such a high memmap_threshold cause problems? Until the 1,000,000,000 KB threshold is reached, Qdrant keeps all the vectors in memory, which leads to very high RAM usage.
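
As a rough check of that concern, the raw vector size can be compared with the threshold and with the machine's RAM. This is a back-of-the-envelope sketch that assumes 4-byte float32 components and 1 KB = 1024 bytes, and it ignores index and payload overhead; the other numbers come from the issue above.

```python
NUM_VECTORS = 100_000_000          # from the issue
DIM = 768                          # from the issue
BYTES_PER_COMPONENT = 4            # assumption: float32 storage
MEMMAP_THRESHOLD_KB = 1_000_000_000  # from the collection info
TOTAL_RAM_GIB = 251                # from the hardware spec

raw_bytes = NUM_VECTORS * DIM * BYTES_PER_COMPONENT
raw_kb = raw_bytes // 1024         # assumption: 1 KB = 1024 bytes
raw_gib = raw_bytes / 2**30

# True: the data never reaches the threshold, so memmap storage is never triggered.
below_threshold = raw_kb < MEMMAP_THRESHOLD_KB
# True: the raw vectors alone are larger than the machine's total RAM.
exceeds_ram = raw_gib > TOTAL_RAM_GIB

print(f"raw vectors: {raw_kb:,} KB ({raw_gib:.1f} GiB)")
print(f"below memmap_threshold: {below_threshold}, exceeds RAM: {exceeds_ram}")
```

Under these assumptions the raw vectors take about 286 GiB, which stays below the threshold while exceeding the 251 GiB of RAM.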

AdirthaBorgohain avatar Mar 14 '23 06:03 AdirthaBorgohain