`init_from is deprecated` warnings on create collection calls

Open viktorku opened this issue 1 year ago • 3 comments

I'm creating collections using the documented example from the official website documentation. However, I constantly get warnings that init_from is deprecated, and I wasn't even able to find it in the repo here.

from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

client.create_collection(
    collection_name="{collection_name}",
    vectors_config=models.VectorParams(size=100, distance=models.Distance.COSINE),
    init_from=models.InitFrom(collection="{from_collection_name}"),
)

I also see that init_from is supported by the API itself, with no sign of upcoming deprecation.

I have no issues with the newly created collections, but the deprecation warning is a bit off-putting.
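For readers who only want to quiet the message, here is a minimal sketch using the standard library. It assumes the client reports the deprecation through Python's `warnings` module and that the message contains the text quoted in the issue title; neither is confirmed in this thread. The helper name `silence_init_from_warning` is hypothetical.

```python
import warnings

# Assumption: the message text matches the one quoted in the issue title.
INIT_FROM_PATTERN = r".*init_from is deprecated.*"


def silence_init_from_warning() -> None:
    """Ignore only warnings whose message matches INIT_FROM_PATTERN."""
    # The filter is added at the front of the filter list, so it takes
    # precedence over broader filters that were registered earlier.
    warnings.filterwarnings("ignore", message=INIT_FROM_PATTERN)
```

Calling `silence_init_from_warning()` once before `create_collection` would hide this one message and leave other warnings visible. It does not change what the client or the service does.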

I'm using [email protected] and running the service in a container from the qdrant/qdrant:v1.9.5 image on x86_64 Linux

viktorku avatar Jun 18 '24 14:06 viktorku

Hi @viktorku

We've indeed deprecated the InitFrom functionality, since it might not work correctly if there are ongoing write operations in the source collection. Hence the warning.

joein avatar Jun 20 '24 10:06 joein

Thanks for the clarification @joein. Does it make sense to have a warning that communicates that? I'm currently using it only for migrations during downtime, and no ongoing operations will obstruct the "copy" process.

If the plan is to ultimately remove the init_from functionality altogether, then I'd have to resort to calling the API directly. But it wouldn't make sense to remove it from the client if the service still supports it...

viktorku avatar Jun 21 '24 10:06 viktorku

Upping this with a quick question, @joein.

So is there an alternative to that? I am currently using it to merge different sources, like this:

Source A: Already have collection

Source B: Has embeddings (same kind) but no collection.

So I create collection AB with init_from A and then call .upload_points(B) on AB.

Is this the right way?

dre5ib avatar Feb 13 '25 07:02 dre5ib

My use case also involves copying a collection to a new one while the old one doesn't receive any writes. So in theory I would be OK using init_from. I would also like to know if there are any plans to remove it. If so, what is the recommended approach going forward?

fredtcaroli avatar Apr 07 '25 01:04 fredtcaroli

Hey @fredtcaroli! Could you please describe your use case and why you explicitly need init_from on a regular basis? Originally it was introduced to work around the lack of resharding and the inability to change some of the collection settings. In the latest version of qdrant, all data-independent settings can be changed without creating a new collection, and a copy of a collection can be made with snapshots. So what's the use for init_from?

generall avatar Apr 07 '25 07:04 generall

Hey @generall, my use-case involves re-indexing collections of files on a scheduled basis (daily or hourly), and I'd like that to work atomically. Most files in the collection don't change between scheduled runs, so I don't need to re-embed/re-index them, hence the idea of copying a collection over and then modifying it.

My solution was to create a new collection with init_from (because I assumed that was more efficient than re-indexing everything) and use a collection alias to switch traffic when indexing is done.
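The alias switch at the end of that flow can be sketched as a plain request body. This is a rough illustration, assuming the alias update goes through the REST endpoint `POST /collections/aliases`, where the actions in one request are applied together. The function name and the alias/collection names are hypothetical.

```python
def alias_switch_body(alias: str, new_collection: str, alias_exists: bool = True) -> dict:
    """Build the JSON body that repoints `alias` to `new_collection`."""
    actions = []
    if alias_exists:
        # Drop the old mapping first; both actions travel in one request.
        actions.append({"delete_alias": {"alias_name": alias}})
    actions.append(
        {"create_alias": {"collection_name": new_collection, "alias_name": alias}}
    )
    return {"actions": actions}


# Hypothetical names, for illustration only.
body = alias_switch_body("files_live", "files_2025_04_09")
```

Readers keep querying `files_live` the whole time. Only the collection behind the alias changes once the new one is fully indexed.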

I considered using snapshots, and from my reading I would need to:

  • Get all the cluster nodes from the /cluster endpoint
  • For each node I create and download a collection snapshot
  • Re-upload the snapshots under a different collection name
  • Update the collection alias
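To make the number of calls concrete, the steps above could be laid out as a list of REST requests. This is only a sketch under assumptions: it takes the per-node REST base URLs as input (in practice they would come from the first step), it uses a `<snapshot_name>` placeholder because the real name is returned by the create call, and the function name is hypothetical. It builds the plan without sending anything.

```python
from urllib.parse import quote


def snapshot_copy_plan(node_urls, source, target, snapshot_name="<snapshot_name>"):
    """List (method, url) pairs for copying `source` into `target` via snapshots."""
    src, dst = quote(source, safe=""), quote(target, safe="")
    bases = [url.rstrip("/") for url in node_urls]
    plan = []
    for base in bases:
        # Create a snapshot of the source collection on this node.
        plan.append(("POST", f"{base}/collections/{src}/snapshots"))
        # Download it; the name comes from the create response.
        plan.append(("GET", f"{base}/collections/{src}/snapshots/{snapshot_name}"))
        # Upload it to the same node under the new collection name.
        plan.append(("POST", f"{base}/collections/{dst}/snapshots/upload"))
    if bases:
        # A single alias update at the end switches traffic.
        plan.append(("POST", f"{bases[0]}/collections/aliases"))
    return plan


# Hypothetical node addresses and collection names.
for method, url in snapshot_copy_plan(
    ["http://node-1:6333", "http://node-2:6333"], "files_v1", "files_v2"
):
    print(method, url)
```

That is three calls per node plus the alias update, which is the overhead being weighed against a single init_from argument.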

I used init_from because it sounded simpler 🤷

Maybe I'm missing a better approach here. I know it's recommended to have a single big collection instead of a bunch of small ones, but I can't figure out how to do this operation without re-indexing a bunch of points over and over in that scenario.

fredtcaroli avatar Apr 09 '25 21:04 fredtcaroli