`init_from is deprecated` warnings on create collection calls
I'm creating collections using the example from the official documentation. However, I constantly get warnings that `init_from` is deprecated, and I wasn't even able to find it in the repo here.
from qdrant_client import QdrantClient, models
client = QdrantClient(url="http://localhost:6333")
client.create_collection(
    collection_name="{collection_name}",
    vectors_config=models.VectorParams(size=100, distance=models.Distance.COSINE),
    init_from=models.InitFrom(collection="{from_collection_name}"),
)
I also see that `init_from` is supported by the API itself, with no sign of an upcoming deprecation.
I have no issues with the newly created collections, but the deprecation warning is a bit off-putting.
I'm using [email protected] and running the service in a container from the qdrant/qdrant:v1.9.5 image on x86_64 Linux
Hi @viktorku
We've indeed deprecated the InitFrom functionality, since it might not work correctly if there are ongoing write operations in the source collection. Hence the warning.
Thanks for the clarification @joein. Does it make sense to have a warning that communicates that? I'm currently using it only for migrations during downtime, and no ongoing operations will obstruct the "copy" process.
If the plan is to ultimately remove the `init_from` functionality altogether, then I'd have to resort to calling the API directly. But it wouldn't make sense to remove it from the client if the service still supports it...
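For reference, calling the API directly could look roughly like the sketch below. It assumes the collection-creation endpoint is `PUT /collections/{collection_name}` and that it still accepts an `init_from` field, as it does on the v1.9.x service mentioned above. The helper names `build_create_payload` and `create_collection_from` are made up for illustration, and only the standard library is used.

```python
import json
import urllib.request


def build_create_payload(size: int, distance: str, source_collection: str) -> dict:
    """Build the JSON body for creating a collection initialised from another one.

    Assumption: the service accepts `init_from` in the create-collection body.
    """
    return {
        "vectors": {"size": size, "distance": distance},
        "init_from": {"collection": source_collection},
    }


def create_collection_from(base_url: str, new_collection: str, payload: dict) -> dict:
    """Send the create-collection request and return the decoded JSON response."""
    request = urllib.request.Request(
        url=f"{base_url}/collections/{new_collection}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Hypothetical collection names; requires a running service on localhost.
    body = build_create_payload(100, "Cosine", "source_collection")
    print(create_collection_from("http://localhost:6333", "target_collection", body))
```

This bypasses the client-side warning only; the caveat about ongoing writes in the source collection still applies.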
Bumping this with a quick question, @joein.
Is there an alternative to that? I'm currently using it to merge different sources, like this:
Source A: Already has a collection.
Source B: Has embeddings (same kind) but no collection.
So I create collection AB with `init_from` A and then `.upload_points(B)` to AB.
Is this the right way?
My use case also involves copying a collection to a new one while the old one doesn't receive any writes. So in theory I would be OK using `init_from`. I would also like to know if there are any plans to remove it. If so, what is the recommended approach going forward?
Hey @fredtcaroli! Could you please describe your use case and why you explicitly need `init_from` on a regular basis?
Originally it was introduced to work around the lack of resharding and the inability to change some of the collection settings. In the latest version of Qdrant, all data-independent settings can be changed without creating a new collection, and a copy of a collection can be made with snapshots. So what's the use for `init_from`?
Hey @generall, my use case involves re-indexing collections of files on a schedule (daily or hourly), and I'd like that to work atomically. Most files in the collection don't change between scheduled runs, so I don't need to re-embed or re-index them, hence the idea of copying a collection over and then modifying it.
My solution was to create a new collection with `init_from` (because I assumed that was more efficient than re-indexing everything) and use a collection alias to switch traffic when indexing is done.
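The alias switch described here can be sketched as a single request. This assumes the aliases endpoint is `POST /collections/aliases` with an `actions` list, and that the actions in one request are applied together, which is what makes the switch appear atomic to readers of the alias. The helper names and the `files` / `files_v2` names are hypothetical.

```python
import json
import urllib.request


def build_alias_switch(alias: str, new_collection: str) -> dict:
    """Build the body that repoints `alias` at `new_collection`.

    The old alias is deleted and re-created in the same request, so clients
    querying the alias never see a moment where it is missing (assumption:
    the service applies all actions of one request atomically).
    """
    return {
        "actions": [
            {"delete_alias": {"alias_name": alias}},
            {"create_alias": {"collection_name": new_collection, "alias_name": alias}},
        ]
    }


def switch_alias(base_url: str, alias: str, new_collection: str) -> dict:
    """Send the alias update and return the decoded JSON response."""
    request = urllib.request.Request(
        url=f"{base_url}/collections/aliases",
        data=json.dumps(build_alias_switch(alias, new_collection)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Requires a running service; "files" and "files_v2" are made-up names.
    print(switch_alias("http://localhost:6333", "files", "files_v2"))
```

The old collection can be dropped once nothing refers to it by its real name.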
I considered using snapshots, and from what I've read I would need to:
- Get all the cluster nodes from the `/cluster` endpoint
- For each node, create and download a collection snapshot
- Re-upload the snapshots under a different collection name
- Update the collection alias
I used `init_from` because it sounded simpler 🤷
Maybe I'm missing a better approach here. I know it's recommended to have a single big collection instead of a bunch of small ones, but I can't figure out how to do this operation without re-indexing a bunch of points over and over in that scenario.