pinecone-datasets
[Feature] Add asyncio support
Is this your first time submitting a feature request?
- [X] I have searched the existing issues, and I could not find an existing issue for this feature
- [ ] I am requesting a straightforward extension of existing functionality
Describe the feature
Currently, the load_dataset, list_datasets, and to_pinecone_index functions are synchronous. They can run for a long time and block the event loop in asyncio applications. The goal is to add async equivalents of these functions.
list_datasets: gcsfs and s3fs are async-compatible, so adding an async equivalent should be relatively easy.
to_pinecone_index: This might require Pinecone Client 3.0, so we may need to wait until it is stable.
load_dataset: The behavior here needs to improve. Currently load_dataset does not actually load the dataset; it only creates a Dataset object, which can confuse users. Long-running tasks should be visible to the user, and the download should be explicit. (Currently the download happens on property access to queries/documents or when calling the head function.) See https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_iris.html#sklearn.datasets.load_iris as an example.
To change this behavior, I suggest renaming load_dataset to get_dataset_loader (or another name) and adding two async methods to fetch queries and documents, such as dataset_loader.load_documents and dataset_loader.load_queries. In that case we might need to deprecate load_dataset but keep it for several versions with a DeprecationWarning. Some refactoring might also be needed.
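To make the proposal concrete, here is a rough sketch of what the API could look like. This is only an illustration for discussion, not the library's current code: DatasetLoader, _read_blocking, and the returned placeholder records are hypothetical, and the blocking read is simply pushed to a worker thread with asyncio.to_thread (Python 3.9+) rather than using native async I/O.

```python
import asyncio
import warnings
from dataclasses import dataclass, field


@dataclass
class DatasetLoader:
    """Hypothetical lazy handle: constructing it performs no I/O."""

    dataset_id: str
    _cache: dict = field(default_factory=dict, repr=False)

    def _read_blocking(self, part: str) -> list:
        # Placeholder for the real blocking read (e.g. files from GCS/S3).
        return [{"id": f"{self.dataset_id}-{part}-{i}"} for i in range(3)]

    async def _load(self, part: str) -> list:
        if part not in self._cache:
            # Run the blocking read in a worker thread so the event loop stays free.
            self._cache[part] = await asyncio.to_thread(self._read_blocking, part)
        return self._cache[part]

    async def load_documents(self) -> list:
        # The download is explicit: it only happens when this is awaited.
        return await self._load("documents")

    async def load_queries(self) -> list:
        return await self._load("queries")


def get_dataset_loader(dataset_id: str) -> DatasetLoader:
    """Proposed replacement for load_dataset; returns a loader without downloading."""
    return DatasetLoader(dataset_id)


def load_dataset(dataset_id: str) -> DatasetLoader:
    """Old entry point, kept for several versions with a DeprecationWarning."""
    warnings.warn(
        "load_dataset is deprecated; use get_dataset_loader instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return get_dataset_loader(dataset_id)


async def main() -> tuple:
    loader = get_dataset_loader("example-dataset")
    # Both downloads can run concurrently without blocking the event loop.
    docs, queries = await asyncio.gather(
        loader.load_documents(), loader.load_queries()
    )
    return docs, queries
```

An application would call asyncio.run(main()) or await the loader methods from its own coroutines. Whether the final design wraps blocking calls in threads or uses the async interfaces of gcsfs/s3fs directly is one of the things to discuss.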
Describe alternatives you've considered
We can keep the API as is, but since asyncio is becoming more and more popular, I think it is a good idea to catch up.
Who will this benefit?
to_pinecone_index_async will be especially useful for large bulk upserts. The other changes will improve the user experience by making downloads explicit and non-blocking.
Are you interested in contributing this feature?
Sure, but I think we need to have a discussion first and plan the changes properly.
Anything else?
No response