pinecone-datasets
An open-source dataset library for pre-embedded datasets: create your own data catalog, or use Pinecone's public datasets.
### Is this a new bug? - [X] I believe this is a new bug - [X] I have searched the existing issues, and I could not find an existing...
## Problem Construction of the Catalog object currently takes ~7.1s to complete. This is significant as both list_datasets() and load_dataset() require the construction of a Catalog object; so essentially _any_...
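One common way to avoid paying a slow construction cost on every call is to build the object once and reuse it. The following is a minimal sketch of that idea only, not the library's implementation or the fix adopted in this issue: the `Catalog` class, `get_catalog()`, and the toy `list_datasets()` / `load_dataset()` functions are hypothetical stand-ins, and the sleep merely simulates a slow metadata fetch.

```python
import time
from functools import lru_cache


class Catalog:
    """Hypothetical stand-in for an object that is expensive to construct."""

    build_count = 0  # counts how many times the constructor has run

    def __init__(self):
        Catalog.build_count += 1
        time.sleep(0.05)  # simulate a slow metadata fetch (assumption)
        self.datasets = ["dataset-a", "dataset-b"]  # made-up names


@lru_cache(maxsize=1)
def get_catalog() -> Catalog:
    """Build the catalog on first use and return the cached instance afterwards."""
    return Catalog()


def list_datasets():
    # Toy version: reuses the cached catalog instead of constructing a new one.
    return list(get_catalog().datasets)


def load_dataset(name):
    # Toy version: also reuses the cached catalog.
    catalog = get_catalog()
    if name not in catalog.datasets:
        raise FileNotFoundError(name)
    return {"name": name}
```

With this shape, calling `list_datasets()` followed by `load_dataset("dataset-a")` constructs the catalog only once. The trade-off is staleness: a cached catalog will not notice datasets published after it was built unless the cache is cleared with `get_catalog.cache_clear()`.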
## Problem We have at least one dataset which has inconsistencies - `langchain-python-docs-text-embedding-ada-002` has an extra duplicated .parquet file which means the dataset ends up with 2x the number of...
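A duplicated file of the kind described above could be detected with a content-hash check. This is a minimal sketch, assuming the dataset's files have been copied to a local directory; `find_duplicate_files` is a hypothetical helper, not part of `pinecone-datasets`, and it only catches byte-identical copies.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_duplicate_files(directory, pattern="*.parquet"):
    """Group files matching `pattern` by the SHA-256 of their bytes.

    Returns a list of groups (lists of file names) that share identical content.
    """
    by_digest = defaultdict(list)
    for path in sorted(Path(directory).glob(pattern)):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        by_digest[digest].append(path.name)
    # Keep only digests that more than one file maps to.
    return [names for names in by_digest.values() if len(names) > 1]
```

An empty result means no two matching files are byte-for-byte identical. Files that hold the same rows but were written separately (different compression or metadata) would need a row-level comparison instead.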
fix typo
## Problem fix typo ## Solution changed spelling ## Type of Change - [ ] Bug fix (non-breaking change which fixes an issue) - [ ] New feature (non-breaking change...
### Is this your first time submitting a feature request? - [X] I have searched the existing issues, and I could not find an existing issue for this feature -...
Checking if PR CI workflow fails without any code changes
## Problem The `should_create_index` was implemented like `do_create_index` instead of behaving like `allow_existing_index`, which was its original purpose. When a user sets `should_create_index=False`, they don't necessarily mean "**don't** create...
@miararoy my bad, I missed this in #22. This code should never have been merged - it breaks one of the key principles behind `pinecone-datasets` ## Problem One of...