vector-io
Pinecone Import: Multiple matches for FieldRef.Name(__filename) in id: string
Attached is one of the parquet files generated from a Pinecone export. When I try to re-import I get these errors regarding duplicate fields.
Multiple matches for FieldRef.Name(__filename) in id: string
vector: list<element: double>
__filename: string
__ingested_at: string
content_id: string
filename: string
ingested_at: string
text: string
__fragment_index: int32
__batch_index: int32
__last_in_fragment: bool
__filename: string
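The schema in the error lists __filename twice, which is why the field reference is ambiguous. As a rough illustration only (not vdf-io's actual fix), the sketch below detects repeated column names and renames later repeats before the data is written. The helper names find_duplicate_fields and dedupe_fields and the _1 suffix scheme are hypothetical; the field list is copied from the error message above.

```python
from collections import Counter


def find_duplicate_fields(names):
    """Return the field names that occur more than once, in first-seen order."""
    counts = Counter(names)
    seen = set()
    duplicates = []
    for name in names:
        if counts[name] > 1 and name not in seen:
            seen.add(name)
            duplicates.append(name)
    return duplicates


def dedupe_fields(names):
    """Keep the first occurrence of each name; suffix later repeats with _1, _2, ..."""
    used = set()
    result = []
    for name in names:
        candidate = name
        n = 0
        while candidate in used:
            n += 1
            candidate = f"{name}_{n}"
        used.add(candidate)
        result.append(candidate)
    return result


# Field names as they appear in the schema from the error message.
schema_fields = [
    "id", "vector", "__filename", "__ingested_at", "content_id",
    "filename", "ingested_at", "text", "__fragment_index",
    "__batch_index", "__last_in_fragment", "__filename",
]

print(find_duplicate_fields(schema_fields))
print(dedupe_fields(schema_fields))
```

Renaming keeps both values; if the repeated column is only bookkeeping added by the reader, dropping the later copy may be the simpler choice.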
Thanks for reporting! Will have a look soon
I have pushed a potential fix to the export script. I can't test it, since I'm not sure how __filename ends up in the vector data as well as in the metadata.
Please install the latest version of the package (vdf-io==0.1.232) and try exporting your dataset again. Let me know if that works. Thanks
Well, after exporting for 2 hours, it failed with the error below. To get around my original problem, I modified the code to write JSON as the output, since it is human-readable and easy to fix. That worked great.
Final Step: Fetching vectors: 196it [00:03, 60.60it/s] | 0/1 [1:53:47<?, ?it/s]
Fetching namespaces: 100%|████████████████████████| 624/624 [1:53:48<00:00, 10.94s/it]
Fetching indexes: 100%|█████████████████████████████| 1/1 [1:53:49<00:00, 6829.88s/it]
Final Step: Fetching vectors: 100%|███████████████████| 78/78 [00:02<00:00, 33.79it/s]
Error: 1 validation error for VDFMeta
author
  Input should be a valid string [type=string_type, input_value=None, input_type=NoneType]
    For further information visit https://errors.pydantic.dev/2.6/v/string_type
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/vdf_io/export_vdf_cli.py", line 64, in main
    run_export(span)
  File "/usr/local/lib/python3.11/site-packages/vdf_io/export_vdf_cli.py", line 131, in run_export
    export_obj = slug_to_export_func[args["vector_database"]](args)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/vdf_io/export_vdf/pinecone_export.py", line 118, in export_vdb
    pinecone_export.get_data()
  File "/usr/local/lib/python3.11/site-packages/vdf_io/export_vdf/pinecone_export.py", line 448, in get_data
    internal_metadata = VDFMeta(
                        ^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pydantic/main.py", line 171, in __init__
    self.__pydantic_validator__.validate_python(data, self_instance=self)
pydantic_core._pydantic_core.ValidationError: 1 validation error for VDFMeta
author
  Input should be a valid string [type=string_type, input_value=None, input_type=NoneType]
    For further information visit https://errors.pydantic.dev/2.6/v/string_type
Final Step: Fetching vectors: 156it [00:02, 58.90it/s]
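The traceback shows VDFMeta rejecting author because the value passed in was None where a string is required. How vdf-io looks up the author is not visible in this thread, so the following is only a sketch of one possible guard: resolve the value to a non-empty string before the metadata object is built. The helper resolve_author, the USER/USERNAME environment fallback, and the "unknown" default are all assumptions for illustration, not part of vdf-io.

```python
import os


def resolve_author(value, env=None, default="unknown"):
    """Return a non-empty string for the author field, never None.

    Hypothetical helper: tries the given value first, then the USER and
    USERNAME environment variables, and finally a fixed default.
    """
    env = os.environ if env is None else env
    for candidate in (value, env.get("USER"), env.get("USERNAME")):
        if isinstance(candidate, str) and candidate.strip():
            return candidate.strip()
    return default


# With no author and no environment hint, the default is used
# instead of passing None on to validation.
print(resolve_author(None, env={}))
```

An alternative would be to declare the field as optional in the model itself; which of the two fits better depends on whether downstream readers of the metadata expect author to always be present.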