Pinecone Import: Multiple matches for FieldRef.Name(__filename) in id: string

Open chriscow opened this issue 1 year ago • 3 comments

Attached is one of the parquet files generated from a Pinecone export. When I try to re-import I get these errors regarding duplicate fields.

Multiple matches for FieldRef.Name(__filename) in
id: string
vector: list<element: double>
__filename: string
__ingested_at: string
content_id: string
filename: string
ingested_at: string
text: string
__fragment_index: int32
__batch_index: int32
__last_in_fragment: bool
__filename: string
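
For anyone hitting the same message: the schema it prints can be checked for repeated column names with a few lines of standard-library Python. This is only a sketch. The names are copied by hand from the error above, and `duplicated` is a hypothetical helper, not part of vdf-io or pyarrow. The last four names appear to match the fields that Arrow's dataset scanner appends on its own, which would explain why a metadata key called `__filename` collides, but that is an assumption.

```python
from collections import Counter

# Column names copied from the error message above, in the order shown.
field_names = [
    "id", "vector", "__filename", "__ingested_at", "content_id",
    "filename", "ingested_at", "text", "__fragment_index",
    "__batch_index", "__last_in_fragment", "__filename",
]


def duplicated(names):
    """Return the names that occur more than once, in first-seen order."""
    counts = Counter(names)
    return [name for name in dict.fromkeys(names) if counts[name] > 1]


print(duplicated(field_names))  # ['__filename']
```

Only `__filename` is repeated, so renaming or dropping that one metadata key before import should be enough to get past this particular error.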

i2.parquet.zip

chriscow avatar Mar 29 '24 14:03 chriscow

Thanks for reporting! Will have a look soon

dhruv-anand-aintech avatar Mar 30 '24 20:03 dhruv-anand-aintech

I have pushed a potential fix to the export script. I can't test it, since I'm not sure how __filename is showing up in the vectors as well as in the metadata.

You can install the latest version of the package (vdf-io==0.1.232) and try exporting your dataset again. Please let me know if that works. Thanks!
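
One way a collision like this could be avoided at export time is to rename any metadata key that clashes with a reserved column name before the row is written. The sketch below illustrates that idea only and is not the change shipped in 0.1.232. The `RESERVED` set, the `sanitize_metadata` helper, and the `meta` prefix are all hypothetical, and the reserved names are taken from the tail of the schema in the error message.

```python
# Assumed reserved column names, taken from the end of the schema in the error.
RESERVED = {"__fragment_index", "__batch_index", "__last_in_fragment", "__filename"}


def sanitize_metadata(metadata, reserved=RESERVED, prefix="meta"):
    """Return a copy of metadata with reserved keys renamed to prefix + key."""
    clean = {}
    for key, value in metadata.items():
        new_key = f"{prefix}{key}" if key in reserved else key
        clean[new_key] = value
    return clean


row_metadata = {"__filename": "report.pdf", "text": "hello"}
print(sanitize_metadata(row_metadata))
# {'meta__filename': 'report.pdf', 'text': 'hello'}
```

The sketch does not handle the case where the renamed key already exists in the metadata. A real fix would need to decide what to do there.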

dhruv-anand-aintech avatar Mar 30 '24 20:03 dhruv-anand-aintech

Well, after exporting for 2 hours, it failed with the error below. To get around my original problem, I modified the code to write JSON as the output, since it is human-readable and easy to fix. That worked great.

Final Step: Fetching vectors: 196it [00:03, 60.60it/s]        | 0/1 [1:53:47<?, ?it/s]
Fetching namespaces: 100%|████████████████████████| 624/624 [1:53:48<00:00, 10.94s/it]
Fetching indexes: 100%|█████████████████████████████| 1/1 [1:53:49<00:00, 6829.88s/it]
Final Step: Fetching vectors: 100%|███████████████████| 78/78 [00:02<00:00, 33.79it/s]
Error: 1 validation error for VDFMeta
author
  Input should be a valid string [type=string_type, input_value=None, input_type=NoneType]
    For further information visit https://errors.pydantic.dev/2.6/v/string_type
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/vdf_io/export_vdf_cli.py", line 64, in main
    run_export(span)
  File "/usr/local/lib/python3.11/site-packages/vdf_io/export_vdf_cli.py", line 131, in run_export
    export_obj = slug_to_export_func[args["vector_database"]](args)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/vdf_io/export_vdf/pinecone_export.py", line 118, in export_vdb
    pinecone_export.get_data()
  File "/usr/local/lib/python3.11/site-packages/vdf_io/export_vdf/pinecone_export.py", line 448, in get_data
    internal_metadata = VDFMeta(
                        ^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pydantic/main.py", line 171, in __init__
    self.__pydantic_validator__.validate_python(data, self_instance=self)
pydantic_core._pydantic_core.ValidationError: 1 validation error for VDFMeta
author
  Input should be a valid string [type=string_type, input_value=None, input_type=NoneType]
    For further information visit https://errors.pydantic.dev/2.6/v/string_type
Final Step: Fetching vectors: 156it [00:02, 58.90it/s]                                
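
The validation error says that `author` reached `VDFMeta` as `None` while the model expects a string. The traceback does not show where the value comes from. If it is read from an environment variable such as `USER` (an assumption on my part; that variable is often unset inside containers), a fallback along these lines would keep the export from failing at the very last step. `resolve_author` is a hypothetical helper, not something in vdf-io.

```python
import os


def resolve_author(env=None, default="unknown"):
    """Return a non-empty author string, falling back to a default."""
    if env is None:
        env = os.environ
    # Try the common Unix and Windows variables, then give up gracefully.
    return env.get("USER") or env.get("USERNAME") or default


print(resolve_author({}))                 # unknown
print(resolve_author({"USER": "chris"}))  # chris
```

Passing the result of a helper like this as `author` would always give the model a string, even when no user name is available.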

chriscow avatar Apr 23 '24 18:04 chriscow