vector-io

Sweep: The export from pinecone fails due to some data type error

Open actuallyabhi opened this issue 1 year ago • 3 comments

Details

Fetching namespaces: 0% 0/1 [02:54<?, ?it/s]
Error: ("Could not convert '1719697028.0' with type str: tried to convert to double", 'Conversion failed for column created_at with type object')
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/vdf_io/export_vdf_cli.py", line 89, in main
    run_export(span)
  File "/usr/local/lib/python3.10/dist-packages/vdf_io/export_vdf_cli.py", line 149, in run_export
    export_obj = slug_to_export_func[args["vector_database"]](args)
  File "/usr/local/lib/python3.10/dist-packages/vdf_io/export_vdf/pinecone_export.py", line 164, in export_vdb
    pinecone_export.get_data()
  File "/usr/local/lib/python3.10/dist-packages/vdf_io/export_vdf/pinecone_export.py", line 481, in get_data
    index_meta = self.get_data_for_index(index_name)
  File "/usr/local/lib/python3.10/dist-packages/vdf_io/export_vdf/pinecone_export.py", line 575, in get_data_for_index
    total_size += self.save_vectors_to_parquet(
  File "/usr/local/lib/python3.10/dist-packages/vdf_io/export_vdf/vdb_export_cls.py", line 87, in save_vectors_to_parquet
    df.to_parquet(parquet_file)
  File "/usr/local/lib/python3.10/dist-packages/pandas/core/frame.py", line 2970, in to_parquet
    return to_parquet(
  File "/usr/local/lib/python3.10/dist-packages/pandas/io/parquet.py", line 483, in to_parquet
    impl.write(
  File "/usr/local/lib/python3.10/dist-packages/pandas/io/parquet.py", line 189, in write
    table = self.api.Table.from_pandas(df, **from_pandas_kwargs)
  File "pyarrow/table.pxi", line 3874, in pyarrow.lib.Table.from_pandas
  File "/usr/local/lib/python3.10/dist-packages/pyarrow/pandas_compat.py", line 624, in dataframe_to_arrays
    arrays[i] = maybe_fut.result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pyarrow/pandas_compat.py", line 598, in convert_column
    raise e
  File "/usr/local/lib/python3.10/dist-packages/pyarrow/pandas_compat.py", line 592, in convert_column
    result = pa.array(col, type=type_, from_pandas=True, safe=safe)
  File "pyarrow/array.pxi", line 340, in pyarrow.lib.array
  File "pyarrow/array.pxi", line 86, in pyarrow.lib._ndarray_to_array
  File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: ("Could not convert '1719697028.0' with type str: tried to convert to double", 'Conversion failed for column created_at with type object')
Exporting fluidaigpt-dev: 0% 0/1 [02:56<?, ?it/s]
Final Step: Fetching vectors: 100% 14404/14404 [02:39<00:00, 90.24it/s]

Branch

No response

actuallyabhi, Jul 19 '24 12:07

Sweeping

0%

💎 Sweep Pro: You have unlimited Sweep issues

Step 1: 🔎 Searching

I'm searching for relevant snippets in your repository. If this is your first time using Sweep, I'm indexing your repository, which will take a few minutes.


[!TIP] To recreate the pull request, edit the issue title or description.

To resolve the data type error during export from Pinecone, modify the save_vectors_to_parquet method in /src/vdf_io/export_vdf/vdb_export_cls.py to ensure the created_at column is converted to a double. Add the following code before calling df.to_parquet(parquet_file):

if 'created_at' in df.columns:
    df['created_at'] = df['created_at'].astype(float)

This casts every value in created_at to float, so pyarrow receives a uniformly numeric column rather than an object column that mixes strings with numbers.
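
For a column that may also hold non-numeric strings or missing values, a plain float cast would raise an error, so a more tolerant per-value conversion is one option. The following is a minimal sketch, assuming created_at contains a mix of numbers, numeric strings, and missing values; the helper name to_float_or_nan is hypothetical and not part of vdf_io.

```python
import math


def to_float_or_nan(value):
    """Convert a metadata value to float, returning NaN when it cannot be parsed."""
    # bool is a subclass of int; treating it as non-numeric here is an assumption.
    if value is None or isinstance(value, bool):
        return math.nan
    try:
        return float(value)
    except (TypeError, ValueError):
        # Non-numeric strings and unsupported types become NaN instead of raising.
        return math.nan


# Hypothetical mixed values, modeled on the value shown in the error message.
raw = [1719697028.0, "1719697028.0", None, "not-a-timestamp"]
converted = [to_float_or_nan(v) for v in raw]
```

With pandas, this could be applied as df['created_at'] = df['created_at'].map(to_float_or_nan) before df.to_parquet(parquet_file) is called. pd.to_numeric(df['created_at'], errors='coerce') is a built-in alternative that likewise turns unparseable values into NaN.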

References

/.github/ISSUE_TEMPLATE
/src/vdf_io/import_vdf/pinecone_import.py
/src/vdf_io/export_vdf_cli.py
/archive/example-VDF_META.json
/src/vdf_io/import_vdf
/src/vdf_io/notebooks/test_filtering_pc_log.ipynb
/src/vdf_io/notebooks
/src/vdf_io/import_vdf_cli.py
/docs/export_vdf_pinecone_help.txt
/.github/ISSUE_TEMPLATE/sweep-template.yml
/.github/ISSUE_TEMPLATE/support-for-new-vector-db.md
/src/vdf_io/export_vdf/vertexai_vector_search_export.py
/.github
/src/vdf_io
/src/vdf_io/scripts/push_to_hub_vdf.py
/docs
/src
/src/vdf_io/export_vdf/pinecone_export.py
/archive
/src/vdf_io/notebooks/kdbai_end_to_end_vectorIO.ipynb
/README.md
/src/vdf_io/export_vdf/vdb_export_cls.py

About Greptile

This response provides a starting point for your research, not a precise solution.

greptile-apps[bot], Jul 19 '24 12:07

What type is the created_at column in your original index?
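
One way to check this is to tally the Python types stored under created_at across the fetched metadata. The following is a minimal sketch, assuming each vector's metadata is available as a plain dict; the function name and the sample records are hypothetical.

```python
from collections import Counter


def count_value_types(records, key):
    """Count the Python type names of values stored under `key`; absent keys count as 'missing'."""
    counts = Counter()
    for record in records:
        if key in record:
            counts[type(record[key]).__name__] += 1
        else:
            counts["missing"] += 1
    return counts


# Hypothetical sample metadata; real records would come from the index.
sample = [
    {"created_at": 1719697028.0},
    {"created_at": "1719697028.0"},
    {"text": "no timestamp"},
]
type_counts = count_value_types(sample, "created_at")
```

If the tally shows more than one type, for example both float and str, that would be consistent with the conversion failure reported for the created_at column.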

dhruv-anand-aintech, Jul 19 '24 12:07