PostgreSQL pgvector support

Open vc1492a opened this issue 6 months ago • 2 comments

Describe your feature request

Please add support for pgvector columns (the vector type) in PostgreSQL sources so that we can continue using ConnectorX without resorting to another library.

Recently, a customer tried to use pgvector embedding columns in a PostgreSQL database (on Supabase) and encountered the following error:

2025-05-20 11:09:15 PDT.883 [INFO] Returning event stream for 1 tables
2025-05-20 11:09:15 PDT.884 [INFO] 127.0.0.1:58524 - "POST /v1/database/sample-tables HTTP/1.1" 200
2025-05-20 11:09:15 PDT.885 [INFO] Processing table: public.episodic_memories
2025-05-20 11:09:15 PDT.885 [INFO] Using sampling SQL: SELECT * FROM public.episodic_memories ORDER BY RANDOM() LIMIT 50
2025-05-20 11:09:15 PDT.885 [INFO] About to execute SQL for table public.episodic_memories using sample query: SELECT * FROM public.episodic_memories ORDER BY RANDOM() LIMIT 50
2025-05-20 11:09:15 PDT.885 [INFO] Testing sampling query execution directly
thread '<unnamed>' panicked at /Users/runner/work/connector-x/connector-x/connectorx/src/sources/postgres/typesystem.rs:109:22:
not implemented: vector
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
2025-05-20 11:09:16 PDT.913 [ERROR] Exception in ASGI application
  + Exception Group Traceback (most recent call last):
  |   File "/Users/kocienda/Library/Caches/pypoetry/virtualenvs/infactory-Dh53eh-z-py3.12/lib/python3.12/site-packages/uvicorn/protocols/http/httptools_impl.py", line 401, in run_asgi
  |     result = await app(  # type: ignore[func-returns-value]
  |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  |   File "/Users/kocienda/Library/Caches/pypoetry/virtualenvs/infactory-Dh53eh-z-py3.12/lib/python3.12/site-packages/uvicorn/middleware/proxy_headers.py", line 60, in __call__
  |     return await self.app(scope, receive, send)
  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  |   File "/Users/kocienda/Library/Caches/pypoetry/virtualenvs/infactory-Dh53eh-z-py3.12/lib/python3.12/site-packages/fastapi/applications.py", line 1054, in __call__
  |     await super().__call__(scope, receive, send)
  |   File "/Users/kocienda/Library/Caches/pypoetry/virtualenvs/infactory-Dh53eh-z-py3.12/lib/python3.12/site-packages/starlette/applications.py", line 113, in __call__
  |     await self.middleware_stack(scope, receive, send)
  |   File "/Users/kocienda/Library/Caches/pypoetry/virtualenvs/infactory-Dh53eh-z-py3.12/lib/python3.12/site-packages/starlette/middleware/errors.py", line 165, in __call__
  |     await self.app(scope, receive, _send)
  |   File "/Users/kocienda/Library/Caches/pypoetry/virtualenvs/infactory-Dh53eh-z-py3.12/lib/python3.12/site-packages/starlette/middleware/cors.py", line 85, in __call__
  |     await self.app(scope, receive, send)
  |   File "/Users/kocienda/Library/Caches/pypoetry/virtualenvs/infactory-Dh53eh-z-py3.12/lib/python3.12/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
  |     await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  |   File "/Users/kocienda/Library/Caches/pypoetry/virtualenvs/infactory-Dh53eh-z-py3.12/lib/python3.12/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
  |     await app(scope, receive, sender)
  |   File "/Users/kocienda/Library/Caches/pypoetry/virtualenvs/infactory-Dh53eh-z-py3.12/lib/python3.12/site-packages/starlette/routing.py", line 715, in __call__
  |     await self.middleware_stack(scope, receive, send)
  |   File "/Users/kocienda/Library/Caches/pypoetry/virtualenvs/infactory-Dh53eh-z-py3.12/lib/python3.12/site-packages/starlette/routing.py", line 735, in app
  |     await route.handle(scope, receive, send)
  |   File "/Users/kocienda/Library/Caches/pypoetry/virtualenvs/infactory-Dh53eh-z-py3.12/lib/python3.12/site-packages/starlette/routing.py", line 288, in handle
  |     await self.app(scope, receive, send)
  |   File "/Users/kocienda/Library/Caches/pypoetry/virtualenvs/infactory-Dh53eh-z-py3.12/lib/python3.12/site-packages/starlette/routing.py", line 76, in app
  |     await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  |   File "/Users/kocienda/Library/Caches/pypoetry/virtualenvs/infactory-Dh53eh-z-py3.12/lib/python3.12/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
  |     await app(scope, receive, sender)
  |   File "/Users/kocienda/Library/Caches/pypoetry/virtualenvs/infactory-Dh53eh-z-py3.12/lib/python3.12/site-packages/starlette/routing.py", line 74, in app
  |     await response(scope, receive, send)
  |   File "/Users/kocienda/Library/Caches/pypoetry/virtualenvs/infactory-Dh53eh-z-py3.12/lib/python3.12/site-packages/sse_starlette/sse.py", line 237, in __call__
  |     async with anyio.create_task_group() as task_group:
  |   File "/Users/kocienda/Library/Caches/pypoetry/virtualenvs/infactory-Dh53eh-z-py3.12/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 772, in __aexit__
  |     raise BaseExceptionGroup(
  | BaseExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
  +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    |   File "/Users/kocienda/Library/Caches/pypoetry/virtualenvs/infactory-Dh53eh-z-py3.12/lib/python3.12/site-packages/sse_starlette/sse.py", line 240, in cancel_on_finish
    |     await coro()
    |   File "/Users/kocienda/Library/Caches/pypoetry/virtualenvs/infactory-Dh53eh-z-py3.12/lib/python3.12/site-packages/sse_starlette/sse.py", line 159, in _stream_response
    |     async for data in self.body_iterator:
    |   File "/Users/kocienda/Mounts/nf/repo/dev/infactory_api/routes/route_helpers.py", line 168, in stream
    |     async for chunk in fn:
    |   File "/Users/kocienda/Mounts/nf/repo/dev/infactory_api/connectors/sql_connector.py", line 1130, in generate_data_streams
    |     test_df = cx.read_sql(modified_connection, sampling_sql, return_type="polars")
    |               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |   File "/Users/kocienda/Library/Caches/pypoetry/virtualenvs/infactory-Dh53eh-z-py3.12/lib/python3.12/site-packages/connectorx/__init__.py", line 409, in read_sql
    |     result = _read_sql(
    |              ^^^^^^^^^^
    | pyo3_runtime.PanicException: not implemented: vector
    +------------------------------------

I was able to replicate the issue using a local PostgreSQL container when trying to leverage ConnectorX:

frontend-1   |       code: 'UND_ERR_SOCKET',
api-1        |     |   File "/root/.cache/pypoetry/virtualenvs/infactory-9TtSrW0h-py3.12/lib/python3.12/site-packages/connectorx/__init__.py", line 409, in read_sql
frontend-1   |       socket: [Object]
api-1        |     |     result = _read_sql(
frontend-1   |     }
api-1        |     |              ^^^^^^^^^^
frontend-1   |   }
api-1        |     | RuntimeError: db error: ERROR: cannot cast type vector to double precision[]
frontend-1   | }
api-1        |     | 
api-1        |     | During handling of the above exception, another exception occurred:
api-1        |     | 
api-1        |     | Traceback (most recent call last):
api-1        |     |   File "/app/infactory_api/connectors/sql_connector.py", line 1335, in generate_data_streams
api-1        |     |     await upload_sql_and_sample_data(
api-1        |     |   File "/app/infactory_api/connectors/sql_connector.py", line 989, in upload_sql_and_sample_data
api-1        |     |     df = cx.read_sql(modified_connection, simplified_query, return_type="polars")
api-1        |     |          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
api-1        |     |   File "/root/.cache/pypoetry/virtualenvs/infactory-9TtSrW0h-py3.12/lib/python3.12/site-packages/connectorx/__init__.py", line 409, in read_sql
api-1        |     |     result = _read_sql(
api-1        |     |              ^^^^^^^^^^
api-1        |     | RuntimeError: db error: ERROR: cannot cast type vector to double precision[]
api-1        |     | 
api-1        |     | During handling of the above exception, another exception occurred:
api-1        |     | 
api-1        |     | Traceback (most recent call last):
api-1        |     |   File "/root/.cache/pypoetry/virtualenvs/infactory-9TtSrW0h-py3.12/lib/python3.12/site-packages/sse_starlette/sse.py", line 240, in cancel_on_finish
api-1        |     |     await coro()
api-1        |     |   File "/root/.cache/pypoetry/virtualenvs/infactory-9TtSrW0h-py3.12/lib/python3.12/site-packages/sse_starlette/sse.py", line 159, in _stream_response
api-1        |     |     async for data in self.body_iterator:
api-1        |     |   File "/app/infactory_api/routes/route_helpers.py", line 168, in stream
api-1        |     |     async for chunk in fn:
api-1        |     |   File "/app/infactory_api/connectors/sql_connector.py", line 1365, in generate_data_streams
api-1        |     |     raise HTTPException(
api-1        |     | fastapi.exceptions.HTTPException: 500: Error sampling table public.high_dim_vectors: db error: ERROR: cannot cast type vector to double precision[]
api-1        |     +------------------------------------
frontend-1   | [Middleware] Checking redirect for path: /api/infactory/v1/datasources/4239b1a3-99b9-439a-a5d2-f66c0e937e28/with_datalines
frontend-1   | [Middleware] Found platform_id: a9fbb528-64b8-4019-833c-268ff7b09f84
frontend-1   | [SERVER] API request: {
frontend-1   |   method: 'GET',
frontend-1   |   url: 'http://api:8000/v1/datasources/4239b1a3-99b9-439a-a5d2-f66c0e937e28/with_datalines'
frontend-1   | }
api-1        | 2025-05-21 18:33:42 UTC.430 [INFO] HTTP Request: POST http://localhost:38179/ "HTTP/1.1 200 OK"
api-1        | 2025-05-21 18:33:42 UTC.433 [INFO] HTTP Request: POST http://localhost:38179/ "HTTP/1.1 200 OK"
api-1        | 2025-05-21 18:33:42 UTC.434 [INFO] HTTP Request: POST http://localhost:38179/ "HTTP/1.1 200 OK"
api-1        | 2025-05-21 18:33:42 UTC.436 [INFO] HTTP Request: POST http://localhost:38179/ "HTTP/1.1 200 OK"
api-1        | 2025-05-21 18:33:42 UTC.437 [INFO] 172.18.0.3:47612 - "GET /v1/datasources/4239b1a3-99b9-439a-a5d2-f66c0e937e28/with_datalines HTTP/1.1" 200

It would be great if ConnectorX supported pgvector columns, since I have had to add another SQL library (psycopg) to our code base simply to support pgvector.
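For anyone hitting the same panic on a ConnectorX build without vector support, one possible workaround is to cast the vector column to text in the query and parse it on the client. The sketch below is only an illustration under assumptions: the column names `id` and `embedding` are hypothetical (the logs above do not name the columns), the helper names are made up for this example, and table and column names are interpolated without quoting, so they must come from trusted input. It relies on pgvector rendering a vector as text in the form `[1,2,3]`.

```python
import json
from typing import Iterable, List, Optional


def build_sampling_sql(
    table: str,
    columns: Iterable[str],
    vector_columns: Iterable[str],
    limit: int = 50,
) -> str:
    """Build a sampling query that casts pgvector columns to text.

    Identifiers are inserted as-is (no quoting), so only pass trusted names.
    """
    vector_set = set(vector_columns)
    select_list = ", ".join(
        f"{col}::text AS {col}" if col in vector_set else col
        for col in columns
    )
    return f"SELECT {select_list} FROM {table} ORDER BY RANDOM() LIMIT {int(limit)}"


def parse_pgvector_text(value: Optional[str]) -> Optional[List[float]]:
    """Parse pgvector's text form, e.g. '[1,2,3]', into a list of floats."""
    if value is None:
        return None
    return [float(x) for x in json.loads(value)]


# Hypothetical column names; the real table layout is not shown in this issue.
sampling_sql = build_sampling_sql(
    "public.episodic_memories", ["id", "embedding"], ["embedding"]
)
```

The resulting string could then be passed to the same call that appears in the traceback, `cx.read_sql(modified_connection, sampling_sql, return_type="polars")`, with `parse_pgvector_text` applied to the text column afterwards. This costs an extra parsing pass per row, so it is better suited to small samples than to bulk reads.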

vc1492a • May 27 '25 21:05

Thanks @holicc for adding support for pgvector! I have released an alpha version: pip install connectorx==0.4.4a1. Please feel free to try it out!

wangxiaoying • Jun 10 '25 23:06

@wangxiaoying @holicc I can confirm from our end that this is working ✅

vc1492a • Jun 17 '25 21:06