SNOW-1374015: 🐛 Snowflake SQLAlchemy Driver fails when reflecting new `VECTOR` data type

Open aaronsteers opened this issue 1 year ago • 8 comments

Symptom trying to fix?

When reflecting a SQL table, the driver raises an error if the table uses the (very new!) VECTOR data type.

What did you expect to see?

I think this driver needs to be updated to handle VECTOR type. Internally, this is basically an array of floats, except that the number of items in the array is fixed at creation time.

aaronsteers avatar May 09 '24 21:05 aaronsteers

Hello, and thank you for your interest in the public preview of the VECTOR data type! As documented on the feature page, it is currently

[..]only supported in SQL, the Python connector and the Snowpark Python library. No other languages are supported.

We'll work on adding support in other Snowflake drivers and connectors and thank you for bearing with us while this happens.

sfc-gh-dszmolka avatar May 10 '24 12:05 sfc-gh-dszmolka

@sfc-gh-dszmolka - Yes, that makes sense. Our workaround for now is to pass certain commands through the Snowflake Python client - but it would be beneficial long-term to switch back to the native SQLAlchemy integrations - and also to at least make sure SQLAlchemy does not break when attempting to scan or read from those tables.

Happy to use this issue as a tracking item for that future work. Thanks for your support.

aaronsteers avatar May 10 '24 16:05 aaronsteers

Hey @sfc-gh-dszmolka. As this feature is now out of public preview, do you know the status of this issue? Thanks!

japborst avatar Aug 15 '24 10:08 japborst

Hi @japborst, unfortunately I don't have any additional info on the implementation timeline at the moment, but I'm trying to get it from the team and will update this issue when/if I have any news. Thank you all for bearing with us!

sfc-gh-dszmolka avatar Aug 15 '24 15:08 sfc-gh-dszmolka

Hey @sfc-gh-dszmolka!

On the website I read

The VECTOR data type is only supported in SQL, the Python connector and the Snowpark Python library. No other languages are supported.

Do I read correctly that whilst there is Python support (the Python snowflake connector), it's primarily SQLAlchemy support that is missing?

japborst avatar Sep 03 '24 10:09 japborst

For those who need the VECTOR data type in SQLAlchemy, this temporary workaround (there's nothing more permanent than temporary) might help:

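from sqlalchemy import Column
from sqlalchemy.types import UserDefinedType

class SFVector(UserDefinedType):
    # Custom type that only controls the column spec emitted in DDL.
    def __init__(self, data_type, length):
        self.data_type = data_type  # element type, e.g. 'FLOAT'
        self.length = length        # fixed number of elements

    def get_col_spec(self):
        # Rendered in CREATE TABLE, e.g. VECTOR(FLOAT, 1536)
        return f"VECTOR({self.data_type}, {self.length})"

# Declare the column inside a Table or declarative model as usual.
embedding = Column(SFVector('FLOAT', 1536), nullable=True)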
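
A type like this can be sanity-checked without a Snowflake connection by compiling the `CREATE TABLE` statement and inspecting the emitted column spec. The following is only a sketch: the `docs` table and its `id` column are hypothetical names chosen for illustration, and the DDL is rendered with SQLAlchemy's default dialect rather than the Snowflake one. It covers DDL generation only and does not address reflection of existing VECTOR columns, which is what this issue is about.

```python
from sqlalchemy import Column, Integer, MetaData, Table
from sqlalchemy.schema import CreateTable
from sqlalchemy.types import UserDefinedType


class SFVector(UserDefinedType):
    """Workaround type that renders as VECTOR(<type>, <length>) in DDL."""

    # The type's state is fully described by its constructor arguments,
    # so it is safe for SQLAlchemy to include it in cache keys.
    cache_ok = True

    def __init__(self, data_type, length):
        self.data_type = data_type
        self.length = length

    def get_col_spec(self, **kw):
        return f"VECTOR({self.data_type}, {self.length})"


metadata = MetaData()

# Hypothetical table used only to show the rendered DDL.
docs = Table(
    "docs",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("embedding", SFVector("FLOAT", 1536), nullable=True),
)

# Compile with the default dialect; no database connection is needed.
ddl = str(CreateTable(docs))
print(ddl)
```

The printed statement should contain the column definition `embedding VECTOR(FLOAT, 1536)`.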

tazhigaliyev avatar Sep 19 '24 12:09 tazhigaliyev

Is there any ETA on this? The missing support makes it quite hard to connect SQLAlchemy-based solutions such as Superset.

MattLJoslin avatar Oct 03 '24 21:10 MattLJoslin

FWIW: I've used a workaround similar to @tazhigaliyev's `UserDefinedType` approach above, although it won't help when SQLAlchemy is wrapped by another tool (like Superset in the comment above). It would be great to see native support added.

aaronsteers avatar Oct 03 '24 22:10 aaronsteers