Use SQLAlchemy for BigQuery backend
@tswast I'd like to know your opinion on this.
Seems to me, that the Ibis SQL compiler (i.e. ibis/backends/base/sql/compiler) is basically reinventing SQLAlchemy. SQLAlchemy is great at generating an SQL string given expressions, and that's exactly what the compiler also does. And now we've got our backends fragmented in the ones that use the Ibis compiler, and the ones using SQLAlchemy.
Seems to me, that for the future would make much more sense to exclusively use SQLAlchemy for the generation of SQL, and remove from Ibis all the code in ibis/backends/base/sql/compiler). That would involve that all backends use their SQLAlchemy engine for Ibis. For BigQuery this seems to be trivial, as there is already a BigQuery SQLAlchemy plugin, and implementing a SQLAlchemy backend is quite straight-forward (and even more after the refactoring I'm finishing).
Is this something that makes sense to you? Have you considered this already? Anything am I missing?
For BigQuery this makes sense, especially now that we've done some pretty extensive testing of the SQLAlchemy connector for BigQuery.
My concerns:
- Maybe it'll be harder to share code between the BigQuery and Spanner Ibis connector (https://github.com/GoogleCloudPlatform/professional-services-data-validator/tree/develop/third_party/ibis/ibis_cloud_spanner). Spanner is significantly more difficult to create a SQLAlchemy connector for because the backend has quite a more rich (any) transaction support compared to BigQuery.
- I recall that there are some other Ibis backends in https://github.com/GoogleCloudPlatform/professional-services-data-validator/tree/develop/third_party/ibis that went with the Ibis-based string generation because a SQLAlchemy connector did not exist or was broken. (I think Teradata? Maybe Oracle or IBM too)
I guess another concern is UDFs. I don't believe we have support for this in the SQLAlchemy connector (though it would be cool to have!)
Thanks for all the information, that's very useful to know. We clearly need to plan this well, if it ever happens. OmniSci seems to be happy with the idea, and I think for clickhouse is very doable. Impala is trickier, and we'll have to see if it can be done, or if it makes sense.
I'll be soon writing a draft of an Ibis roadmap with future plans, so we can discuss them. But I think after 2.0 we can make the base SQLAlchemy not depend on the Ibis compiler to start with. And maybe we can later move to separate projects / extensions the Ibis compiler, and the backends that use them. Then we'll see whether it's worth to maintain the Ibis compiler, or if backends are eventually migrated to SQLAlchemy.
We can discuss further after 2.0, when I have the draft for the roadmap. For now all your comments were very useful to know if researching in this direction made sense or not. Thanks!
Another thought: I'd still want some custom code to execute the query text once compiled & transform to pandas dataframe. SQLAlchemy -> pandas is possible with just the SQLAlchemy connector, but it's much less performant.
Another thought: I'd still want some custom code to execute the query text once compiled & transform to pandas dataframe. SQLAlchemy -> pandas is possible with just the SQLAlchemy connector, but it's much less performant.
This make sense. I think the part that really adds value to replace is the generation of the SQL, which if we only use SQLAlchemy it will let us delete few thousands lines of code in Ibis.
Somehow unrelated, there were some discussions on fetching data from backends as Arrow (which can easily converted to pandas with a .to_pandas() call). This is unrelated to SQLAlchemy, and open to discusssion, but I think that should be an improvement.
Some news: we now have a 1.0 (General Availability) release of the official SQLAlchemy connector for BigQuery (https://github.com/googleapis/python-bigquery-sqlalchemy)
After that and my failed attempt at migration in https://github.com/ibis-project/ibis-bigquery/pull/91, I'm thinking maybe it is time to migrate.
I don't want to abandon the Ibis 1.x users though, so perhaps we create a v2 branch for now and keep the old code paths around. I think it will take time for tools like https://github.com/GoogleCloudPlatform/professional-services-data-validator to migrate to Ibis 2.0 once its available, especially since there are several custom backends in https://github.com/GoogleCloudPlatform/professional-services-data-validator/tree/develop/third_party/ibis that have yet to be split out.