feat(python): Add support for `async` SQLAlchemy connections to `read_database`
Ref: https://stackoverflow.com/questions/77166897
Database connectivity continues to expand; with this PR we can now seamlessly init a DataFrame from async SQLAlchemy Connection, Engine, and Session objects.
Added async unit test coverage using aiosqlite (SQLite), but additionally validated the PR locally using asyncpg (PostgreSQL) and aioodbc (ODBC) drivers.
Examples
SQL Server over ODBC (async driver):
from sqlalchemy.ext.asyncio import create_async_engine
import polars as pl
odbc_string = "Driver={ODBC Driver 17 for SQL Server};Server=localhost"
async_conn = create_async_engine(
f"mssql+aioodbc:///?odbc_connect={odbc_string}"
)
pl.read_database(
query = "SELECT optname,value FROM master.dbo.MSreplication_options",
connection = async_conn,
)
# shape: (3, 2)
# ┌────────────────┬───────┐
# │ optname ┆ value │
# │ --- ┆ --- │
# │ str ┆ bool │
# ╞════════════════╪═══════╡
# │ transactional ┆ true │
# │ merge ┆ true │
# │ security_model ┆ true │
# └────────────────┴───────┘
SQLite (async driver):
async_conn = create_async_engine("sqlite+aiosqlite:///chinook.db")
pl.read_database(
query = """
SELECT l.Title AS album_name, r.Name AS artist
FROM albums l INNER JOIN artists r ON r.ArtistId = l.ArtistId
ORDER BY l.Title LIMIT 4
""",
connection = async_conn,
)
# shape: (4, 2)
# ┌────────────────────────────────┬────────────────────────────────┐
# │ album_name ┆ artist │
# │ --- ┆ --- │
# │ str ┆ str │
# ╞════════════════════════════════╪════════════════════════════════╡
# │ ...And Justice For All ┆ Metallica │
# │ 20th Century Masters - The Mi… ┆ Scorpions │
# │ A Copland Celebration, Vol. I ┆ Aaron Copland & London Sympho… │
# │ A Matter of Life and Death ┆ Iron Maiden │
# └────────────────────────────────┴────────────────────────────────┘
Also
The "execute" method of the internal ConnectionExecutor abstraction was getting a bit long, so factored out the SQLAlchemy setup block into its own method for clarity. Next commit will likely be a larger-scale (and overdue ;) refactor of all the database code into its own "io.database" subdirectory to facilitate easier maintenance.
Looks like some slight differences in the latest async behaviour; the new unit tests pass on Python 3.8 and 3.11, but not 3.12 - will investigate 🤔
... will investigate
Got it; all tests passing on all platforms now :sunglasses:
Codecov Report
Attention: Patch coverage is 80.95238% with 16 lines in your changes are missing coverage. Please review.
Project coverage is 81.18%. Comparing base (
2250055) to head (0502c8d). Report is 2 commits behind head on main.
:exclamation: Current head 0502c8d differs from pull request most recent head 35307a8. Consider uploading reports for the commit 35307a8 to get more accurate results
| Files | Patch % | Lines |
|---|---|---|
| py-polars/polars/io/database.py | 81.81% | 10 Missing and 4 partials :warning: |
| py-polars/polars/datatypes/convert.py | 71.42% | 1 Missing and 1 partial :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## main #15162 +/- ##
=======================================
Coverage 81.18% 81.18%
=======================================
Files 1347 1347
Lines 175423 175467 +44
Branches 2506 2516 +10
=======================================
+ Hits 142410 142449 +39
- Misses 32533 32536 +3
- Partials 480 482 +2
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
I often find myself moving functions around when refactoring, which is not ideal for getting clean git diffs/history. Are you very attached to your current modus operandi?
I guess I think the other way round - if you're going to use something, it should already have been defined, so you will know it's available and have seen it coming (rather than using a newspaper analogy I find it more like a house; you don't build a house from the top-floor down, you build it from the foundations up 🤣)
However, I'm about to make that difference largely go away here as my follow-up PR (already ready to go!) refactors the database code into a clean io/database/... structure, with utility functions/classes all separated into their own private modules, which I suspect we both prefer, rather than having everything inlined in one large file! 👍
Sounds good!
Sounds good!
Here we go: https://github.com/pola-rs/polars/pull/15201 Much more manageable/organised 😎
@alexander-beedie I just got a spurious failure in the CI, I think it might be related to the async test added in this PR as it looks async-ish. What do you think? https://github.com/pola-rs/polars/actions/runs/8372713122/job/22924229816?pr=14765
@alexander-beedie I just got a spurious failure in the CI, I think it might be related to the async test added in this PR as it looks async-ish. What do you think?
Will have a look in a few minutes!
after installing "nest_asyncio" i finally got the engine to work with it. but due to reasons of setting variables and schema etc first i need to use the "session" instead. i cant get it working with the session. pls sir.. can i have some help - puppy dog eyes -
polars.read_database can use async connections as of this PR but it seems that the function itself is still not async #20339. Is this correct?