polars icon indicating copy to clipboard operation
polars copied to clipboard

feat(python): Add support for `async` SQLAlchemy connections to `read_database`

Open alexander-beedie opened this issue 1 year ago • 3 comments

Ref: https://stackoverflow.com/questions/77166897

Database connectivity continues to expand; with this PR we can now seamlessly init a DataFrame from async SQLAlchemy Connection, Engine, and Session objects.

Added async unit test coverage using aiosqlite (SQLite), but additionally validated the PR locally using asyncpg (PostgreSQL) and aioodbc (ODBC) drivers.

Examples

SQL Server over ODBC (async driver):

from sqlalchemy.ext.asyncio import create_async_engine
import polars as pl

odbc_string = "Driver={ODBC Driver 17 for SQL Server};Server=localhost"
async_conn = create_async_engine(
    f"mssql+aioodbc:///?odbc_connect={odbc_string}"
)
pl.read_database(
    query = "SELECT optname,value FROM master.dbo.MSreplication_options",
    connection = async_conn,
)
# shape: (3, 2)
# ┌────────────────┬───────┐
# │ optname        ┆ value │
# │ ---            ┆ ---   │
# │ str            ┆ bool  │
# ╞════════════════╪═══════╡
# │ transactional  ┆ true  │
# │ merge          ┆ true  │
# │ security_model ┆ true  │
# └────────────────┴───────┘

SQLite (async driver):

async_conn = create_async_engine("sqlite+aiosqlite:///chinook.db")

pl.read_database( 
    query = """
      SELECT l.Title AS album_name, r.Name AS artist
      FROM albums l INNER JOIN artists r ON r.ArtistId = l.ArtistId
      ORDER BY l.Title LIMIT 4
    """,
    connection = async_conn,
)
# shape: (4, 2)
# ┌────────────────────────────────┬────────────────────────────────┐
# │ album_name                     ┆ artist                         │
# │ ---                            ┆ ---                            │
# │ str                            ┆ str                            │
# ╞════════════════════════════════╪════════════════════════════════╡
# │ ...And Justice For All         ┆ Metallica                      │
# │ 20th Century Masters - The Mi… ┆ Scorpions                      │
# │ A Copland Celebration, Vol. I  ┆ Aaron Copland & London Sympho… │
# │ A Matter of Life and Death     ┆ Iron Maiden                    │
# └────────────────────────────────┴────────────────────────────────┘

Also

The "execute" method of the internal ConnectionExecutor abstraction was getting a bit long, so factored out the SQLAlchemy setup block into its own method for clarity. Next commit will likely be a larger-scale (and overdue ;) refactor of all the database code into its own "io.database" subdirectory to facilitate easier maintenance.

alexander-beedie avatar Mar 19 '24 12:03 alexander-beedie

Looks like some slight differences in the latest async behaviour; the new unit tests pass on Python 3.8 and 3.11, but not 3.12 - will investigate 🤔

alexander-beedie avatar Mar 19 '24 13:03 alexander-beedie

... will investigate

Got it; all tests passing on all platforms now :sunglasses:

alexander-beedie avatar Mar 19 '24 15:03 alexander-beedie

Codecov Report

Attention: Patch coverage is 80.95238% with 16 lines in your changes are missing coverage. Please review.

Project coverage is 81.18%. Comparing base (2250055) to head (0502c8d). Report is 2 commits behind head on main.

:exclamation: Current head 0502c8d differs from pull request most recent head 35307a8. Consider uploading reports for the commit 35307a8 to get more accurate results

Files Patch % Lines
py-polars/polars/io/database.py 81.81% 10 Missing and 4 partials :warning:
py-polars/polars/datatypes/convert.py 71.42% 1 Missing and 1 partial :warning:
Additional details and impacted files
@@           Coverage Diff           @@
##             main   #15162   +/-   ##
=======================================
  Coverage   81.18%   81.18%           
=======================================
  Files        1347     1347           
  Lines      175423   175467   +44     
  Branches     2506     2516   +10     
=======================================
+ Hits       142410   142449   +39     
- Misses      32533    32536    +3     
- Partials      480      482    +2     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Mar 19 '24 15:03 codecov[bot]

I often find myself moving functions around when refactoring, which is not ideal for getting clean git diffs/history. Are you very attached to your current modus operandi?

I guess I think the other way round - if you're going to use something, it should already have been defined, so you will know it's available and have seen it coming (rather than using a newspaper analogy I find it more like a house; you don't build a house from the top-floor down, you build it from the foundations up 🤣)

However, I'm about to make that difference largely go away here as my follow-up PR (already ready to go!) refactors the database code into a clean io/database/... structure, with utility functions/classes all separated into their own private modules, which I suspect we both prefer, rather than having everything inlined in one large file! 👍

alexander-beedie avatar Mar 21 '24 08:03 alexander-beedie

Sounds good!

stinodego avatar Mar 21 '24 08:03 stinodego

Sounds good!

Here we go: https://github.com/pola-rs/polars/pull/15201 Much more manageable/organised 😎

alexander-beedie avatar Mar 21 '24 08:03 alexander-beedie

@alexander-beedie I just got a spurious failure in the CI, I think it might be related to the async test added in this PR as it looks async-ish. What do you think? https://github.com/pola-rs/polars/actions/runs/8372713122/job/22924229816?pr=14765

stinodego avatar Mar 21 '24 09:03 stinodego

@alexander-beedie I just got a spurious failure in the CI, I think it might be related to the async test added in this PR as it looks async-ish. What do you think?

Will have a look in a few minutes!

alexander-beedie avatar Mar 21 '24 09:03 alexander-beedie

after installing "nest_asyncio" i finally got the engine to work with it. but due to reasons of setting variables and schema etc first i need to use the "session" instead. i cant get it working with the session. pls sir.. can i have some help - puppy dog eyes -

WilliamStam avatar May 31 '24 08:05 WilliamStam

polars.read_database can use async connections as of this PR but it seems that the function itself is still not async #20339. Is this correct?

epetrovski avatar May 09 '25 10:05 epetrovski