
[Feature] async support

I'm doing a lot of bulk inserts that are generated from user activity. (Sometimes we have 10 rows, other times we have 100,000 rows broken up into blocks of 1,000.)

It would be really helpful to have async support, so that we don't block while waiting for CrateDB to ingest 1,000 rows at a time.

robd003 · May 22 '24 06:05

Dear Robert,

Thank you for writing in. Are you looking for async support for the HTTP driver, or async support for the SQLAlchemy dialect?

In general, you can always use the asyncpg or psycopg3 libraries, as outlined on this documentation page.

  • https://cratedb.com/docs/crate/clients-tools/en/latest/connect/python.html
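
For the PostgreSQL-protocol route, here is a minimal sketch using asyncpg against CrateDB's PostgreSQL port; the table and column names are placeholders, and the snippet is untested:

import asyncio

import asyncpg

async def main():
    # CrateDB speaks the PostgreSQL wire protocol on port 5432; "crate" is the default user.
    conn = await asyncpg.connect(host="localhost", port=5432, user="crate")
    try:
        # Send one prepared INSERT statement with many parameter sets (bulk-style ingestion).
        await conn.executemany(
            "INSERT INTO testdrive (x, y) VALUES ($1, $2)",
            [(1, "a"), (2, "b")],
        )
    finally:
        await conn.close()

asyncio.run(main())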

If you are looking for async support in SQLAlchemy, on top of the PostgreSQL drivers enumerated above, this patch might bring in what you are looking for.

  • https://github.com/crate-workbench/sqlalchemy-cratedb/pull/11

In that case, please have a look at these examples, which can be used right away when following the corresponding dependency specifications; a rough sketch follows the links below.

  • https://github.com/crate/cratedb-examples/blob/main/by-language/python-sqlalchemy/async_table.py
  • https://github.com/crate/cratedb-examples/blob/main/by-language/python-sqlalchemy/async_streaming.py
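
For orientation, here is a rough and untested sketch of what the async SQLAlchemy usage could look like; the crate+asyncpg:// URL scheme and the testdrive table are assumptions on my side, so please refer to the linked examples for the exact connection string and dependency pins:

import asyncio

import sqlalchemy as sa
from sqlalchemy.ext.asyncio import create_async_engine

async def main():
    # Assumption: the patch registers a `crate+asyncpg://` dialect URL;
    # CrateDB listens for PostgreSQL-protocol clients on port 5432.
    engine = create_async_engine("crate+asyncpg://crate@localhost:5432/")
    async with engine.begin() as conn:
        # Passing a list of parameter dicts runs the statement in executemany mode.
        await conn.execute(
            sa.text("INSERT INTO testdrive (x, y) VALUES (:x, :y)"),
            [{"x": 1, "y": "a"}, {"x": 2, "y": "b"}],
        )
    await engine.dispose()

asyncio.run(main())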

With kind regards, Andreas.

amotl · May 27 '24 23:05

@amotl I just want to be able to use the HTTP bulk API with async; I don't need async SQLAlchemy at the moment.

It seems like the HTTP bulk API processes bulk inserts more efficiently than going through the PostgreSQL protocol.

robd003 · May 28 '24 02:05

The main purpose of this library is to implement the DBAPI. Given that there is no async version of it yet, I'm not sure what adding async capabilities to this library would bring to the table. It's not much effort to use an async HTTP library directly, or one of the async PostgreSQL driver variants.

E.g. with aiohttp (untested):

# Requires `import json` and `import aiohttp`, and must run inside a coroutine;
# `server_url` points at CrateDB's HTTP endpoint, e.g. http://localhost:4200/_sql.
async with aiohttp.ClientSession() as session:
    data = json.dumps({
        "stmt": "insert into ...",
        "bulk_args": [
            [...],
            [...],
        ]
    })
    async with session.post(server_url,
                            data=data,
                            headers={'Content-Type': 'application/json'}) as resp:
        result = await resp.json()

mfussenegger · May 28 '24 07:05

As Mathias said, this repo is for the DBAPI, which will not have an async API anytime soon, since the Python team doesn't consider it necessary.

You could try https://pypi.org/project/cratedb-async/ to see if it fits your use case. I've also noticed that HTTP bulk inserts handle more data than the async PostgreSQL drivers do. Closing this, since nothing more can be done in this repo.

surister · Oct 06 '25 08:10