asyncio support

Open wshayes opened this issue 5 years ago • 28 comments

Hi Joohwan,

We really like your arango library. I am curious what your plans are with regard to asyncio support. I saw you started another repo for it. I'd be interested in helping out where I can to move this forward.

Thanks!

wshayes avatar Feb 24 '19 00:02 wshayes

Hi @wshayes,

Thank you for liking python-arango. I am glad to hear that people are finding it useful.

Unfortunately, I am too busy right now to start a new project, and my knowledge of asyncio (and async programming in general) is novice at best. I'm not sure when I will have time to invest in it. I will keep this issue open for any updates in the future.

Best, Joohwan

joowani avatar Feb 24 '19 01:02 joowani

I was about to put this request in when I found it was already in.

I thought about it a bit and here is what I know.

Most of the API would not have to change; in particular, you wouldn't have to make every def an async def. That is because, in "async mode", the ordinary functions can return a coroutine that resolves to the result instead of returning the result directly. Async code can then await that coroutine and all is well.

The first step is making an HttpClient that uses aiohttp instead of requests; that should be easy, and the key is that we return a coroutine that returns the result.
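To make the idea concrete, here is a minimal sketch. The names `SyncTransport`, `AsyncTransport`, and `Database` are hypothetical and not part of python-arango, and the transports only echo the request instead of doing real HTTP. It shows a plain `def` method that returns a value in sync mode and a coroutine in async mode.

```python
import asyncio
import json


class SyncTransport:
    """Hypothetical stand-in for a requests-based HTTP client."""

    def send(self, method, path):
        # A real client would perform a blocking HTTP call here.
        return json.dumps({"method": method, "path": path})


class AsyncTransport:
    """Hypothetical stand-in for an aiohttp-based HTTP client."""

    async def send(self, method, path):
        # A real client would await a non-blocking HTTP call here.
        await asyncio.sleep(0)
        return json.dumps({"method": method, "path": path})


class Database:
    """Plain `def` methods that work with either transport."""

    def __init__(self, transport):
        self._transport = transport

    def _execute(self, method, path, handler):
        response = self._transport.send(method, path)
        if asyncio.iscoroutine(response):
            # Async mode: hand back a coroutine that resolves to the result.
            async def finish():
                return handler(await response)

            return finish()
        # Sync mode: the response is already available.
        return handler(response)

    def version(self):
        # The path is used for illustration only.
        return self._execute("GET", "/_api/version", json.loads)
```

With this shape, `Database(SyncTransport()).version()` returns the parsed result directly, while `await Database(AsyncTransport()).version()` yields the same result inside async code.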

Another thing that needs to change is the executors; I think we need to build a parallel set of async executors, one for each of the four existing executor implementations.

Then there is the plumbing to make sure that when you are using the async HttpClient you also get the async executors. Then I think we would be in good shape.

The main annoyance I see is that the async/await syntax is Python 3.5+ only. I think the clean solution is to define an interface for the sync/asyncio implementations and then have a second, Python 3.5+ package that contains the asyncio implementation.

I might work up the motivation to make a pull request.

paulhoule avatar Apr 10 '19 01:04 paulhoule

This will also be a very useful feature for me. @joowani, is this still something you plan to invest time in?

mooncake4132 avatar Sep 06 '19 11:09 mooncake4132

Hi!

I was in need of an async driver for ArangoDB, so I took yours, @joowani, and made an async-style fork! Feel free to comment on it.

https://github.com/bloodbare/aioarangodb

Thanks for your amazing work; it's been so easy to adapt to the async world.

bloodbare avatar Jun 11 '20 10:06 bloodbare

Hi @bloodbare,

Wow this is amazing! I'll make sure to mention your project in python-arango readme. You should also send an email to ArangoDB team so they can have your driver on their website. Awesome work.

joowani avatar Jun 11 '20 20:06 joowani

@bloodbare's package looks amazing!

Is there still a plan to support asyncio in the official package? It seems this repo gets updates every few months, while the async fork's last commit is from a year ago and it is less popular.

rennenc avatar May 20 '21 08:05 rennenc

Hi!

I was also looking for the asynchronous version and noticed that the version from @bloodbare had not been updated for over a year, so I made a fork (not a copy) and made it fully asynchronous. The current version (1.0.0) is fully compliant with python-arango 7.2.0.

https://github.com/mirrorrim/aioarango

@rennenc please take a look, I'll try to keep it up to date :)

mirrorrim avatar Jul 05 '21 13:07 mirrorrim

Thanks. I still think it would be great to have it adopted and maintained as part of the same repo. Nevertheless, I will take a look.

rennenc avatar Jul 05 '21 13:07 rennenc

Hey, I have no problem keeping it in my repo. I haven't been updating it because we did not have the need; PRs are more than welcome.

R

-- Ramon a.k.a bloodbare

bloodbare avatar Jul 05 '21 15:07 bloodbare

Sorry, I didn't see any activity in your repository: 4 open issues and 1 pull request :(

mirrorrim avatar Jul 05 '21 15:07 mirrorrim

I see... the PR notifications were lost. I'll update it; thanks for the ping!

-- Ramon a.k.a bloodbare

bloodbare avatar Jul 05 '21 15:07 bloodbare

First, thanks @joowani for all the effort on the python-arango library; it's really a well-written project.

I was looking for an async solution, and judging from the comments it seems that having an async version of this library would be beneficial. I see @mirrorrim and @bloodbare have already done some work to make this possible.

@mirrorrim or @bloodbare have you reached out or would you consider making either of your repositories part of the ArangoDB Community?

I think it would be cool to have this part of an official community package just to get more eyes on it and would be happy to contribute in terms of keeping it up to date with the python-arango package. Also would like to avoid creating another fork if possible.

alexvanzyl avatar Mar 24 '22 10:03 alexvanzyl

It would be great if @mirrorrim's work could be merged back into the main project as an optional API or it was somehow made an official community package as @alexvanzyl suggested.

incorvia avatar Jun 14 '22 00:06 incorvia

I agree with @incorvia, it would be really nice if we could get the async versions merged into the official community package and maintained as part of it in the future. I also want to highlight that at the moment this issue is closed; based on the discussion here, I think it would be worth reopening it and moving forward with incorporating the work into the main community package. What do you think?

joakimnordling avatar Jun 20 '22 11:06 joakimnordling

We would gladly welcome the aioarango package as a part of the arangodb-community org! I think merging it as part of the python-arango package might be beneficial but would need @joowani and @aMahanna to weigh in on that.

If you want to bring the project over please ping me on Slack to discuss the details: https://join.slack.com/t/arangodb-community/shared_invite/zt-1b66mygms-j8TmOdXE7FojR5yA2Yg8kg (Chris.ArangoDB)

cw00dw0rd avatar Jun 20 '22 19:06 cw00dw0rd

I'm open to merging into python-arango, but it will nearly double the codebase (and double the effort to maintain). I'm not sure how much code reuse there could be between sync and async yet. It will probably require a non-trivial amount of work to refactor.

joowani avatar Jun 20 '22 19:06 joowani

This seems to be the main commit that was written to convert python-arango to use async. Perhaps a PR could be opened that uses this as a starting point. It doesn't look like it would double the code base, even if async was made conditional.

@mirrorrim any suggestions here since you did the work?

incorvia avatar Jun 20 '22 22:06 incorvia

In case you want some ideas for how to easily maintain the codebase with both the sync and async version, here's what we did for firedantic (shares many ideas with arangodantic that we also created, but for which we don't have a sync version).

  1. We created an async version of the library (this commit) in addition to the sync version.
  2. A colleague set up a tool that automatically generates the sync version of the code from the async version (this commit). The main pieces are the unasync.py script, which generates the sync version from the async one, and the pre-commit hook, which ensures the sync version is regenerated before committing.
  3. Now any further work is just a matter of updating the async version and ensuring that the automatically generated sync version works (mainly by reviewing the generated code and running the tests), and possibly making minor adjustments to the unasync.py script.

I should highlight that the idea is from https://github.com/python-trio/unasync and the unasync.py we use is a modified version of https://github.com/encode/httpcore/blob/master/unasync.py.

One way to apply the same approach to python-arango would be to restructure the code into separate directories for the sync and async versions, ensure the async version is up to date, and then create a modified version of unasync.py that can generate the sync version from the async one. The client (httpx/requests) will likely need some special treatment, but that should be possible as well. After that, maintaining the library should mainly be a matter of updating the async version and ensuring there are no issues with the generated sync version, so not much more work than maintaining a single code base.

There might be better approaches (if you know any, please let me know as well), but this has served us really well for firedantic.

joakimnordling avatar Jun 21 '22 07:06 joakimnordling

What would be nice is to be able to call an async version of db from the regular client instance. The aioarango library changed every method to async, which might broaden the scope too much. It would probably be better to start with a limited set of functions that benefit most from async operation, such as reads and insertions.
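One possible shape for this, sketched under assumptions: the `Collection` class below is an in-memory stand-in rather than python-arango's, the `aio` attribute and `AsyncView` are hypothetical names, and the async variants simply run the sync method in a worker thread via `asyncio.to_thread` (Python 3.9+).

```python
import asyncio


class AsyncView:
    """Exposes only an allowlisted subset of a sync object's methods as coroutines."""

    ASYNC_METHODS = frozenset({"insert", "get"})

    def __init__(self, target):
        self._target = target

    def __getattr__(self, name):
        if name not in self.ASYNC_METHODS:
            raise AttributeError(f"{name!r} has no async variant in this sketch")
        method = getattr(self._target, name)

        async def call(*args, **kwargs):
            # Run the blocking method in a worker thread.
            return await asyncio.to_thread(method, *args, **kwargs)

        return call


class Collection:
    """Hypothetical synchronous collection backed by a dict."""

    def __init__(self):
        self._docs = {}
        self.aio = AsyncView(self)  # async entry point on the regular object

    def insert(self, doc):
        self._docs[doc["_key"]] = doc
        return {"_key": doc["_key"]}

    def get(self, key):
        return self._docs.get(key)

    def truncate(self):
        # Stays sync-only in this sketch.
        self._docs.clear()


async def demo():
    users = Collection()
    await users.aio.insert({"_key": "1", "name": "Ann"})
    return await users.aio.get("1")
```

The sync API stays untouched, and the set of async methods can grow over time by extending the allowlist.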

davidschrooten avatar Jun 30 '22 09:06 davidschrooten

@joakimnordling unasync.py looks interesting but feels a little hacky. I'm leaning towards @davidschrooten's suggestion a little more currently (which will allow us to take advantage of @mirrorrim and @bloodbare's work via copy paste essentially), but will explore both.

joowani avatar Jun 30 '22 16:06 joowani

> I'm open to merging into python-arango, but it will nearly double the codebase (and double the effort to maintain). I'm not sure how much code reuse there could be between sync and async yet. It will probably require a non-trivial amount of work to refactor.

Why not keep the async definitions truly async?

slava-shor-emporus avatar Jan 31 '23 21:01 slava-shor-emporus

Today I am using

https://aioarango.readthedocs.io/en/latest/

I also see there is

https://aioarangodb.readthedocs.io/en/latest/

I have not done a real comparison of the two, so I don't feel a lot of reason to push for changes.

My guess is that you could, however, use code generation to make both sync and async stubs for all the functions in the library and get a combined sync/async library without undue duplication.

paulhoule avatar Jan 31 '23 22:01 paulhoule

> Today I am using
> https://aioarango.readthedocs.io/en/latest/
> I also see there is
> https://aioarangodb.readthedocs.io/en/latest/
> which I have not done an actual comparison so I don't feel a lot of reason to push for changes.

As of today, neither library is actively developed or maintained. It's better to keep the synchronous and async libraries separate, as it's doubtful that both versions will be used in the same project. You will also find that, for other DB drivers, the blocking and async versions are developed separately.

slava-shor-emporus avatar Feb 05 '23 09:02 slava-shor-emporus

If I run into some problem with aioarango I'll consider patching it.

Very big API wrappers like boto3 use code generation to build sync and async APIs from the same source. API wrapper code is highly structured, and much of the complexity is embodied in functions that marshal and unmarshal arguments. Those functions are nice synchronous functions that can be used in either environment.

Personally, I code almost everything adb-related async, even if it doesn't need it, just to keep the async API at my fingertips and for the possibility that I might want to move a bit or two of code into the async world. Having almost the same API would make it convenient for a programmer to switch, and with one code base the problem of maintaining the shareable functions on both sides is solved.
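A small sketch of what that could look like, under assumptions: the endpoint table, `build_request`, `parse_response`, and the `SyncAPI`/`AsyncAPI` classes are all hypothetical, and the `send` methods echo the request instead of doing real HTTP. The marshalling and unmarshalling functions are ordinary sync code shared by both flavors, and the thin sync and async methods are generated from one table.

```python
import asyncio
import json

# Hypothetical endpoint table; the paths are for illustration only.
ENDPOINTS = {
    "version": ("GET", "/_api/version"),
    "collections": ("GET", "/_api/collection"),
}


def build_request(name):
    """Shared, synchronous marshalling step."""
    method, path = ENDPOINTS[name]
    return {"method": method, "path": path}


def parse_response(body):
    """Shared, synchronous unmarshalling step."""
    return json.loads(body)


def make_sync_method(name):
    def method(self):
        return parse_response(self.send(build_request(name)))

    method.__name__ = name
    return method


def make_async_method(name):
    async def method(self):
        return parse_response(await self.send(build_request(name)))

    method.__name__ = name
    return method


class SyncAPI:
    def send(self, request):
        # Placeholder for a blocking HTTP call; echoes the request back.
        return json.dumps(request)


class AsyncAPI:
    async def send(self, request):
        # Placeholder for a non-blocking HTTP call; echoes the request back.
        await asyncio.sleep(0)
        return json.dumps(request)


# Generate both flavors of every endpoint from the same table.
for _name in ENDPOINTS:
    setattr(SyncAPI, _name, make_sync_method(_name))
    setattr(AsyncAPI, _name, make_async_method(_name))
```

Here `SyncAPI().version()` and `await AsyncAPI().version()` share everything except the one line that performs I/O.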

paulhoule avatar Feb 10 '23 03:02 paulhoule

I started a PR at Arangodantic; maybe you want to have a look at it. I have chosen Asyncer (https://asyncer.tiangolo.com/) as the async library and python-arango as the client. With Asyncer I can call sync code from async code, and so on. The package is made by tiangolo, the author of FastAPI.

I would be very happy to get comments and remarks.
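For readers unfamiliar with the pattern, here is a stdlib-only sketch of calling sync code from async code. It does not use Asyncer itself: the `asyncify` helper below is a stand-in built on `asyncio.to_thread` (Python 3.9+), and `blocking_lookup` is a hypothetical placeholder for a blocking python-arango call.

```python
import asyncio
import functools
import time


def asyncify(func):
    """Wrap a blocking callable so it can be awaited (stdlib stand-in)."""

    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        # Run the blocking call in a worker thread.
        return await asyncio.to_thread(func, *args, **kwargs)

    return wrapper


def blocking_lookup(key):
    # Stands in for a blocking database call.
    time.sleep(0.1)
    return {"_key": key}


async def main():
    lookup = asyncify(blocking_lookup)
    # Both lookups run in worker threads, so their waits can overlap.
    return await asyncio.gather(lookup("a"), lookup("b"))
```

Running `asyncio.run(main())` returns both documents in call order, since `asyncio.gather` preserves the order of its arguments.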

Arangodantic V2 PR

dasTholo avatar Sep 20 '23 10:09 dasTholo

Hi @dasTholo,

It's great to see proactive steps like yours, especially on such an important and highly requested feature. We recognize the increasing need for asyncio support and are actively pushing to add it to our roadmap. I like your asyncify approach, it seems to simplify the implementation and avoid code duplication.

apetenchea avatar Sep 20 '23 11:09 apetenchea

Well... I have created a simple class to help with this:

import asyncio
from concurrent.futures import ThreadPoolExecutor
from arango import ArangoClient


class ArangoDBRepository:
    def __init__(self, database_url, database_name, username, password):
        self._executor = ThreadPoolExecutor()
        self._client = ArangoClient(hosts=database_url)
        self._db = self._client.db(database_name, username=username, password=password)

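    async def run_in_executor(self, func, *args):
        # get_running_loop() is the recommended call inside a coroutine.
        loop = asyncio.get_running_loop()
        try:
            # Run the blocking call in the thread pool so the event loop stays free.
            return await loop.run_in_executor(self._executor, func, *args)
        except Exception as e:
            print(f"An error occurred: {e}")
            raise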

    async def has_collection(self, collection_name):
        return await self.run_in_executor(lambda: self._db.has_collection(collection_name))

    async def create_collection(self, collection_name, edge=False):
        return await self.run_in_executor(lambda: self._db.create_collection(collection_name, edge=edge))

    async def delete_collection(self, collection_name):
        return await self.run_in_executor(lambda: self._db.delete_collection(collection_name))

    async def create_persistent_index(self,
                                      collection_name,
                                      index_name,
                                      index_fields,
                                      cache_enabled):
        return await self.run_in_executor(
            lambda: self._db.collection(collection_name).add_persistent_index(name=index_name,
                                                                              fields=index_fields,
                                                                              cacheEnabled=cache_enabled))

    async def execute_aql_query(self, query, bind_vars=None):
        def aql_query():
            cursor = self._db.aql.execute(query, bind_vars=bind_vars)
            return [doc for doc in cursor]

        return await self.run_in_executor(aql_query)

    async def insert(self, collection_name,
                     document,
                     overwrite,
                     silent,
                     return_new):
        def insert():
            return self._db.collection(collection_name).insert(document,
                                                               overwrite=overwrite,
                                                               silent=silent,
                                                               return_new=return_new)

        return await self.run_in_executor(insert)

    async def insert_many(self,
                          collection_name,
                          documents,
                          overwrite,
                          silent):
        def insert_many():
            return self._db.collection(collection_name).insert_many(documents, overwrite=overwrite, silent=silent)

        return await self.run_in_executor(insert_many)

    async def batch_find_by_key(self,
                                collection_name,
                                key):
        return await self.run_in_executor(lambda: self._db.collection(collection_name).find({'_key': key}).batch())

bencz avatar Mar 08 '24 19:03 bencz

Hey @bencz, I like your class. The ThreadPoolExecutor approach is great for ensuring some asynchronous functionality, while keeping the rest of the codebase flexible.

I would like to point out that we're about to start working on an asynchronous driver: python-arango-async. The time allocated for this is limited, but yes, it's finally happening. I can't give a clear estimate of when it will be ready, but I expect about 3 months. We plan to release it gradually, starting with a minimal release once we've covered the basic functionality, and then adding more incrementally. The new driver is going to follow the same python-arango interface for most classes.

In the meantime, workarounds such as yours are a great way to "asynchronize" already existing code. Thanks for providing that example!

apetenchea avatar Mar 10 '24 11:03 apetenchea