python-gitlab
python-gitlab copied to clipboard

async support & httpx-based python-gitlab

Open vishes-shell opened this issue 4 years ago • 25 comments

I've completely reworked python-gitlab to be async-compatible: https://github.com/vishes-shell/python-gitlab :

code it fully async compatible via awesome httpx
async tests are rewritten with pytest, pytest-asyncio and respx
functional tests are also rewritten to be async

You can see changes in pull request https://github.com/vishes-shell/python-gitlab/pull/1 , i've tried to stick to project rules, but without pytest it would be harder...

I've seen some work that's done in https://github.com/python-gitlab/python-gitlab/pull/998 , and at the beginning i started on top of it, but found it hard to continue, due to some incomplete changes that are done, and trying to do sync stuff via run_until_complete would not work.

I would like to release it under python-gitlab-async module.

How i see it:

i would hear your thoughts on this
make some changes that you require (maybe drop cli support since python-gitlab do it good)
update documentation
transfer my fork under your python-gitlab namespace as python-gitlab-async, become a member of it
release it in pypi.org
take care of it

I think it's not that hard to keep up with changes from master branch and try to release as python-gitlab with same versions.

It was a pleasure to work with this brilliantly written library, and the same time you have a lot of defs to become async defs.:)

Feb 22 '20 12:02 vishes-shell

Why not try to have the changes merged into this project? :) The maintainer has already signaled in #998 that he was willing to merge httpx-based work, they were just waiting with the other contributor for a stable httpx 1.0 to be released in the next month or so.

I totally agree with migrating to pytest though. All existing unit tests can already be run via pytest (with 2 lines of code changed); it would make proper testcases possible in functional tests both for the API and CLI, and both could then reuse fixtures, run entire suites independently of individual failures etc.

Feb 22 '20 14:02 nejch

@nejch as far as I know it's really bad practice to implement every methods' sync functions through

loop = asyncio.get_event_loop()
loop.run_until_complete(async_foo())

So without sync methods that would break any sync projects that rely on this. Maybe I'm not aware of a good way to make async/sync compatible library.

Feb 22 '20 15:02 vishes-shell

Ah sorry, the async/sync mess :/ I'm not sure either. It seems like some people are trying though, if this helps with future discussion: https://stackoverflow.com/questions/55152952/duplication-of-code-for-synchronous-and-asynchronous-implementations

Feb 22 '20 22:02 nejch

@nejch yeah, as i was saying it's not that simple to being able to create async/sync library (thats why we have ton of sync and async library separate), as it can require rebuilding the way the library works, as my personal perspective the library is brilliantly written and does it's work, and does it well.

So i am for new library, that would be not that hard to maintain since we have async version, maybe sometime in the future we would never require sync version, and just continue develop async version, but until it's not happen, it's probably better to have two libraries.:)

Feb 22 '20 23:02 vishes-shell

@bufferoverflow @max-wittig i've seen that you liked the comment that is about maintaining only one library (the current one) and having async/sync interface. That's a so much different that i've suggested, but still achievable. The downsides as i see it:

doubles codebase (sync/async interfaces), there is not much that could stay simple def
will require additional library httpx, another way is to move completely httpx since it supports sync operations
would be harder to navigate in source code, since almost all of the functions will require it's async sibling

As i was saying, there is not much libraries which support sync and async interface, most of them have some sort of library clone, that operates only in async way. The one, that i am aware of is httpx, and the public interface is fully doubled, you can see https://github.com/encode/httpx/blob/master/httpx/_client.py .

Can i hear your thoughts on this? Thanks you!

Feb 24 '20 16:02 vishes-shell

Async creep is inevitable when code is too dependent on I/O calls. Using asyncio.run or loop.run_until_complete is not acceptable since it adds way too much unneeded overhead for purely sync consumers.

If we want to support both sync in async in one codebase and avoid problems mentioned by @vishes-shell, we'll probably have to refactor some of the higher-level code. I'd suggest decoupling serialization logic from transport code, providing a single interface for HTTP requests in concrete SyncGitlab and AsyncGitlab subclasses. Most of the job would be rewriting RESTObject subclasses to remove HTTP calls, perhaps by introducing some proxy LazyGitlabResponse object representing how HTTP response needs to be deserialized.

Feb 25 '20 09:02 khvn26

@vishes-shell I'm still unsure about the real benefits of async support within python-gitlab, could you elaborate a bit on why this is useful and required? How many projects, people do you have on your GitLab? What's the improvement in regard of performance when using async within your environment? How do you handle http error code 429 (too many requests) with the async lib?

Feb 25 '20 09:02 bufferoverflow

I think that something similar to what @khvn26 is suggesting for Sync/Async classes is done in Kenneth Reithz's requests-html (admittedly a much smaller wrapper library):

https://github.com/psf/requests-html/blob/master/requests_html.py#L787-L843

Feb 25 '20 10:02 nejch

@bufferoverflow i believe it crutial to have async library if you want to build an application based on async features, more and more applications begin to be developed with async feature in mind, thanks to async since Python 3.5, we can see the growth of interest in async web frameworks: Starlette, FastAPI, django channels.

Having async application and blocking gitlab http client would kill all features that are gained with async. I've seen some timeout errors in gitlab with 30 seconds, that's definitely an overkill. So basically if you want to build something async you want async in your I/O.

Projects, users? Thousands. Does it matter?

429 errors? There is functional tests in library which are perfectly passing. We just replace time.sleep(wait_time) with asyncio.sleep(wait_time).

Feb 25 '20 10:02 vishes-shell

@vishes-shell Have you found an example of other API wrapper libraries using async already? I haven't found any, but I can imagine how web frameworks benefit from asyncio. Not sure about this wrapper library though.

Feb 25 '20 10:02 max-wittig

@max-wittig

https://github.com/brettcannon/gidgethub
https://github.com/terrycain/aioboto3
https://github.com/Fahreeve/aiovk/tree/master/aiovk
https://github.com/csko/gdax-python-api
https://github.com/TwitchIO/TwitchIO
https://github.com/odrling/peony-twitter

I can search for more. More of the api wrapper libraries are still sync just because of they are built a long time ago and async is rather new.

Feb 25 '20 10:02 vishes-shell

@vishes-shell I guess it would make sense to use a separate library for this, but I don't see how I can maintain two libraries around python-gitlab instead of one, so feel free to publish python-gitlab-async but I may not be able to help much.

Additionally it would be nice, if you could backport your pytest changes to python-gitlab

Feb 25 '20 11:02 max-wittig

@max-wittig i've realized how much work it will require to maintain two libraries. I totally ok with supporting the async one, but to have two up-to-date versions of mostly same project would require x2 issues/MRs.

I will try to dig into different implementation that would require less changes for new features and keep everything under same project. Will keep you updated.

Feb 25 '20 20:02 vishes-shell

Edit: ah Vishes you just beat me to it a few seconds before I wrote this little novel lol, oops.

@bufferoverflow I think that async will probably happen whether it is solved here or not - there is already #370, #945, and a lot of activity in this issue in just a few days. I assume someone would eventually go and write an async library if this stalls :P And as was mentioned, async seems to be the go-to now for new libraries dealing mostly with disk or network I/O, just still quite fresh in the python community (heh, and we just got rid of the py2/py3 dilemma).

@vishes-shell to explain my motivation for my initial response, as someone who has to maintain software delivery at work for different versions of the same application, I would personally avoid the overhead of several libraries if possible - so I was naturally biased against it I guess ;) But if the effort to do async/sync in the same library is even higher, then that makes sense of course (anyway, I only started contributing last week here so this is all just my opinion :P).

So in that case I'd just take into account that there will be some more pedestrian challenges that would be good to agree on with the maintainers here to avoid libraries diverging, like managing issues (e.g. would you have write access and move issues, or issues disabled on one repo, and have an async label on the other), managing the inevitable PR's against the async library that would better fit upstream, syncing on integrations with CI systems (since it often involves accounts and credentials), separate release lifecycles, the fact that you now have 1 maintainer each in 2 libraries that might lose interest when they change career roles etc.

I still think it'd be nice to have a long-term roadmap to eventually merge and deprecate sync, if for no other reason than as motivation to stop libraries diverging too much. I'm looking at some modern libraries, and they do offer sync/async together (e.g. all services in Azure SDK), so it really seems that legacy code with too much inertia is often the reason for a separate library.

Feb 25 '20 20:02 nejch

@nejch @max-wittig @bufferoverflow @khvn26 i've succeeded in combining async and sync interfaces: #1036

I've tried to make as less changes as possible and don't brake initial design solution.

My next steps are:

Update docstrings for gitlab clients, since i made them abstractmethod without copying the whole docstring (would implement some sort of decorator that picks docstrings from target)
Add more comments on new code
Add functional tests for async client in tox
Pull changes from master and resolve conflicts

And i'd like to hear what i should also do to make this wrapper even more awesome.

Mar 04 '20 10:03 vishes-shell

Wow that was quick! :) Did you end up getting inspiration from some existing projects?

I'll just follow how you guys do this here, as I wanted to rework functional tests before making new contributions, so I'll take this into account (e.g. your async functional tests are currently mostly a copy of sync. I was going to rework the series of asserts from the sync version into testcases/modules.. but it probably wouldn't make sense to go change that in this scope as it's already a very big PR).

Mar 04 '20 23:03 nejch

@nejch that was some sort of challenge that bugged me, that's why it was quick.

About inspiration, i only got it from httpx, the client implementation: _client.py, with base class and two derived ones: sync and async.

I could with tests if you want so, i can try to help you if you want. Functional tests are not that good, because when i was testing them, the fastest way to restart them for me was deleting objects that wasn't deleted (groups, users, projects).

Mar 05 '20 08:03 vishes-shell

@nejch i have a question about functional tests, do you have ideas how to omit copy and paste? Because we want to test async and sync interface, thats for sure, but in async tests we need to await for result.

Mar 05 '20 10:03 vishes-shell

List of notable changes: https://github.com/python-gitlab/python-gitlab/pull/1036#issuecomment-595308609

Mar 05 '20 16:03 vishes-shell

@vishes-shell Sorry for the late reply! I don't actually know if they can be deduplicated nicely, but I meant more to have them split into a structure where maintaining them in parallel is easier on the brain ;) E.g. by splitting them into per-object test modules in a similar way as suggested for the objects themselves in #694 and drafted in #997.

I guess the test functions have to be duplicated, but with this structure maybe at least the payloads that are used to send to the server for resource creation can be reused several times. Not sure if that makes it too complicated? That's why I thought this topic could wait for a separate PR.

Currently there is a lot of repetitive setup e.g. in the blocks below:

https://github.com/python-gitlab/python-gitlab/blob/e5afb554bf4bcc28555bde4030f50558f175a53b/tools/python_test_v4.py#L235-L252 https://github.com/python-gitlab/python-gitlab/blob/e5afb554bf4bcc28555bde4030f50558f175a53b/tools/python_test_v4.py#L99-L134

Mar 08 '20 08:03 nejch

@nejch It does make sense to have the payload as separate fixtures and use the sync, async tests as wrapper, though it would require changes in three files for a new feature-test.

Mar 08 '20 10:03 max-wittig

httpx (httpcore really) itself uses unasync for single source async/sync support: https://github.com/encode/httpcore/blob/c8d93d07369dab11d5c03f11de25607be1d21faf/unasync.py

Jun 21 '21 12:06 graingert

httpx (httpcore really) itself uses unasync for single source async/sync support: https://github.com/encode/httpcore/blob/c8d93d07369dab11d5c03f11de25607be1d21faf/unasync.py

@graingert cool! That seems better to avoid duplication. The httpcore implementation doesn't make it obvious but there is a package for this that plugs into the build process and doesn't need the duplicated sync code committed: https://unasync.readthedocs.io/en/latest/ (edit: oh, I guess you're probably involved in that project somehow :grin: )

I just randomly found it here: https://github.com/johtso/httpx-caching/blob/master/gen_sync.py

Looks like doing some reading up on Sans IO would also help here.. https://sans-io.readthedocs.io/

Jun 26 '21 17:06 nejch

Any progress here?

Btw: There is already a separate library released: https://github.com/pan-net-security/aio-gitlab

Dec 28 '21 12:12 CarliJoy

Any updates would love to use async client @vishes-shell

May 31 '23 13:05 yairsimantov20

python-gitlab python-gitlab copied to clipboard

async support & httpx-based python-gitlab

python-gitlab
python-gitlab copied to clipboard