python-gitlab
python-gitlab copied to clipboard
async support & httpx-based python-gitlab
I've completely reworked python-gitlab to be async-compatible: https://github.com/vishes-shell/python-gitlab :
- code it fully async compatible via awesome
httpx
- async tests are rewritten with
pytest
,pytest-asyncio
andrespx
- functional tests are also rewritten to be async
You can see changes in pull request https://github.com/vishes-shell/python-gitlab/pull/1 , i've tried to stick to project rules, but without pytest
it would be harder...
I've seen some work that's done in https://github.com/python-gitlab/python-gitlab/pull/998 , and at the beginning i started on top of it, but found it hard to continue, due to some incomplete changes that are done, and trying to do sync stuff via run_until_complete
would not work.
I would like to release it under python-gitlab-async
module.
How i see it:
- i would hear your thoughts on this
- make some changes that you require (maybe drop
cli
support sincepython-gitlab
do it good) - update documentation
- transfer my fork under your
python-gitlab
namespace aspython-gitlab-async
, become a member of it - release it in pypi.org
- take care of it
I think it's not that hard to keep up with changes from master
branch and try to release as python-gitlab
with same versions.
It was a pleasure to work with this brilliantly written library, and the same time you have a lot of def
s to become async def
s.:)
Why not try to have the changes merged into this project? :) The maintainer has already signaled in #998 that he was willing to merge httpx-based work, they were just waiting with the other contributor for a stable httpx 1.0 to be released in the next month or so.
I totally agree with migrating to pytest though. All existing unit tests can already be run via pytest (with 2 lines of code changed); it would make proper testcases possible in functional tests both for the API and CLI, and both could then reuse fixtures, run entire suites independently of individual failures etc.
@nejch as far as I know it's really bad practice to implement every methods' sync functions through
loop = asyncio.get_event_loop()
loop.run_until_complete(async_foo())
So without sync methods that would break any sync projects that rely on this. Maybe I'm not aware of a good way to make async/sync compatible library.
Ah sorry, the async/sync mess :/ I'm not sure either. It seems like some people are trying though, if this helps with future discussion: https://stackoverflow.com/questions/55152952/duplication-of-code-for-synchronous-and-asynchronous-implementations
@nejch yeah, as i was saying it's not that simple to being able to create async/sync library (thats why we have ton of sync and async library separate), as it can require rebuilding the way the library works, as my personal perspective the library is brilliantly written and does it's work, and does it well.
So i am for new library, that would be not that hard to maintain since we have async version, maybe sometime in the future we would never require sync version, and just continue develop async version, but until it's not happen, it's probably better to have two libraries.:)
@bufferoverflow @max-wittig i've seen that you liked the comment that is about maintaining only one library (the current one) and having async/sync interface. That's a so much different that i've suggested, but still achievable. The downsides as i see it:
- doubles codebase (sync/async interfaces), there is not much that could stay simple
def
- will require additional library
httpx
, another way is to move completelyhttpx
since it supports sync operations - would be harder to navigate in source code, since almost all of the functions will require it's
async
sibling
As i was saying, there is not much libraries which support sync and async interface, most of them have some sort of library clone, that operates only in async
way. The one, that i am aware of is httpx
, and the public interface is fully doubled, you can see https://github.com/encode/httpx/blob/master/httpx/_client.py .
Can i hear your thoughts on this? Thanks you!
Async creep is inevitable when code is too dependent on I/O calls. Using asyncio.run
or loop.run_until_complete
is not acceptable since it adds way too much unneeded overhead for purely sync consumers.
If we want to support both sync in async in one codebase and avoid problems mentioned by @vishes-shell, we'll probably have to refactor some of the higher-level code.
I'd suggest decoupling serialization logic from transport code, providing a single interface for HTTP requests in concrete SyncGitlab
and AsyncGitlab
subclasses. Most of the job would be rewriting RESTObject
subclasses to remove HTTP calls, perhaps by introducing some proxy LazyGitlabResponse
object representing how HTTP response needs to be deserialized.
@vishes-shell I'm still unsure about the real benefits of async support within python-gitlab, could you elaborate a bit on why this is useful and required? How many projects, people do you have on your GitLab? What's the improvement in regard of performance when using async within your environment? How do you handle http error code 429 (too many requests) with the async lib?
I think that something similar to what @khvn26 is suggesting for Sync/Async classes is done in Kenneth Reithz's requests-html (admittedly a much smaller wrapper library):
https://github.com/psf/requests-html/blob/master/requests_html.py#L787-L843
@bufferoverflow i believe it crutial to have async library if you want to build an application based on async features, more and more applications begin to be developed with async feature in mind, thanks to async
since Python 3.5, we can see the growth of interest in async web frameworks: Starlette, FastAPI, django channels.
Having async application and blocking gitlab http client would kill all features that are gained with async. I've seen some timeout errors in gitlab with 30 seconds, that's definitely an overkill. So basically if you want to build something async you want async in your I/O.
Projects, users? Thousands. Does it matter?
429 errors? There is functional tests in library which are perfectly passing. We just replace time.sleep(wait_time)
with asyncio.sleep(wait_time)
.
@vishes-shell Have you found an example of other API wrapper libraries using async already? I haven't found any, but I can imagine how web frameworks benefit from asyncio. Not sure about this wrapper library though.
@max-wittig
- https://github.com/brettcannon/gidgethub
- https://github.com/terrycain/aioboto3
- https://github.com/Fahreeve/aiovk/tree/master/aiovk
- https://github.com/csko/gdax-python-api
- https://github.com/TwitchIO/TwitchIO
- https://github.com/odrling/peony-twitter
I can search for more. More of the api wrapper libraries are still sync just because of they are built a long time ago and async
is rather new.
@vishes-shell I guess it would make sense to use a separate library for this, but I don't see how I can maintain two libraries around python-gitlab instead of one, so feel free to publish python-gitlab-async but I may not be able to help much.
Additionally it would be nice, if you could backport
your pytest changes to python-gitlab
@max-wittig i've realized how much work it will require to maintain two libraries. I totally ok with supporting the async one, but to have two up-to-date versions of mostly same project would require x2 issues/MRs.
I will try to dig into different implementation that would require less changes for new features and keep everything under same project. Will keep you updated.
Edit: ah Vishes you just beat me to it a few seconds before I wrote this little novel lol, oops.
@bufferoverflow I think that async will probably happen whether it is solved here or not - there is already #370, #945, and a lot of activity in this issue in just a few days. I assume someone would eventually go and write an async library if this stalls :P And as was mentioned, async seems to be the go-to now for new libraries dealing mostly with disk or network I/O, just still quite fresh in the python community (heh, and we just got rid of the py2/py3 dilemma).
@vishes-shell to explain my motivation for my initial response, as someone who has to maintain software delivery at work for different versions of the same application, I would personally avoid the overhead of several libraries if possible - so I was naturally biased against it I guess ;) But if the effort to do async/sync in the same library is even higher, then that makes sense of course (anyway, I only started contributing last week here so this is all just my opinion :P).
So in that case I'd just take into account that there will be some more pedestrian challenges that would be good to agree on with the maintainers here to avoid libraries diverging, like managing issues (e.g. would you have write access and move issues, or issues disabled on one repo, and have an async label on the other), managing the inevitable PR's against the async library that would better fit upstream, syncing on integrations with CI systems (since it often involves accounts and credentials), separate release lifecycles, the fact that you now have 1 maintainer each in 2 libraries that might lose interest when they change career roles etc.
I still think it'd be nice to have a long-term roadmap to eventually merge and deprecate sync, if for no other reason than as motivation to stop libraries diverging too much. I'm looking at some modern libraries, and they do offer sync/async together (e.g. all services in Azure SDK), so it really seems that legacy code with too much inertia is often the reason for a separate library.
@nejch @max-wittig @bufferoverflow @khvn26 i've succeeded in combining async and sync interfaces: #1036
I've tried to make as less changes as possible and don't brake initial design solution.
My next steps are:
- Update docstrings for gitlab clients, since i made them abstractmethod without copying the whole docstring (would implement some sort of decorator that picks docstrings from target)
- Add more comments on new code
- Add functional tests for async client in tox
- Pull changes from master and resolve conflicts
And i'd like to hear what i should also do to make this wrapper even more awesome.
Wow that was quick! :) Did you end up getting inspiration from some existing projects?
I'll just follow how you guys do this here, as I wanted to rework functional tests before making new contributions, so I'll take this into account (e.g. your async functional tests are currently mostly a copy of sync. I was going to rework the series of asserts from the sync version into testcases/modules.. but it probably wouldn't make sense to go change that in this scope as it's already a very big PR).
@nejch that was some sort of challenge that bugged me, that's why it was quick.
About inspiration, i only got it from httpx
, the client implementation: _client.py
, with base class and two derived ones: sync and async.
I could with tests if you want so, i can try to help you if you want. Functional tests are not that good, because when i was testing them, the fastest way to restart them for me was deleting objects that wasn't deleted (groups, users, projects).
@nejch i have a question about functional tests, do you have ideas how to omit copy and paste? Because we want to test async and sync interface, thats for sure, but in async tests we need to await
for result.
List of notable changes: https://github.com/python-gitlab/python-gitlab/pull/1036#issuecomment-595308609
@vishes-shell Sorry for the late reply! I don't actually know if they can be deduplicated nicely, but I meant more to have them split into a structure where maintaining them in parallel is easier on the brain ;) E.g. by splitting them into per-object test modules in a similar way as suggested for the objects themselves in #694 and drafted in #997.
I guess the test functions have to be duplicated, but with this structure maybe at least the payloads that are used to send to the server for resource creation can be reused several times. Not sure if that makes it too complicated? That's why I thought this topic could wait for a separate PR.
Currently there is a lot of repetitive setup e.g. in the blocks below:
https://github.com/python-gitlab/python-gitlab/blob/e5afb554bf4bcc28555bde4030f50558f175a53b/tools/python_test_v4.py#L235-L252 https://github.com/python-gitlab/python-gitlab/blob/e5afb554bf4bcc28555bde4030f50558f175a53b/tools/python_test_v4.py#L99-L134
@nejch It does make sense to have the payload as separate fixtures and use the sync, async tests as wrapper, though it would require changes in three files for a new feature-test.
httpx (httpcore really) itself uses unasync for single source async/sync support: https://github.com/encode/httpcore/blob/c8d93d07369dab11d5c03f11de25607be1d21faf/unasync.py
httpx (httpcore really) itself uses unasync for single source async/sync support: https://github.com/encode/httpcore/blob/c8d93d07369dab11d5c03f11de25607be1d21faf/unasync.py
@graingert cool! That seems better to avoid duplication. The httpcore implementation doesn't make it obvious but there is a package for this that plugs into the build process and doesn't need the duplicated sync code committed: https://unasync.readthedocs.io/en/latest/ (edit: oh, I guess you're probably involved in that project somehow :grin: )
I just randomly found it here: https://github.com/johtso/httpx-caching/blob/master/gen_sync.py
Looks like doing some reading up on Sans IO would also help here.. https://sans-io.readthedocs.io/
Any progress here?
Btw: There is already a separate library released: https://github.com/pan-net-security/aio-gitlab
Any updates would love to use async client @vishes-shell