webknossos-libs
[client] Use single httpx client for requests & parallelize them
Currently, the generated client creates a fresh httpx client for every request (see also https://github.com/openapi-generators/openapi-python-client/issues/202), and the resumable-client creates its own client for each session (although it accepts a client in the constructor). The single shared client should also register a global response event hook (see the raise_on_4xx_5xx example) to handle failed requests. Additionally, the API should issue requests in parallel where possible, e.g. via async routines.
I vote for this feature, since it would significantly improve performance when making multiple calls. Are there plans to implement it?
@motybz Sorry for the late reply, this is definitely on our radar! Atm we're focusing on a stable API, but we'll tackle this afterwards to improve up- & download speed.
@jstriebel what's the status on this issue? Any updates in the past nine months?
@Olian04 So far there was little progress, as this is mostly blocked by https://github.com/openapi-generators/openapi-python-client/issues/202.
The following points can already be tackled independently; we'll try to get to them soon:
- [x] Register a global httpx Client
- [x] Use this client where httpx is used directly:
  - dataset upload
  - task creation
  - annotation upload
- ~[ ] Add useful middleware, such as good error handling and token-reloading~ We are using raise_for_status now
- [ ] Use async methods to schedule requests in parallel where possible (mostly dataset download)
- [ ] Test if fsspec can use an httpx client instead of aiohttp (for zarr-streaming)
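
For the open "async methods" item, the general idea could look like the sketch below: schedule all requests at once, but cap how many are in flight. This is an illustration only, using the standard library; `fetch_all` and `fake_fetch_chunk` are hypothetical names, and the fake fetcher stands in for a real per-chunk download coroutine.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar

K = TypeVar("K")
T = TypeVar("T")


async def fetch_all(
    keys: Iterable[K],
    fetch_one: Callable[[K], Awaitable[T]],
    max_concurrency: int = 8,
) -> List[T]:
    """Run fetch_one for every key concurrently, at most max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(key: K) -> T:
        # The semaphore limits the number of requests in flight.
        async with semaphore:
            return await fetch_one(key)

    # gather returns results in the same order as the input keys.
    return list(await asyncio.gather(*(bounded(key) for key in keys)))


async def fake_fetch_chunk(key: str) -> bytes:
    # Stand-in for a real download; sleeps instead of doing network I/O.
    await asyncio.sleep(0.01)
    return key.encode()
```

Calling `asyncio.run(fetch_all(chunk_keys, fake_fetch_chunk))` returns the chunks in input order; in a real download, `fetch_one` would be a coroutine that issues the request through a shared async client.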
Hey @jstriebel, I'm just going over mentions of the openapi-python-client issue to find folks to review https://github.com/openapi-generators/openapi-python-client/pull/775 before it merges (since it's a breaking change). I'd love your input!
A short update: @fm3 pushed 00e168323a0ac7d47d17edf209ac0d8a340f3b34 last week, which removes the auto-generated client. As a byproduct, a fresh httpx client is no longer allocated for each API call. Parallelizing the download of a dataset is still an open point.
The webknossos package now uses zarrita under the hood which uses async io to fetch chunks concurrently.