
Something about async support

Open Zheaoli opened this issue 7 years ago • 17 comments

May I ask, is there any plan to support async/await in Python 3.5+?

Zheaoli avatar May 18 '17 09:05 Zheaoli

You can use the threading module for async behavior, or multiprocessing for separate processes, and make the work happen in the background.
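A minimal sketch of that threading suggestion, using only the standard library (the `write_points` callable and the batch size are illustrative stand-ins, not influxdb-python API):

```python
import queue
import threading

def start_background_writer(write_points, batch_size=10):
    """Start a daemon thread that batches points and flushes them
    via the supplied blocking write_points(points) callable."""
    q = queue.Queue()

    def worker():
        batch = []
        while True:
            point = q.get()
            if point is None:  # sentinel: flush remainder and exit
                if batch:
                    write_points(batch)
                break
            batch.append(point)
            if len(batch) >= batch_size:
                write_points(batch)
                batch = []

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    return q, t

# Example with a stub writer standing in for a real client's write_points:
written = []
q, t = start_background_writer(written.append, batch_size=2)
for i in range(4):
    q.put({"measurement": "cpu", "fields": {"value": i}})
q.put(None)  # signal shutdown
t.join()
# written now holds two batches of two points each
```

The application thread only pays the cost of a `Queue.put`; the slow HTTP write happens in the background thread.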

renderit avatar May 20 '17 02:05 renderit

@renderit Yeah, I gave it a try, and my idea is to replace the requests lib with aiohttp for requesting the data. Would calling it aio-influxdb be better?

Zheaoli avatar May 20 '17 09:05 Zheaoli

@Zheaoli Although it is quite basic and not very well tested yet, I have just made a simple async InfluxDB client and made it public: https://github.com/plugaai/aioinflux

gusutabopb avatar Jun 08 '17 08:06 gusutabopb

Can I ask what benefit you would get from adding asyncio on the influxdb-python side of the house? Every request that comes into the HTTP or UDP endpoints on the influxdb side is handled concurrently in a separate goroutine, be that a read query or a write. Are you finding bottlenecks at the influxdb-python layer when sending data through to the backend? Are you having trouble pulling information back from the backend such that you would need it done asynchronously?

Duplicate of #406

sebito91 avatar Jun 17 '17 21:06 sebito91

@sebito91 Since we have to interact with InfluxDB via HTTP, I don't really think how InfluxDB handles things internally is very relevant here. The benefit would be basically the same as the benefit anyone gets from having an HTTP wrapper library use aiohttp instead of requests.

Using a sync/blocking approach (e.g. using the requests library), the Python interpreter is blocked from doing any other work until the response of the HTTP request is returned. The traditional workaround for that is using threads, but async programming is a lot more scalable and resource efficient. If your application already depends on async programming, then you have to make it async the whole way.

I am running a small application that sends async HTTP requests to many endpoints, asynchronously receives data through a couple of async websocket connections, and periodically dumps that data (also asynchronously) to multiple databases. Having a sync/blocking component in that process can become a significant bottleneck.
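The accumulate-and-dump pattern described here can be sketched with just the standard library; `fake_flush` below stands in for whatever async write call a database client would provide:

```python
import asyncio

async def producer(q: asyncio.Queue, n: int):
    # Stands in for data arriving from websockets/HTTP responses
    for i in range(n):
        await q.put({"measurement": "ticks", "fields": {"value": i}})

async def flusher(q: asyncio.Queue, flush, interval: float):
    # Periodically drain the queue and hand the batch to an async writer
    while True:
        await asyncio.sleep(interval)
        batch = []
        while not q.empty():
            batch.append(q.get_nowait())
        if batch:
            await flush(batch)

async def main():
    q = asyncio.Queue()
    batches = []

    async def fake_flush(batch):  # stand-in for an async DB write
        batches.append(batch)

    flush_task = asyncio.create_task(flusher(q, fake_flush, 0.01))
    await producer(q, 5)
    await asyncio.sleep(0.05)  # give the flusher time to run
    flush_task.cancel()
    return batches

batches = asyncio.run(main())
# all 5 points end up flushed, grouped into one or more batches
```

Nothing in this pipeline ever blocks the event loop, which is the whole point of doing the dump asynchronously.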

gusutabopb avatar Jun 19 '17 07:06 gusutabopb

@gusutabopb thanks for the feedback, that's definitely good information to have. The library itself does need a bit of a rework + modernization, that's true...we'll have to get this prioritized. I'm always curious to hear what other users' workflows look like, especially where things like requests become a bottleneck for collection. You have to remember, we're "competing" against telegraf, which is concurrent and applies batching...we don't have the concurrency at the moment, but we do have batching!

  1. What is your collection/write interval? Is it every 10s, 1s, subsecond?
  2. Are you sending metrics over the WAN, or is this a local network?
  3. Are you seeing drops in measurements as a result of request duration?
  4. Are you performing all writes, or a mix of writes/queries on the same client instance?

Sorry for all of the questions, but it'll help us drill to the right answer :D

sebito91 avatar Jun 20 '17 03:06 sebito91

For any modern application that has to operate at scale, async is a must. I'll sketch out our own usage to illustrate.

We run a web service that handles O(10,000) requests per second (across 4 servers) and another that handles millions of reqs/sec (across hundreds of servers). Both of these services are very latency sensitive. (N.B., the service at millions-per-second-scale is not written in Python.)

We simply want our servers to write metrics to influxdb once per minute. But if the call to influxdb blocks (either on i/o, or--in the case of coroutines--by not yielding nicely), then all requests in flight in a particular server process at that instant are also blocked. And if that influxdb write takes more than a millisecond or two (ha! in practice it frequently takes seconds), the result is unacceptable performance characteristics, once per minute per process.
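That effect is easy to demonstrate with the standard library alone: a blocking call made directly inside a coroutine stalls every other task on the loop, while offloading it with `run_in_executor` keeps them ticking. Here `time.sleep` stands in for a slow metrics write:

```python
import asyncio
import time

async def heartbeat(count, interval):
    # Represents in-flight request handlers that must keep running
    beats = []
    for _ in range(count):
        beats.append(time.monotonic())
        await asyncio.sleep(interval)
    return beats

def blocking_write():
    time.sleep(0.2)  # stands in for a slow, blocking metrics write

async def main():
    loop = asyncio.get_running_loop()
    # Offload the blocking write so the heartbeat keeps ticking
    beats, _ = await asyncio.gather(
        heartbeat(5, 0.02),
        loop.run_in_executor(None, blocking_write),
    )
    # The largest gap between beats stays near 0.02s, not 0.2s
    gaps = [b - a for a, b in zip(beats, beats[1:])]
    return max(gaps)

max_gap = asyncio.run(main())
```

If you replace the `run_in_executor` call with a plain `blocking_write()` inside a coroutine, the gap jumps to the full 0.2 seconds, which is exactly the once-per-minute stall described above.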

Hope that helps explain the motivation for considering async.

RonRothman avatar Jun 22 '17 01:06 RonRothman

@sebito91 For context, since about 2 weeks ago I am not using influxdb-python at all, and am using my own async wrapper instead: https://github.com/plugaai/aioinflux. Documentation is still not complete, but it basically implements all the influxdb-python functionality in both an async AND sync way and satisfies all my production requirements at the moment.

Answers:

  1. Subsecond. Anywhere between 1-5 times per second. Data comes in at a much faster rate, though. I accumulate data in asyncio.Queues and dump it periodically to InfluxDB/MongoDB (depending on the type of data).
  2. Mostly LAN, but I wanted something that would work just as well if I eventually have to move the databases to remote locations.
  3. Not that I am aware of. Before switching to aioinflux, I did get a lot of asyncio warning messages saying that it took over 100ms for the event loop to execute a given task (due to blocking calls not returning control flow to the event loop). Now I rarely see such messages. They look like this:
Executing <Handle <TaskWakeupMethWrapper object at 0x7f56c04bea98> created at /home/gustavo/anaconda3/lib/python3.6/asyncio/tasks.py:384> took 0.119 seconds
  4. Writes happen 24/7. Queries are less frequent but usually more intensive, and they happen whenever I or someone from my team is analyzing data.
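Warnings like that come from asyncio's debug mode, which times each callback and logs any that run longer than `loop.slow_callback_duration` (0.1 s by default). A minimal way to surface and capture them:

```python
import asyncio
import logging
import time

# Capture asyncio's slow-callback warnings instead of printing them
records = []

class Capture(logging.Handler):
    def emit(self, record):
        records.append(record.getMessage())

logging.getLogger("asyncio").addHandler(Capture())
logging.getLogger("asyncio").setLevel(logging.WARNING)

async def blocks_the_loop():
    time.sleep(0.1)  # a blocking call that never yields to the loop

async def main():
    loop = asyncio.get_running_loop()
    loop.slow_callback_duration = 0.05  # flag anything slower than 50 ms
    await blocks_the_loop()

asyncio.run(main(), debug=True)
# records should now contain a message like
# "Executing <Task ...> took 0.10x seconds"
```

This is a handy way to detect exactly the kind of blocking write discussed in this thread before it shows up as latency in production.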

gusutabopb avatar Jun 22 '17 03:06 gusutabopb

Thanks @RonRothman + @gusutabopb, appreciate the feedback. I've started working on this.

sebito91 avatar Jun 22 '17 12:06 sebito91

@sebito91 I am doing the same kind of work as @gusutabopb. I think we can first implement a fully tested async lib and merge it into a branch of the official repo. CC @RonRothman. Then this issue can be closed.

Zheaoli avatar Jun 22 '17 16:06 Zheaoli

As alluded to in #406, influxdb-python can be monkey patched by gevent to be made asynchronous. It is as simple as:

from gevent import monkey
monkey.patch_all()
from influxdb import InfluxDBClient

client = <..>
client.query(<..>)

Client queries are now asynchronous. This works on py2 and py3 and requires no code changes. While having an asyncio client would be great, asyncio only exists in Python 3.

Parallel queries can be run via gevent's greenlets, e.g.:

import gevent
<patch as above>

queries = (<query1>, <query2>, <..>)
cmds = [gevent.spawn(client.query, query)
        for query in queries]
gevent.joinall(cmds)

These parallel greenlets are co-operative with other greenlets rather than blocking on I/O. Similarly for writing via client.write_points et al.

It is, however, best practice to run groups of co-routines (like greenlets) with different purposes in separate native threads so that, for example, greenlets for application A do not conflict with greenlets for application B. They will not block each other when reading/writing over the network, but actual code still needs to run on the CPU for everything else, which will still block other greenlets from executing.

Making network requests asynchronous makes for better use of resources - they do not magically make executing code use no resources at all.

pkittenis avatar Jul 19 '17 11:07 pkittenis

In case this is still active: you can use influxdb-python with asyncio via the loop.run_in_executor function and a ThreadPoolExecutor (though I would prefer an actual asyncio client).

import asyncio

from influxdb import InfluxDBClient

. . .

async def write_data(influx_client, data_points):
    loop = asyncio.get_event_loop()
    # Passing None uses the default ThreadPoolExecutor;
    # you can also pass in your own executor (thread- or process-based)
    await loop.run_in_executor(None, influx_client.write_points, data_points)

SakuraSound avatar Aug 28 '17 03:08 SakuraSound

I ported this library. It uses aiohttp in the back end and is compatible with asyncio: async-influxdb-python

Yaser-Amiri avatar Feb 17 '18 06:02 Yaser-Amiri

@sebito91 : Any news on this feature?

tejaskale avatar Nov 23 '18 14:11 tejaskale

Ideally any async support will be for multiple async libraries (e.g. trio, curio) and not just asyncio. aiohttp is great, but it locks you into asyncio. There are attempts being made right now to get urllib3 to support async via requests-core but work there seems to have stalled.

richardhanson avatar Jan 10 '19 22:01 richardhanson

To update the last comment: looks like the two most popular async-friendly HTTP clients in Python are now https://github.com/aio-libs/aiohttp/ and https://github.com/encode/httpx/. The latter happens to support both asyncio and trio.

It seems that @gusutabopb is not using aioinflux that much these days https://github.com/gusutabopb/aioinflux/issues/31#issuecomment-626729838, so having async support in the official client would be a great addition.

Until then, as it's been said here, the solution would be loop.run_in_executor.

astrojuanlu avatar Nov 23 '20 18:11 astrojuanlu

Hi All,

the influxdb-client-python library supports async/await.

For more info see:

  • https://influxdb-client.readthedocs.io/en/stable/usage.html#how-to-use-asyncio
  • https://influxdb-client.readthedocs.io/en/stable/api_async.html
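Based on those docs, asyncio usage looks roughly like the sketch below. The url/token/org/bucket values are placeholders, and the import is deferred into the function so the sketch stays importable even without the dependency installed:

```python
async def write_and_query():
    # From influxdb-client-python's asyncio support (see the docs above);
    # the connection values below are placeholders.
    from influxdb_client.client.influxdb_client_async import InfluxDBClientAsync

    async with InfluxDBClientAsync(
        url="http://localhost:8086", token="my-token", org="my-org"
    ) as client:
        # Non-blocking write using line protocol
        await client.write_api().write(
            bucket="my-bucket", record="temperature,location=office value=21.5"
        )
        # Non-blocking Flux query
        tables = await client.query_api().query(
            'from(bucket: "my-bucket") |> range(start: -1h)'
        )
        return tables
```

Run it with `asyncio.run(write_and_query())` against a live InfluxDB 2.x instance.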

Regards

bednar avatar Jun 17 '22 05:06 bednar