influxdb-cxx

Query performance and points writing order

Open ILENIMARIUS opened this issue 4 years ago • 6 comments

Hello, I write 2 points in every iteration of a "for" loop and then query the database for those points, using the time at which writing started and the time at which it stopped. Everything works fine, but the query response is really slow compared to writing. For instance: tstart; for ( i < 10000 ) { write -> point { A }; write -> point { B } }; tstop. Then I query all points in the time interval -> SELECT * FROM /.*/ WHERE time >= (tstart) AND time < (tstop)

I manage to write (over HTTP) 20000 points in total (A and B) in 200 ms and to retrieve them in 1.7 s, which is almost 10 times slower. This ratio holds no matter how many points there are. How can I improve the query time? Second: I write 10000 points of A and 10000 points of B alternately, one by one, but when I use the influx command line and select all, the result is sequential: first the 10000 A points, followed by the 10000 B points, with the database timestamps mixed.

It should be a time series:
timestamp1 - A
timestamp2 - B
timestamp3 - A
timestamp4 - B

Why is that?

ILENIMARIUS, Apr 14 '20 09:04
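The interval query described in the post above can be sketched roughly as follows. This is a minimal illustration, not code from the issue: `toEpochNanos` and `buildIntervalQuery` are hypothetical helper names, and the sketch assumes the two bounds are passed to InfluxQL as integer nanosecond epoch values. The actual influxdb-cxx write and query calls are left out so the block stays self-contained.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <string>

// Hypothetical helper: nanoseconds since the clock's epoch
// (the Unix epoch on common platforms).
std::int64_t toEpochNanos(std::chrono::system_clock::time_point tp) {
    return std::chrono::duration_cast<std::chrono::nanoseconds>(
               tp.time_since_epoch())
        .count();
}

// Hypothetical helper: build the interval query from the post,
// selecting from all measurements between the two bounds.
std::string buildIntervalQuery(std::int64_t startNs, std::int64_t stopNs) {
    assert(startNs <= stopNs);  // the interval must not be reversed
    return "SELECT * FROM /.*/ WHERE time >= " + std::to_string(startNs) +
           " AND time < " + std::to_string(stopNs);
}
```

Taking `std::chrono::system_clock::now()` just before the write loop and again just after it, and passing both through `toEpochNanos`, would give the two bounds.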

Hi, Regarding query time, I added a quick check and even the CI machine returns the result within 20 ms. Regarding ordering, it is InfluxDB's default behaviour to return results per series. Are you getting a different result when using the command line tool?

awegrzyn, Apr 14 '20 21:04
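Since InfluxDB groups the response by series, one option when a single time-ordered sequence is needed is to interleave the series on the client. The following is only a sketch under that assumption: `Row` and `mergeByTime` are hypothetical names, not part of influxdb-cxx, and each input list is assumed to be sorted by time already.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical row type: one point with its timestamp and series name.
struct Row {
    std::int64_t timeNs;
    std::string series;
};

// Merge two per-series result lists, each already sorted by time,
// into one list ordered by timestamp.
std::vector<Row> mergeByTime(const std::vector<Row>& a,
                             const std::vector<Row>& b) {
    std::vector<Row> out;
    out.reserve(a.size() + b.size());
    std::merge(a.begin(), a.end(), b.begin(), b.end(),
               std::back_inserter(out),
               [](const Row& x, const Row& y) { return x.timeNs < y.timeNs; });
    return out;
}
```

For two series written alternately, the merged list comes out as A, B, A, B, which is the order the original post expected.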

Hello, First, thank you for your message. The query response does arrive within 20 ms, but the time needed to actually retrieve the data is the problem. I checked, and I think the bottleneck is somewhere after the data is retrieved from the database, in the post-processing of the data. We have something like a data tree (or JSON-like structure), and in the post-processing we retrieve and parse the data.

Example of raw data extracted with another library (https://github.com/orca-zhang/influxdb-cpp): {"results":[{"statement_id":0,"series":[{"name":"Signal1","columns":["time","Data","Timestamp"],"values":[["2020-04-15T06:40.....

I have tested with the library mentioned above, which runs the query and stores the whole response in a string. The read performance is outstanding, but after reading we have to post-process the JSON-like data, which adds some time. Even with this extra time, I am able to read and process faster than I can write with the influxdb-cxx library.

Results:

influxdb-cxx -> write 1,000,000 points, each with a name and two fields:
.field timestamp -> chrono timestamp
.field data -> 23 char string
WRITE: duration - 9849 milliseconds

influxdb-cpp -> read everything, over all time:
DATA REQUEST: duration - 6103 milliseconds
Extract all values and put them in a vector: "values":[["2020-04-15T06:40:44.818300968Z","FF:4D:30:30:0F:0D:0D:FF","1585559900000000000"]
DATA PROCESSING: duration - 2843 milliseconds
VECTOR SIZE: 3000000 (each point has 3 values: 1. database timestamp, 2. chrono timestamp, 3. data string)

Can we optimize something to speed up reading in influxdb-cxx?

ILENIMARIUS, Apr 15 '20 12:04
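The "extract all values and put them in a vector" step from the comment above could look roughly like the sketch below. It is not the poster's code: `extractQuoted` is a hypothetical helper that collects every double-quoted token from one raw row, and it assumes the values contain no escaped quotes. A real JSON parser would be the safer choice for arbitrary data.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Collect every double-quoted token from a raw row such as
// ["2020-...Z","FF:4D:...","1585559900000000000"].
// Assumption: no value contains an escaped quote (\").
std::vector<std::string> extractQuoted(std::string_view row) {
    std::vector<std::string> out;
    std::size_t pos = 0;
    while ((pos = row.find('"', pos)) != std::string_view::npos) {
        const std::size_t end = row.find('"', pos + 1);
        if (end == std::string_view::npos) break;  // unterminated token
        out.emplace_back(row.substr(pos + 1, end - pos - 1));
        pos = end + 1;  // continue after the closing quote
    }
    return out;
}
```

Applied to the sample row in the comment, this yields three strings per point: the database timestamp, the data string and the chrono timestamp.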

Hello again. Looking through the code, I found that a bottleneck is in /master/src/InfluxDB.cxx, from line 90 to 119. For instance, using stringstream is time-consuming, and some of the series parsing there can be optimized. I will let you know if I have more details.

ILENIMARIUS, Apr 16 '20 18:04
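As a generic illustration of the stringstream remark above, and not a reproduction of the code at the referenced lines, splitting a delimited line with `std::string_view::find` avoids both the stream machinery and a string copy per token. `splitFields` is a hypothetical helper name, and the sketch assumes C++17.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Split `line` on `delim` without std::stringstream. The returned views
// point into the original buffer, so that buffer must outlive them.
std::vector<std::string_view> splitFields(std::string_view line, char delim) {
    std::vector<std::string_view> fields;
    std::size_t start = 0;
    while (true) {
        const std::size_t end = line.find(delim, start);
        if (end == std::string_view::npos) {
            fields.push_back(line.substr(start));  // last field
            break;
        }
        fields.push_back(line.substr(start, end - start));
        start = end + 1;
    }
    return fields;
}
```

Whether this helps in practice depends on where the time is actually spent, so it is worth profiling before and after such a change.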

Indeed, this should definitely be optimised...

awegrzyn, Apr 18 '20 08:04

Actually, I did check the performance and got:

  • 10k write: 116ms
  • 10k read: 195ms

awegrzyn, Apr 18 '20 09:04

Hello awegrzyn, because it is not linear, try it like this, with 10k batches. Write: the same point 1,000,000 times with 2-3 fields, as fast as possible. Read: select * from "db" from the start of writing until the end. The write/read time ratio is somewhere between 1/5 and 1/9. For instance, I write 1 million points in approx. 30 s and run into a timeout when trying to read them. With the timeout removed, the read takes approx. 289 s.

ILENIMARIUS, Apr 21 '20 06:04
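The "not linear" point can be made concrete by comparing the average cost per point at different sizes. The helper below is a hypothetical illustration, and the numbers in the note after it are the ones reported in this thread.

```cpp
// Average cost per point, in microseconds, for a run of `points` points
// that took `millis` milliseconds in total.
double microsPerPoint(double millis, double points) {
    return millis * 1000.0 / points;
}
```

With the figures quoted above, a 10k read in 195 ms works out to about 19.5 microseconds per point, while a read of 1 million points in approx. 289 s works out to about 289 microseconds per point. The two measurements come from different machines and setups, so the comparison is only indicative.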