influxdbr
Rewrite of the query sub-system
This is still WIP, but I guess it's time to talk. This piece is so fundamental that I had to touch pretty much every other part of the package. The good news is that most of the open issues are taken care of by the rewrite, and the code is bare-bones with minimal dependencies.
After finishing the rewrite I realized that, even though the new parsing part is 3-4x faster than before, it's still not a tremendous improvement because the json parsing still takes 1/3 of the entire time. So I now think it's best to go the csv route and ditch json completely. The csv parsing with data.table is lightning fast, so the total time will be very close to the bare minimum one can hope for from an influx query.
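A minimal sketch of what the csv path could look like (hypothetical helper, not influxdbr's actual API; assumes an influx 1.x `/query` endpoint that honors `Accept: application/csv`, plus the httr and data.table packages):

```r
library(httr)
library(data.table)

# Hypothetical sketch: fetch a query result as csv and parse it with fread().
# The endpoint shape and parameter names are assumptions, not influxdbr code.
influx_query_csv <- function(host, db, query, port = 8086) {
  resp <- GET(
    sprintf("http://%s:%d/query", host, port),
    query = list(db = db, q = query),
    add_headers(Accept = "application/csv")
  )
  stop_for_status(resp)
  # fread() ingests the raw csv text directly, which is far cheaper than
  # walking a parsed json tree column by column.
  fread(text = content(resp, as = "text", encoding = "UTF-8"))
}
```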
The error issue with csv which I mentioned on the other thread could be tackled with a repeated request: if a csv response comes back with an empty error message, we issue the same request in json and report the error content from that second request.
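The retry idea could look roughly like this (a sketch only; `influx_query_csv` and `influx_query_json` are hypothetical helpers standing in for the two parsing paths):

```r
# Sketch of the fallback: if the csv request fails with an empty error
# message, repeat the request in json just to recover the error text.
# Both helpers are hypothetical placeholders, not influxdbr functions.
query_with_fallback <- function(con, db, q) {
  res <- tryCatch(influx_query_csv(con, db, q), error = identity)
  if (inherits(res, "error")) {
    if (!nzchar(conditionMessage(res))) {
      # json is slower, but its error payload is reliable
      res_json <- tryCatch(influx_query_json(con, db, q), error = identity)
      if (inherits(res_json, "error")) stop(conditionMessage(res_json))
    }
    stop(conditionMessage(res))
  }
  res
}
```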
The benefits of csv are worthwhile - optimal speed, an extremely simple code base, and no jsonlite dependency. WDYT?
I actually noticed huge memory consumption which is not released on gc(). I see it both with the old implementation and with the new bare-bones base R parsing. Hard to say where it comes from; it's not from the jsonlite parse. It looks like a memory leak in base R, but I guess that's unlikely. With the csv data.table reader, memory consumption is trivial.
You did a great job! Thank you!
I like your proposal to change to csv parsing.
We could also determine the version of the queried InfluxDB server in the course of creating the influx_connection object and adapt the parsing scheme (or raise a warning or ...).
Great work!
@dleutnant I had to take a long "leave" from this project as I wasn't using influx any longer.
Now I have come back, but I realized that the PR might not be the best way to go about it. The changes that I made back then are massive and seem incompatible with the current implementation. Also, very old versions of influx have a buggy csv implementation. Unfortunately I won't have the bandwidth to go into all those compatibility and influx version workarounds. Nor do I think it's worth it.
So I would propose that we create a new package: very lightweight and bare-bones, with only data.frames for writing and csv for parsing. I can pretty quickly prepare a proposal as most of the stuff has been isolated already. WDYT?
Hi @vspinu,
well, I recently started a new job and don't think that I'll continue to actively work on this package. This means I am just fine with your proposal! Also, if you are interested, I could give you "owner" rights to this repo, which would make things easier I guess. Just let me know. Best, Dominik
@vspinu are you open to revisiting this at all? Now that influxdb 2 is out of beta and flux has matured a bit more, it seems like a good time to finalize your modifications. I'd be open to helping in any way.
@FixTestRepeat Yes. Things are almost done on my side, but I have been quite busy lately. I have my local copy which I have been using successfully for the past half a year. I will try to publish the package by the end of the week.