influxdbr

Rewrite of the query sub-system

Open vspinu opened this issue 5 years ago • 7 comments

This is still WIP, but I figured it's time to talk. This piece is so fundamental that I had to touch pretty much every other part of the package. The good news is that most of the open issues are taken care of by the rewrite, and the code has bare-bones dependencies.

After finishing the rewrite I realized that, even though the new parsing part is 3-4x faster than before, it's still not a tremendous improvement, because the json parsing alone still takes 1/3 of the entire time. So I now think it's best to go the csv route and ditch json completely. The csv parsing with data.table is lightning fast, so the total time will be very close to the bare minimum one can hope for on an influx query.
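For illustration, the csv route could look roughly like the sketch below. The response string and its column layout are made up for the example; this is not the actual influxdbr code or the exact InfluxDB wire format:

```r
# Minimal sketch: parse a csv-style query response in one pass with
# data.table::fread. The response text and columns are illustrative.
library(data.table)

csv_response <- "name,tags,time,value
cpu,host=a,2019-06-27T12:00:00Z,0.64
cpu,host=a,2019-06-27T12:00:10Z,0.71"

dt <- fread(csv_response)
```

fread parses the entire response in a single pass, which is where most of the speedup over assembling rows from a parsed json tree would come from.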

The error issue with csv which I mentioned on the other thread could be tackled by a repeated request: if there is an error with an empty error message, we issue the same request in json and report the error content from the second request.
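A hedged sketch of that fallback, assuming an httr-based client. The function name, the `con$url` field, and the Accept headers are hypothetical illustrations, not the package's actual API:

```r
# Hypothetical sketch of the csv-first, json-on-error fallback.
library(httr)

query_with_fallback <- function(con, query) {
  # First attempt: request csv, which is cheap to parse with data.table.
  resp <- GET(con$url, add_headers(Accept = "application/csv"),
              query = list(q = query))
  if (http_error(resp)) {
    # csv error bodies can be empty, so repeat the same request in json
    # purely to recover a readable error message.
    resp_json <- GET(con$url, add_headers(Accept = "application/json"),
                     query = list(q = query))
    stop(content(resp_json, as = "text", encoding = "UTF-8"))
  }
  data.table::fread(content(resp, as = "text", encoding = "UTF-8"))
}
```

The extra round trip only happens on the error path, so the happy path keeps the full csv speed.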

The benefits of csv are worthwhile - optimal speed, an extremely simple code base, and no jsonlite dependency. WDYT?

vspinu avatar Jun 27 '19 12:06 vspinu

I actually noticed huge memory consumption which is not released on gc(). I see it both with the old implementation and the new bare-bones base R implementation of the parsing, so it's hard to say where it comes from. It's not from the jsonlite parse. It looks like a memory leak in base R, but I guess that's unlikely. With the csv data.table reader, memory consumption is trivial.

vspinu avatar Jun 27 '19 13:06 vspinu

You did a great job! Thank you! I like your proposal to change to csv parsing. We could also determine the version of the queried InfluxDB server in the course of creating the influx_connection object and adapt the parsing scheme (or raise a warning or ...).

dleutnant avatar Jul 01 '19 14:07 dleutnant

Great work!

DMerch avatar Jul 01 '19 23:07 DMerch

@dleutnant I had to take a long "leave" from this project as I wasn't using influx any longer.

Now I came back, but I realized that the PR might not be the best way to go about it. The changes that I made back then are massive and seem incompatible with the current implementation. Also, very old versions of influx have a buggy csv implementation. Unfortunately I won't have the bandwidth to go into all those compatibility and influx version workarounds. Nor do I think it's worth it.

So I would propose that we create a new package: very lightweight and bare-bones, with only data.frames for writing and csv for parsing. I can prepare a proposal pretty quickly, as most of the stuff has been isolated already. WDYT?

vspinu avatar Nov 12 '20 10:11 vspinu

Hi @vspinu,

well, I recently started a new job and don't think that I'll continue to actively work on this package. This means I am just fine with your proposal! Also, if you are interested, I could give you "owner" rights to this repo, which I guess would make things easier. Just let me know. Best, Dominik

dleutnant avatar Nov 13 '20 09:11 dleutnant

@vspinu are you open to revisiting this at all? Now that influxdb 2 is out of beta and flux has matured a bit more, it seems like a good time to finalize your modifications. I'd be open to helping in any way.

FixTestRepeat avatar Mar 08 '21 09:03 FixTestRepeat

@FixTestRepeat Yes. Things are almost done on my side, but I have been quite busy lately. I have my local copy, which I have been using successfully for the past half year. I will try to publish the package by the end of the week.

vspinu avatar Mar 08 '21 10:03 vspinu