influxdbr

Rewrite of the query sub-system

Open vspinu opened this issue 5 years ago • 7 comments

This is still WIP, but I figured it's time to talk. This piece is so fundamental that I had to touch pretty much every other part of the package. The good news is that most of the open issues are taken care of by the rewrite, and the code has bare-bones dependencies.

After finishing the rewrite I realized that, even though the new parsing part is 3-4x faster than before, it's still not a tremendous improvement, because the json parsing alone still takes 1/3 of the entire time. So I now think it's best to go the csv route and ditch json completely. The csv parsing with data.table is lightning fast, so the total time will be very close to the bare minimum one can hope for on an influx query.
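For illustration, the csv route could look roughly like the sketch below. The response string and its column layout are made up for the example; this is not the actual influxdbr code or the exact InfluxDB wire format:

```r
# Minimal sketch: parse a csv-style query response in one pass with
# data.table::fread. The response text and columns are illustrative.
library(data.table)

csv_response <- "name,tags,time,value
cpu,host=a,2019-06-27T12:00:00Z,0.64
cpu,host=a,2019-06-27T12:00:10Z,0.71"

dt <- fread(csv_response)
```

fread parses the entire response in a single pass, which is where most of the speedup over assembling rows from a parsed json tree would come from.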

The error issue with csv which I mentioned on the other thread could be tackled by a repeated request: if there is an error with an empty error message, we issue the same request in json and report the error content from the second request.
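A hedged sketch of that fallback, assuming an httr-based client. The function name, the `con$url` field, and the Accept headers are hypothetical illustrations, not the package's actual API:

```r
# Hypothetical sketch of the csv-first, json-on-error fallback.
library(httr)

query_with_fallback <- function(con, query) {
  # First attempt: request csv, which is cheap to parse with data.table.
  resp <- GET(con$url, add_headers(Accept = "application/csv"),
              query = list(q = query))
  if (http_error(resp)) {
    # csv error bodies can be empty, so repeat the same request in json
    # purely to recover a readable error message.
    resp_json <- GET(con$url, add_headers(Accept = "application/json"),
                     query = list(q = query))
    stop(content(resp_json, as = "text", encoding = "UTF-8"))
  }
  data.table::fread(content(resp, as = "text", encoding = "UTF-8"))
}
```

The extra round trip only happens on the error path, so the happy path keeps the full csv speed.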

The benefits of csv are worthwhile - optimal speed, an extremely simple code base, and no jsonlite dependency. WDYT?

vspinu avatar Jun 27 '19 12:06 vspinu

I actually noticed huge memory consumption which is not released on gc(). I see it both with the old implementation and the new bare-bones base R implementation of the parsing, so it's hard to say where it comes from. It's not from the jsonlite parse. It looks like a memory leak in base R, but I guess that's unlikely. With the csv data.table reader, memory consumption is trivial.

vspinu avatar Jun 27 '19 13:06 vspinu

You did a great job! Thank you! I like your proposal to change to csv parsing. We could also determine the version of the queried InfluxDB server in the course of creating the influx_connection object and adapt the parsing scheme (or raise a warning or ...).

dleutnant avatar Jul 01 '19 14:07 dleutnant

Great work!

DMerch avatar Jul 01 '19 23:07 DMerch

@dleutnant I had to take a long "leave" from this project as I wasn't using influx any longer.

Now I came back, but I realized that the PR might not be the best way to go about it. The changes that I made back then are massive and seem incompatible with the current implementation. Also, very old versions of influx have a buggy csv implementation. Unfortunately I won't have the bandwidth to go into all those compatibility and influx version workarounds. Nor do I think it's worth it.

So I would propose that we create a new package: very lightweight and bare-bones, with only data.frames for writing and csv for parsing. I can prepare a proposal pretty quickly, as most of the stuff has been isolated already. WDYT?

vspinu avatar Nov 12 '20 10:11 vspinu

Hi @vspinu,

well, I recently started a new job and don't think that I'll continue to actively work on this package. This means I am just fine with your proposal! Also, if you are interested, I could give you "owner" rights to this repo, which I guess would make things easier. Just let me know. Best, Dominik

dleutnant avatar Nov 13 '20 09:11 dleutnant

@vspinu are you open to revisiting this at all? Now that influxdb 2 is out of beta and flux has matured a bit more, it seems like a good time to finalize your modifications. I'd be open to helping in any way.

FixTestRepeat avatar Mar 08 '21 09:03 FixTestRepeat

@FixTestRepeat Yes. Things are almost done on my side, but I have been quite busy lately. I have my local copy, which I have been using successfully for the past half year. I will try to publish the package by the end of the week.

vspinu avatar Mar 08 '21 10:03 vspinu