influxdbr icon indicating copy to clipboard operation
influxdbr copied to clipboard

remove argument "return_xts" in influx_query? Add tsibble?

Open dleutnant opened this issue 6 years ago • 10 comments

Currently, the influx_query result is either a data.frame (tibble) or an xts object. The latter is only the case, if the function call specifically contains the return_xts=TRUE argument. However, this approach is strictly limited to the xts time series class. With the recent development of the tsibble package, a powerful time series class is available which could be useful to incorporate into influxdbr. Therefore, one idea is

  1. to remove the "return_xts" argument and always return a data.frame (tibble),
  2. add an unambiguous class (e.g. "influxdb_result") to the resulting data.frame and
  3. add conversion functions as_xts.influxdb_result, as_tsibble.influxdb_result.

This would allow the following chain:

influx_connection() %>% 
  influx_query() %>% 
  as_xts()

influx_connection() %>% 
  influx_query() %>% 
  as_tsibble()

This change would potentially break existing codes. What is the best-practice of deprecate function arguments?

dleutnant avatar Feb 05 '19 09:02 dleutnant

What you propose makes sense to me.

I would add to this that internally there is no need to use json parser. Just request a csv and parse with read.csv, readr::read_csv or `data.table::fread. Fallback automatically to whichever library is present on the user's computer without adding hard dependencies. It's probably much faster this way and would be another step towards #54. If ok with you I can have a look.

Regarding the breaking change, I think it's best to just increment the major version of the package and announce that it's a breaking change. I am sure all users would appreciate this as the current return value is far from elegant.

vspinu avatar Jun 23 '19 09:06 vspinu

If ok with you I can have a look.

Cool! That’d be great! Thank you! In this case I’ll postpone the CRAN submission.

dleutnant avatar Jun 24 '19 18:06 dleutnant

The proposed change most likely would also address #48.

dleutnant avatar Jun 24 '19 19:06 dleutnant

I have looked at it and there is a hiccup with older version of influxdb (1.1.1, current stable is 1.7.6). When the accepted output is application/csv and there is a query error which generates 400 status the content of the error is empty. So the user won't get an informative output.

For example both of the following would produce an empty output:

 curl -v -H "Accept: application/csv" --data-urlencode "q=some-query" -G "http://localhost:8086/query"
 curl -v -H "Accept: application/csv" --data-urlencode "qq=some-query" -G "http://localhost:8086/query"

I don't know which version of influx fixed this issue, but 1.1.1 is still the default on ubuntu 18.04, which is relatively recent OS. So I believe this is a show stopper for the csv parser.

The good news is that most of the time is spent by influx on building the response, so parsing csv instead of json on the R side has small speed benefits relatively speaking.

vspinu avatar Jun 24 '19 19:06 vspinu

OK, let's stick to the json parser. (anyway: nice idea to reduce package dependencies!)

dleutnant avatar Jun 24 '19 19:06 dleutnant

A heads up. I am very close to finishing this and reducing dependencies to 0 on both write and read. Will be back with a PR hopefully later today.

vspinu avatar Jun 26 '19 11:06 vspinu

Wow! looking forward to it!

dleutnant avatar Jun 26 '19 11:06 dleutnant

Actually there is a small hick up with multiple queries. Do you really want to merge multiple queries into one big data.frame? I think it would make sense to return a list of data.frames, one data.frame per query and let the user decide what to do with it.

I am currently using a custom rbind_list which would fall back on data.table::rbindlist, dplyr::rbind_list and then base::rbind, whichever is available on the user's system without an explicit dependency. Both data.table:rbindlist and dplyr::rbind can handle heterogenuous components but not base::rbind.

If you really want binding for multiple quieries then one posibility would be to fail with message to install data.table or dplyr in case the user uses multiple quieries in one request (which I think is pretty rare). What do you think?

vspinu avatar Jun 26 '19 11:06 vspinu

... I think it would make sense to return a list of data.frames...

Agreed, as each single result might contain a different set of tags.

dleutnant avatar Jun 26 '19 12:06 dleutnant

... and fields, of course.

dleutnant avatar Jun 26 '19 12:06 dleutnant