influxdbr
remove argument "return_xts" in influx_query? Add tsibble?
Currently, the `influx_query` result is either a data.frame (`tibble`) or an `xts` object. The latter is only the case if the function call specifically contains the `return_xts = TRUE` argument. However, this approach is strictly limited to the `xts` time series class. With the recent development of the tsibble package, a powerful time series class is available which could be useful to incorporate into influxdbr. Therefore, one idea is:
- remove the `return_xts` argument and always return a data.frame (`tibble`),
- add an unambiguous class (e.g. `"influxdb_result"`) to the resulting data.frame, and
- add conversion functions `as_xts.influxdb_result` and `as_tsibble.influxdb_result`.
This would allow the following chain:
```r
influx_connection() %>%
  influx_query() %>%
  as_xts()

influx_connection() %>%
  influx_query() %>%
  as_tsibble()
```
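For illustration, the dispatch could be plain S3. The following is only a sketch of the idea: the `influxdb_result` class, the `as_xts()` generic, and the assumed `time` column are taken from the proposal above, not from existing influxdbr code.

```r
# Hypothetical sketch: tag the query result with an extra S3 class
# so that conversion generics can dispatch on it.
as_xts <- function(x, ...) UseMethod("as_xts")

as_xts.influxdb_result <- function(x, ...) {
  # assumes a POSIXct "time" column plus value columns
  xts::xts(x[setdiff(names(x), "time")], order.by = x$time)
}

# influx_query() would then end with something like:
#   class(result) <- c("influxdb_result", class(result))
```

`as_tsibble.influxdb_result` would follow the same pattern, building on `tsibble::as_tsibble()`.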
This change would potentially break existing code. What is the best practice for deprecating function arguments?
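One common pattern is to keep the argument for a deprecation cycle and warn when it is supplied. This is only a sketch of that pattern, not influxdbr's actual implementation:

```r
# Hypothetical sketch: keep return_xts for one release cycle,
# warn when it is explicitly supplied, then remove it later.
influx_query <- function(con, query, return_xts = NULL, ...) {
  if (!is.null(return_xts)) {
    warning("'return_xts' is deprecated; call as_xts() on the result instead.",
            call. = FALSE)
  }
  # ... run the query and return a tibble classed "influxdb_result" ...
}
```

Packages that depend on the tidyverse often use `lifecycle::deprecate_warn()` for the same purpose; plain `warning()` avoids the extra dependency.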
What you propose makes sense to me.
I would add that internally there is no need to use a JSON parser. Just request a CSV and parse it with `read.csv`, `readr::read_csv` or `data.table::fread`, falling back automatically to whichever library is present on the user's computer, without adding hard dependencies. It's probably much faster this way and would be another step towards #54.
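The fallback could be as simple as picking the first available reader at runtime. A sketch of the idea (the helper name `choose_csv_reader` is made up for illustration):

```r
# Hypothetical helper: prefer the fastest available CSV reader,
# without declaring data.table or readr as hard dependencies.
choose_csv_reader <- function() {
  if (requireNamespace("data.table", quietly = TRUE)) {
    function(txt) data.table::fread(text = txt)
  } else if (requireNamespace("readr", quietly = TRUE)) {
    function(txt) readr::read_csv(txt, show_col_types = FALSE)
  } else {
    function(txt) utils::read.csv(text = txt)  # always available
  }
}
```

Because `utils::read.csv` ships with base R, the final branch guarantees the package still works with zero extra dependencies installed.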
Regarding the breaking change, I think it's best to just increment the major version of the package and announce it as a breaking change. I am sure all users would appreciate this, as the current return value is far from elegant.
If ok with you I can have a look.
Cool! That’d be great! Thank you! In this case I’ll postpone the CRAN submission.
The proposed change most likely would also address #48.
I have looked at it and there is a hiccup with older versions of InfluxDB (1.1.1; current stable is 1.7.6). When the accepted output is application/csv and a query error generates a 400 status, the error content is empty, so the user won't get an informative message.
For example both of the following would produce an empty output:
```sh
curl -v -H "Accept: application/csv" --data-urlencode "q=some-query" -G "http://localhost:8086/query"
curl -v -H "Accept: application/csv" --data-urlencode "qq=some-query" -G "http://localhost:8086/query"
```
I don't know which version of InfluxDB fixed this issue, but 1.1.1 is still the default on Ubuntu 18.04, which is a relatively recent OS. So I believe this is a show-stopper for the CSV parser.
The good news is that most of the time is spent by InfluxDB building the response, so parsing CSV instead of JSON on the R side gives only small speed benefits, relatively speaking.
OK, let's stick to the JSON parser. (Anyway: nice idea to reduce package dependencies!)
A heads up. I am very close to finishing this and reducing dependencies to 0 on both write and read. Will be back with a PR hopefully later today.
Wow! looking forward to it!
Actually there is a small hiccup with multiple queries. Do you really want to merge multiple queries into one big data.frame? I think it would make sense to return a list of data.frames, one data.frame per query, and let the user decide what to do with it.
I am currently using a custom rbind_list which falls back on `data.table::rbindlist`, `dplyr::bind_rows` and then `base::rbind`, whichever is available on the user's system, without an explicit dependency. Both `data.table::rbindlist` and `dplyr::bind_rows` can handle heterogeneous components, but `base::rbind` cannot.
If you really want binding for multiple queries, one possibility would be to fail with a message to install data.table or dplyr when the user sends multiple queries in one request (which I think is pretty rare). What do you think?
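The fallback binding could look roughly like this. This is a sketch under my own assumptions, not the actual helper from the PR; `rbind_list` here is the custom function mentioned above, not a dplyr export:

```r
# Hypothetical fallback: bind a list of data.frames with the best
# available backend; base::rbind requires identical column sets.
rbind_list <- function(dfs) {
  if (requireNamespace("data.table", quietly = TRUE)) {
    as.data.frame(data.table::rbindlist(dfs, fill = TRUE))
  } else if (requireNamespace("dplyr", quietly = TRUE)) {
    dplyr::bind_rows(dfs)
  } else {
    do.call(rbind, dfs)  # errors if the data.frames differ in columns
  }
}
```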
> ... I think it would make sense to return a list of data.frames ...
Agreed, as each single result might contain a different set of tags.
... and fields, of course.