influxdb
influxdb copied to clipboard
Empty prefix alias for wildcards aggregate functions
Moving and downsampling data between databases or retention policies should allow us to maintain the field names.
I tried
SELECT last(*) AS ""
INTO "mfcf_vmware"."autogen".:measurement
FROM /^vmware_.*/
WHERE time >= '2016-01-01T00:00:00Z'
GROUP BY time(30m), "host"
but sadly because the prefix key is empty it fallbacks to "last_" in this case.
https://github.com/influxdata/influxdb/pull/7009 @jsternberg would adding this be simple? @gunnaraasen should something different be done so it uses backreferences?
Unfortunately, this wouldn't be easy. We needed some way to uniquely identify the result and there had to be a prefix of some kind to the AS
. If you have any recommendations, we can try to make this kind of thing possible. What is the desired field name you want out of that query?
What is the desired field name you want out of that query?
The original name, anything else it going to be an issue because tools like grafana would need to be updated to use the new field names... and this is an issue because I want to allow the users to jump between the different retention/rollups without worry about field name changing
The only workaround i see, would be to do an export then use sed
to update the values and preform a restore.
/not user-friendly/
untill this is resolved I have little use for continuous queries
please add this to a milestone. its kinda required for anything /enterprise/ level unless im missing something fundamental?
The original name is likely not possible. It creates a conflict if you do something like SELECT first(*), last(*) FROM cpu
since now you have two wildcards that create the same field name. From your mention of Grafana, I think your issue might be relevant to #7198. Does it look like that issue covers yours?
i don't see a conflict here
SELECT first(*), last(*) FROM cpu
first_idle first_user last_idle last_user
this creates fields with prefix first_ and last_
if you did
SELECT first(*) AS "x", last(*) AS "x" FROM cpu
error
then yes you'll have a conflict
if you did
SELECT first(*) AS "x", last(*) AS "y" FROM cpu
x_idle x_user y_idle y_user
then its good
if you did
SELECT first(*) AS "", last(*) AS "y" FROM cpu
first_idle first_user y_idle y_user
WOULD PREFER: idle user y_idle y_user
it should also be OK... if the preferred outcome was the result.
Given an example where cpu
has the fields idle
and user
, what do you think the output column names should be for each of the above?
Just a warning. At the moment, it's unlikely we will physically be capable of changing this since I think it would be a backwards incompatible change.
updated above.
also i don't see this breaking any backwards compatible because if you use an empty alias then you would not have used an alias in this current state.
cc @jwilder since this is a breaking change. @steverweber we, unfortunately, will likely not be able to change the current behavior. If you have any proposal for a way to make your current use case work without changing the existing behavior, we can look into that. Can you look at #7198 and see if it's relevant to your problem? Thanks.
Another workaround to add an empty prefrix. Add something like the CQ back reference to AS
Yes i looked at #7198 its on an interesting path. I was hoping to see something like this and glad its on the radar.. however it's not the solution i'm looking for (im trying to copy data between two different databases and roll up the data)... However I could just copy all the data without the group by and wait for a clean solution in another release.
SELECT *
INTO "mfcf_vmware"."autogen".:measurement
FROM /^vmware_.*/
I think I may have misunderstood the issue. I think this might be possible and may not be a breaking change. I thought this would require a change in what the alias modified since, for whatever reason, I thought the alias modified the field name, but the alias modifies the function name.
I'll see if we can do this.
+1 any news on this?
Why I want this:
- Reusing the same queries with multiple retention policies e.g. in Grafana
Full story on a simplified example:
- I have e.g. three retention policies "raw", "store30d" and "store1y"
- Telegraf pushes data to the "raw" RP; This RP only covers the past 7d
- A continuous query aggregates data from the "raw" RP and stores it in "store30d" (e.g.
SELECT mean(*), max(*) AS max, min(*) AS min INTO store30d.:MEASUREMENT FROM raw./.+/ GROUP BY time(1m),*)
- More aggregated data goes into "store1y"
SELECT mean(*),... INTO store30d.:MEASUREMENT FROM raw./.+/ GROUP BY time(1h),*)
- If I don't give an alias, the function name is prefixed to the field name. That's as expected
- If I do give an alias, the alias name is prefixed to the field name. That's nice, too
- But I didn't find a way to keep the original field name other than by using explicit field names and aliases instead of the asterisk. Problem: This doesn't scale. I'd then have 50 different CQs that need to be maintained.
Current workaround:
In Grafana I use the retention policiy as a template parameter. Currently I have multiple queries with the different field names (usage_percentile, mean_usage_percentile), but that approach puts at least 50% more queries to influxdb that would be needed.
Potential Solution
I like @steverweber's initial approach of using an empty alias. IMHO, it's OK if the query would fail if the field names aren't unique anymore, but as long as the querier makes up queries that result in unique field names, this feature would be very helpful. Proposals for syntax
-
SELECT mean(*) AS "" FROM raw.mem GROUP BY time(1h)
-
SELECT mean(*) NOALIAS FROM raw.mem GROUP BY time(1h)
-
SELECT mean(*) AS FIELDNAME FROM raw.mem GROUP BY time(1h)
Shall all return the same field names as in the raw.mem series.
I'm not sure if SELECT first(*) - last(*) FROM mem
should work (returning differences per field (e.g. first_used_percent-last_used_percent, first_active-last_active,...)
@bbczeuz thanks for spelling out the issue.
https://github.com/influxdata/influxdb/issues/7858
another syntax that would be ok:
SELECT mean(*) AS :FIELD FROM measurentname
If it is small change then releasing it sooner will be an high impact one for us :+1: . Any ETA would be great @jsternberg
Like many others in this thread, we also feel limited by the use of CQ's and retention policies. Very nicely spelled out by @bbczeuz .
I'd like to add that the solution could be implemented in such a way that many other issues regarding alias names can be solved as well. When not dealing with CQ's at all, but just defining a SELECT mean(*) from "mymeasurement"
in Grafana, gives a lot of unwanted pollution in the legend of the graph, and its tooltip.
I also saw some discussion about adding prefixes or postfixes to the column names. This only increases the pollution... column names will get longer and longer.
Sometimes we want to fan out a measurement, with shorter column names and a more descriptive new measurement name.
How about this for a solution?
Use a two-parameter AS phrase to enter regex replace alias mode
For example
SELECT mean(*) AS "^(mean_)(MyOld)(.*)$" "MyNew_$3" FROM measurementname
Cheers, TW
@jsternberg will this issue be considered for implementation?
I guess this one is probably easier in implementation than #7198, but will help quite a lot to those who use RP-CQ pattern of downsampling to access raw and downsampled data in the uniform way.
And last proposed syntax from @steverweber: SELECT mean(*) AS :FIELD FROM measurentname
looks elegant.
While a @tw-bert -ish solution is clearly more powerful:
SELECT mean(*) WITH "^(mean_)(MyOld)(.*)$" AS "MyNew_$3" FROM measurementname
WITH <match> AS <replace>
could appear multiple times. Is this SQL-ish? or is WITH
confusing?
I think @steverweber 's solution would still fit most cases (and is easier to understand and probably easier to implement and faster to run).
A community user ran into this reciently: https://community.influxdata.com/t/query-problem-removing-prefix-to-field-name-of-into-clause/1006
+1
I'm setting up downsampled dashboards in Grafana and the only option I see is having one panel for the raw data, and one panel for the downsampled data to accomodate the sum_ or mean_ prefix.
I see the reasoning for having the prefixes:
SELECT sum(*), mean(*) INTO "metrics"."one_day".:MEASUREMENT FROM /.*/ GROUP BY time(1d),*
Selecting from one_day, there'd be no way to know if it was the sum or the mean field. Perhaps another way is having the aggregate in the retention policy. 'one_day_sum' and 'one_day_mean'.
This still requires a way to SELECT INTO the retention policy without the prefix. The create continuous query command would need to error if you tried creating a continuous query like the one I've got above; only allowing one aggregate function per retention policy if you specify the ideal 'no prefix' option.
Clear that select sum(*), mean(*) ...
will not work in this case, however I would work with AS smth
, like AS :FIELD
or AS "MyNew_$3"
as it was proposed before instead of prohibiting the syntax with multiple aggregate functions. Therefore if you would need to make both sum and mean you can add your own prefixes. Because in some cases there is a need of using different aggregate functions in one query for fields of different datatypes, like mean() will make no sense for a string data, probably you would use mode() instead, same for boolean values.
Is there anything happening here? I am trying to find a way to properly implement downsampling/aggregation with is - in my opinion - a core feature of a time series database.
After quite some reading and research I implemented different retention policies and continuous queries to downsample the data. Just to find out that the is no option to keep the field name? Downsampling in influxdb is already quite painful to implement and there is no way to do an intelligent rollup (I know there is an open issue for that). But this has nothing to do with downsampling when the data looks completely different afterwards.
Since there has been no activity around that issue I assume it is not really important for the devs. For our projects, this issue renders influxdb basically useless since we HAVE to downsample the data because of the sheer amount. But building a separate dashboard for every level of aggregation is not an acceptable solution.
@Crapworks We have similar requirements. We resorted to creating our own CQ generator, which gives us plenty of downsampling possibilities. So we took the long way of circumventing this issue, but in turn, it gave us greater flexibility. I can recommend doing it like that (but still: I'm not dismissing the need of a fix for this issue for anyone of course).
@tw-bert So if I get you right you are creating different CQs for every table in a database and expanding the field like mean(field_name1) as field_name1, mean(field_name2) as field_name2
? Just out of curiosity, how many CQs do you have running?
I've been watching this as well. If you're using Grafana to view your data, you can use regular expressions in the field name to get around the raw data vs aggregated granularity issue.
SELECT last(/(sum_)?partition_records/) FROM "$granularity"."measurement" --rest of query ---
This assumes you create a template variable ($granularity) with 'show retention policies on ...'.
@ewillia1983 I didn't know that, thank you! This is indeed helping a lot in dashboard creation!
Still, this is a workaround for Grafana and the main issue should still be addressed by InfluxDB
@ewillia1983 Interesting. I suppose it comes with a little bit of a performance hit though.
@Crapworks Yes, that's the idea. But sometimes you need more specific CQ's, for example a max() on one field, and no function at all on an accompanying text field. We run about 80 CQ's atm. Some are for downsampling, and some are for precalculations like these: https://stackoverflow.com/questions/38016051/how-do-i-get-consistent-values-with-influxdb-non-negative-derivative/43200947#43200947
@kerams - Good thought, I have not tested it. I just came up with this a few days ago, I've only got one panel switched over so far.
Any news on this long standing issue ? It is really painful to use CQs with this current behavior on field names.
Being forced to modify every metric name in every graph in every dashboard for Grafana is ... not enjoyable to say the least.
Is it really possible that such basic functionality for timeDB is not after 2 years still implemented?