$__timeGroup should respect the browser time-zone to align groups

Open bminer opened this issue 3 years ago • 22 comments

What happened:

Time columns using $__timeGroup (and friends) are grouped according to UTC time boundaries.

Example query:

SELECT
  $__timeGroupAlias(created_at,$__interval,0),
  COUNT(*) AS "Created"
FROM sales_order_lines
WHERE $__timeFilter(created_at)
GROUP BY 1
  • Assume $__interval is set to 12h
  • Assume user selects time range between 2022-05-25 12:00:00 AM and 2022-05-31 11:59:59 PM
  • Assume the user's timezone is US/Eastern (UTC-4:00)

Given this, Grafana converts this query to the following:

SELECT
  UNIX_TIMESTAMP(created_at) DIV 43200 * 43200 AS "time",
  COUNT(*) AS "Created"
FROM sales_order_lines SOL
WHERE created_at BETWEEN FROM_UNIXTIME(1653451200) AND FROM_UNIXTIME(1654055999)
GROUP BY 1

Although UNIX_TIMESTAMP will correctly create 12-hour time buckets, the bucket boundaries will be rounded to UTC time boundaries. The time buckets will therefore be:

Database Raw Value   UTC (+0:00)           US/Eastern (UTC-4:00)
1653436800           2022-05-25 00:00:00   2022-05-24 20:00:00
1653480000           2022-05-25 12:00:00   2022-05-25 08:00:00
1653523200           2022-05-26 00:00:00   2022-05-25 20:00:00
1653566400           2022-05-26 12:00:00   2022-05-26 08:00:00
...                  ...                   ...
1653912000           2022-05-30 12:00:00   2022-05-30 08:00:00
1653955200           2022-05-31 00:00:00   2022-05-30 20:00:00
1653998400           2022-05-31 12:00:00   2022-05-31 08:00:00
1654041600           2022-06-01 00:00:00   2022-05-31 20:00:00

Because of this, the Grafana client in the web browser (in US/Eastern) renders 2022-05-24 20:00 as the first time bucket. While that bucket contains the correct values for its 12-hour window, the user would expect the first bucket to start at 2022-05-25 00:00, as requested. Additionally, for the 7 days of data requested, the user gets 15 12-hour time buckets instead of the expected 14.

What you expected to happen:

Time columns should be grouped to respect the web browser's timezone. In our example, the first time bucket should start on 2022-05-25T04:00:00Z, and the last time bucket should start on 2022-05-31T16:00:00.000Z. (Note: these dates are in UTC time to avoid confusion)

How to reproduce it (as minimally and precisely as possible):

This bug only occurs under these conditions:

  • $__timeGroup macro is used in a SQL query
  • The SQL database and web browser running the Grafana client have differing timezones
  • The time interval is larger than 1 hour
  • The timezone difference between the web browser and the database is not evenly divisible by the time interval (thus causing misalignment of the grouped time intervals). In our example, since the database and web browser client timezones are 4 hours apart, $__interval values of 2h and 4h work just fine (a quick check is sketched below).
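As a quick illustration of that last condition (this check is not part of the original report; 14400 s is the 4-hour US/Eastern offset and 43200 s the 12h interval from the example), the buckets stay aligned only when the offset is a whole multiple of the interval:

SELECT
  MOD(14400, 43200) AS offset_vs_12h,  -- 14400: a 12h interval is shifted by 4 hours
  MOD(14400, 14400) AS offset_vs_4h,   -- 0: a 4h interval stays aligned
  MOD(14400,  7200) AS offset_vs_2h;   -- 0: a 2h interval stays aligned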

Anything else we need to know?:

This is solely an issue with the $__timeGroup, $__timeGroupAlias, and related macros. The $__timeFilter macro correctly filters data according to the selected time range.

To be clear, the data returned by MySQL is correct and rendered correctly by Grafana; however, the time buckets are confusing to the end user under the conditions described above.

Environment:

  • Grafana version: 9.0.1
  • Data source type & version: MySQL 8.0.23; UTC timezone
  • OS Grafana is installed on: Ubuntu Linux 20.04
  • User OS & Browser: Linux Mint 20.3; Firefox 101.0.1; US/Eastern timezone
  • Grafana plugins: Only pre-installed core plugins

Possible Solution:

I'm not quite sure how to solve this problem just yet, but I think a good start is sending the timezone offset information to MySQL to ensure grouping occurs properly. The $__timeGroup(created_at, $__interval) macro could then expand to:

(UNIX_TIMESTAMP(created_at) + -14400) DIV 43200 * 43200 - -14400

where -14400 is the web browser's timezone difference from UTC in seconds and 43200 is the $__interval of 12h.

EDIT: As discussed below, a better proposed solution is:

(UNIX_TIMESTAMP(created_at) - 1653451200) DIV 43200 * 43200 + 1653451200

where 1653451200 is the "from date" UNIX timestamp (the same value in the WHERE clause) and 43200 is the $__interval of 12h.

A workaround for this might look like this:

(UNIX_TIMESTAMP(created_at) - UNIX_TIMESTAMP('${__from:date}')) DIV ($__interval_ms / 1000) * ($__interval_ms / 1000) + UNIX_TIMESTAMP('${__from:date}')
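Plugged into the example query at the top of this issue, that workaround would look roughly like this (a sketch only, reusing the sales_order_lines table and created_at column from the example):

SELECT
  (UNIX_TIMESTAMP(created_at) - UNIX_TIMESTAMP('${__from:date}'))
    DIV ($__interval_ms / 1000) * ($__interval_ms / 1000)
    + UNIX_TIMESTAMP('${__from:date}') AS "time",
  COUNT(*) AS "Created"
FROM sales_order_lines
WHERE $__timeFilter(created_at)
GROUP BY 1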

bminer avatar Jun 28 '22 21:06 bminer

Hey @bminer!

Thank you for opening this issue. It can indeed be confusing to see your intervals offset by the timezone in this case. We're looking into this to see what our options are and will return when we have something.

mdvictor avatar Jul 27 '22 05:07 mdvictor

Hey @bminer,

Can you please tell me whether applying the offset directly in the SQL query might be a possible solution for you? And if not, why? Something like this:

SELECT
  $__timeGroup(created_at,$__interval,0) + 14400 as time,
  COUNT(*) AS "Created"
FROM sales_order_lines
WHERE $__timeFilter(created_at)
GROUP BY 1

mdvictor avatar Aug 02 '22 12:08 mdvictor

It's an interesting solution, but I can foresee a few issues. For one, I'd prefer that the web browser's timezone offset from UTC (i.e. 14400) not be hard-coded in the SQL. It should be a Grafana variable at the very least.

Also, simply using $__timeGroup(created_at,$__interval,0) + 14400 will not work correctly because this expands to:

UNIX_TIMESTAMP(created_at) DIV 43200 * 43200 + 14400

...which is almost correct. Sadly, while the labels will be correct, the data values themselves will not be in the proper time interval. You really need something more like:

(UNIX_TIMESTAMP(created_at)  - 14400) DIV 43200 * 43200 + 14400

The calculation comes down to:

  1. Convert the timestamp to seconds since Jan 1, 1970 UTC.
  2. Convert to browser timezone
  3. Group by time interval
  4. Convert back to UTC (since Grafana expects all time values to be UTC)

The current methodology only does steps 1 and 3. Thus, data is confusingly grouped into UTC time intervals. The solution you described only does steps 1, 3, and 4. Data is still grouped into UTC time intervals, but now those time intervals are mislabeled.
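
To make the mapping explicit, here is the corrected expression again, annotated step by step (assuming the -4:00 browser offset and 12h interval from the example):

(UNIX_TIMESTAMP(created_at)  -- 1. seconds since Jan 1, 1970 UTC
  - 14400)                   -- 2. shift into the browser timezone (UTC-4:00)
  DIV 43200 * 43200          -- 3. group into 12-hour intervals
  + 14400                    -- 4. shift back to UTC, since Grafana expects UTC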

bminer avatar Aug 02 '22 18:08 bminer

Hmm... I wonder if InfluxDB works the same way as SQL data sources... is grouping always done in UTC time intervals?

EDIT: I researched this a bit. In InfluxDB (both InfluxQL and Flux query languages), time intervals are grouped based on the start/stop time, and since Grafana converts the start/stop time from browser time to UTC, grouping is done correctly. In SQL there is no such concept, so grouping is "somewhat broken" as I have described.

bminer avatar Aug 02 '22 18:08 bminer

I'm not sure the second step in the calculation, the conversion to browser timezone, is the way to go because all the data we send to the back-end and work with is in UTC.

I have an example in mind:

If the time range is set with a from date of 2022-05-25 00:00:00 UTC, and the timezone is set to -04:00, the from shown in Grafana is 2022-05-24 20:00:00 in that timezone.

When the query is sent to the backend, we send a from timestamp in UTC form, resulting in 2022-05-25 00:00:00, which will be used in the SQL query. Furthermore, all calculations on the back-end are done in UTC.

If we have a row in the database with a createdAt column at 2022-05-25 01:00:00 and apply the formula you proposed above, we will first subtract 4 hours from that date, apply the DIV function, and then multiply again by the interval. This results in a 12-hour bucket at 2022-05-24 12:00:00, to which we add the 4 hours again, giving 2022-05-24 16:00:00, which is outside the from timestamp used in the filtering clause WHERE created_at BETWEEN FROM_UNIXTIME(from) AND FROM_UNIXTIME(to).

Alas, adding x hours to the timeGroup as I presented above will also return wrong values if x is larger than the $__interval, in this case 12 hours, since it will erroneously offset some values outside the to time range limit, but I think it works as a workaround if we keep in mind this constraint.

Can you please further explain what you meant by data values being in wrong time intervals? Since we work only with UTC timestamps on the back-end calculations, and the time range is also sent as UTC timestamps, values will be in their correct buckets. Am I missing something here?

mdvictor avatar Aug 03 '22 07:08 mdvictor

If we have a row in the database with a createdAt column at 2022-05-25 01:00:00 and apply the formula you proposed above, we will first subtract 4 hours from that date, apply the DIV function, and then multiply again by the interval. This results in a 12-hour bucket at 2022-05-24 12:00:00, to which we add the 4 hours again, giving 2022-05-24 16:00:00, which is outside the from timestamp used in the filtering clause WHERE created_at BETWEEN FROM_UNIXTIME(from) AND FROM_UNIXTIME(to).

Yeah, nice catch! If the time interval is 12 hours, and you select 2022-05-24 20:00 -04:00 as the start time in Grafana, the user would likely expect the first time interval to start at 2022-05-24 20:00 -04:00. The next interval should be 12h later at 2022-05-25 08:00 -04:00.

In other words, I think users might expect time intervals to be aligned to the $__from timestamp... like this:

(UNIX_TIMESTAMP(created_at)  - UNIX_TIMESTAMP('2022-05-25 00:00:00') ) DIV 43200 * 43200 + UNIX_TIMESTAMP('2022-05-25 00:00:00')

where 2022-05-25 00:00:00 is basically the $__from variable. I'm pretty sure this is how InfluxDB works (although someone should confirm this). If so, maybe this is the best approach (for the sake of consistency). Queries can already be written this way without $__timeGroupAlias:

(UNIX_TIMESTAMP(created_at) - UNIX_TIMESTAMP('${__from:date}')) DIV ($__interval_ms / 1000) * ($__interval_ms / 1000) + UNIX_TIMESTAMP('${__from:date}') AS "time"

The only issue I can foresee here is when you select "Last 15 days" in Grafana, and the $__from becomes something more convoluted like 2022-07-15 13:47:23 -04:00. If the interval is 12h, should the first time bucket start at 2022-07-15 13:47:23 -04:00? (I think so) Or perhaps 2022-07-15 12:00:00 -04:00 (i.e. truncating to nearest 12h time interval)? Or (using my proposed solution), is 2022-07-15 04:00 -04:00 okay?

As mentioned above, another idea is to truncate $__from to the nearest time interval. So, for example, if the time interval is 4h, you'd take the $__from timestamp and round down to the appropriate 4h interval in the day. For example, when $__from is 2022-07-15 13:47:23 -04:00, you'd start the first time interval at 2022-07-15 12:00:00 -04:00, but importantly, that WHERE clause is unaffected, so the bucket would not include data before the $__from timestamp. That's rather confusing IMO.

This is a more interesting problem than I originally thought. :-) Please let me know your thoughts.

bminer avatar Aug 03 '22 17:08 bminer

TLDR; please ignore my last comment. ;-) After some more thought, I'm pretty sure that the best solution is aligning the time intervals with the $__from date.

Current workaround looks like this:

(UNIX_TIMESTAMP(created_at) - UNIX_TIMESTAMP('${__from:date}')) DIV ($__interval_ms / 1000) * ($__interval_ms / 1000) + UNIX_TIMESTAMP('${__from:date}') AS "time"

What do you think of changing $__timeGroup et al. to work this way?

bminer avatar Aug 03 '22 17:08 bminer

Hey @bminer !

You've proposed some interesting solutions here, but I am not sure that users really expect the interval to begin from the start of the set time range. The timeGroup macro is, in the end, rather independent of the time range and should remain so. You've given some very good arguments above as to why that is. It can become quite confusing to make the timeGroup macro depend on the time range. Every time a user changed the time range, they would get new buckets based on from. I think users expect timeGroup to return fixed hour intervals independently of the time range.

Modifying the macro in such a way would mean adding extra complexity and manipulations of the SQL query that might not be desired. I believe that the better way is for a user to manually edit the query and reach the desired output as I proposed before:

SELECT
  $__timeGroup(created_at,$__interval,0) + 14400 as time,
  COUNT(*) AS "Created"
FROM sales_order_lines
WHERE $__timeFilter(created_at)
GROUP BY 1

mdvictor avatar Aug 04 '22 13:08 mdvictor

The more I think about this, the more confused I become. ;-)

I am not sure that users really expect the interval to begin from the beginning of the set time range. The timeGroup macro is, in the end, rather independent from the time range and should remain so.

Yeah, I agree, actually. Previously, you said:

Alas, adding x hours to the timeGroup as I presented above will also return wrong values if x is larger than the $__interval, in this case 12 hours, since it will erroneously offset some values outside the to time range limit, but I think it works as a workaround if we keep in mind this constraint.

Yeah, this is true, but maybe this is an acceptable compromise. Maybe the user could decide whether they want aggregation in their timezone or in UTC (e.g. via an additional argument to the $__timeGroup macro).

I believe that the better way is for a user to manually edit the query and reach the desired output as I proposed before

I still don't like this solution because it hides the fact that the time interval grouping / aggregation is performed on the UTC timestamp. Also, I tried this workaround, and it currently doesn't work when the query "Format As" is set to "Time Series" mode. Grafana makes some changes to the result set to keep things aligned with the wrong time interval, I think. When you switch to "Table" mode, it works fine, but then the joining of points doesn't work as expected.

I dug into how InfluxDB handles this issue, and apparently Flux now supports aggregation by timezone. Documentation on this is sparse, but they did write a blog post about it. Quoting the blog post (emphasis mine):

But…what happens when we start to aggregate data for other purposes? What if I want to aggregate by days, weeks, or months and I want these aggregations to be in my own time zone for a business reason? I don’t live in the UK and, even if I did, the UK isn’t on UTC time the entire year anyway. I can just pretend and remember that my aggregations aren’t 100% accurate all the time and they are “good enough”. Sometimes “good enough” isn’t good enough. In those, I can get closer by using an offset to shift the time to my time zone. But, when daylight savings time happens, I might be back to “good enough” aggregation by time.

Anyway, it is a little more difficult to replicate this sort of thing in SQL. We haven't even discussed the issue of Daylight Saving Time. It's starting to look a little crazy:

CONVERT_TZ(
  FROM_UNIXTIME(
    UNIX_TIMESTAMP(CONVERT_TZ(created_at,'UTC','US/Eastern'))
    DIV ($__interval_ms / 1000) * ($__interval_ms / 1000)
  ),
'US/Eastern','UTC')

Anyway, this whole thing really comes down to aggregating using the browser's timezone instead of UTC. Is this something Grafana should support? If so, I really think there should be a separate argument to the $__timeGroup macros to toggle this functionality. After all, there is probably a performance hit (however small) when aggregating using a different timezone.
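
To make that concrete, purely as a hypothetical (no such macro argument exists today), a timezone-aware variant such as $__timeGroup(created_at, $__interval, 0, 'US/Eastern') might expand to the CONVERT_TZ expression above:

-- hypothetical expansion; 'US/Eastern' would come from the new macro argument
CONVERT_TZ(
  FROM_UNIXTIME(
    UNIX_TIMESTAMP(CONVERT_TZ(created_at, 'UTC', 'US/Eastern'))
    DIV ($__interval_ms / 1000) * ($__interval_ms / 1000)
  ),
'US/Eastern', 'UTC') AS "time"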

Thanks for helping me talk through all of this. What do you think?

bminer avatar Aug 05 '22 20:08 bminer

Hey @bminer,

You've made some very interesting points. Indeed, when the query is formatted as Time series, things are a bit different and the workaround doesn't work. After looking into this a bit, it seems that Grafana resamples the rows returned from SQL and changes the time values to better fit the time range based on the $__interval. This is done after the macro is applied, so modifying the timeGroup macros will not solve the issue when formatted as Time series.

So the plot thickens beyond the macro issue discussed until now. I believe that the reason we work with UTC values is exactly because of situations where things might become unpredictable, like the issue you raised about Daylight Saving Time.

I will think about this some more, but from what I gather it's quite a big topic, and solving it might require quite a big change, which means there isn't really a quick fix I can provide other than the table-format workaround.

mdvictor avatar Aug 08 '22 12:08 mdvictor

Yeah, this is a little tricky, but honestly for most users, grouping by UTC time is going to be just fine (albeit a little confusing).

I think that the long-term solution is to convince the world to abandon time zones altogether. Everyone can use UTC, right? I don't mind waking up at 10 AM instead of 6 AM. Lunch at 4 PM seems reasonable. While we're at it, we can get rid of the penny, nickel, and dime. :wink:

bminer avatar Aug 08 '22 18:08 bminer

For those trying to read this issue and get the TLDR; version:

Problem:

  • With SQL data sources, Grafana groups using UTC time. This can be a problem when you want to group (e.g. by day) but align time windows according to the browser's time zone (or another arbitrary time zone).
  • Since Grafana defaults to displaying UTC time in the browser's time zone, grouping using UTC time windows can be confusing to end users, especially when filtering by time.

Discussion:

  • This is a tricky issue that is seemingly going to require more than a small change. Not only does the $__timeGroup macro need to be tweaked, but also the grouping behavior is different when in Table vs. Time Series mode.
  • I think we have an almost-working workaround (albeit a complicated one) for now. Use Table mode and write complicated SQL instead of using the $__timeGroup macro.
  • @mdvictor will be thinking about this more...
  • Timezones are annoying. Let's elect officials that will get rid of them.

bminer avatar Sep 14 '22 16:09 bminer

I am having the same issue using PostgreSQL with timestamps stored in UTC and the local time in UTC+2.

I would like (and expect) that when I group something into buckets of, for example, 1 day, this grouping would be relative to local time, i.e. if the local time is UTC+2, the bucket for a day would be 00:00 UTC+2 (22:00 UTC the previous day) to 23:59 UTC+2 (21:59 UTC).

The problem seems to be that the grouping is done on the database (and thus in whatever timezone the data in the table is in); the timestamps (which Grafana may expect to all be UTC) are then converted to the client's timezone.

A potential solution would be to let the database do the timezone conversion, so the grouping can be done based on the proper timezone, and then have the client/Grafana not do any conversion locally.
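
As a sketch of that idea in PostgreSQL (the zone, table, and column names here are placeholders, not something Grafana provides automatically):

SELECT
  (date_trunc('day', created_at AT TIME ZONE 'Europe/Copenhagen')
     AT TIME ZONE 'Europe/Copenhagen') AS "time",  -- truncate in local time, return a UTC-comparable timestamptz
  COUNT(*) AS value
FROM my_table
WHERE $__timeFilter(created_at)
GROUP BY 1
ORDER BY 1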

I wonder if there is a way to do this currently?

Zarickan avatar Sep 23 '22 21:09 Zarickan

@Zarickan - you have precisely described the issue, and there is currently no "good" way to do what you want.

The problem seems to be that the grouping is done on the database (and thus in whatever timezone the data in the table is in), the timestamps

My understanding is that buckets are always grouped using UTC, regardless of the database timezone.

A potential solution would be to let the database do the timezone conversion, so the grouping can be done based on the proper timezone

This is precisely what I have suggested, but since many databases make this a chore, the implementation of this is ugly and complex. Anyway, a fairly complicated workaround is described above.

bminer avatar Oct 13 '22 19:10 bminer

Interestingly, working with timezones in Postgres is really easy, assuming the timestamps are stored with a time zone (timestamptz, for example). If this is the case, you can set the timezone for the session and Postgres will handle conversions between timezones. All the results you get back will be in the specified timezone, and you can query with times in the timezone you have specified. In my experience it makes the fact that things are stored in a different timezone completely transparent to the "user"; they just work with the data as if it were actually stored in the timezone they had set for the session.
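
A minimal sketch of that session-level approach (assuming a timestamptz column named created_at; the zone and table names are placeholders, and whether the session setting can be applied from within a Grafana query panel is a separate question):

SET TIME ZONE 'Europe/Copenhagen';

-- with the session timezone set, date_trunc works on local wall-clock time
SELECT date_trunc('day', created_at) AS "time",
       COUNT(*) AS value
FROM my_table
GROUP BY 1
ORDER BY 1;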

Zarickan avatar Oct 15 '22 21:10 Zarickan

Any updates on this?

maxpain avatar Mar 03 '23 21:03 maxpain

In my experience it makes the fact that things are stored in a different timezone completely transparent to the "user"; they just work with the data as if it were actually stored in the timezone they had set for the session.

Is this also true for grouping @Zarickan?

bminer avatar Apr 03 '23 21:04 bminer

Here's another possible workaround if you want to hardcode the timezone to use for grouping:

CONVERT_TZ(FROM_UNIXTIME(
  UNIX_TIMESTAMP(CONVERT_TZ(created_at, 'UTC', 'US/Eastern')) DIV ($__interval_ms / 1000) * ($__interval_ms / 1000)
), 'US/Eastern', 'UTC') AS time,

It would be nice if Grafana exposed the timezone like 'US/Eastern' as a global variable much like $__from and $__interval_ms. The timezone string can be determined in modern browsers using Intl.DateTimeFormat().resolvedOptions().timeZone.

bminer avatar Apr 05 '23 02:04 bminer

This is an important issue, and we've had enough discussion around this to know this needs consideration. Our team will be talking more about this in the next few weeks.

baldm0mma avatar May 05 '23 17:05 baldm0mma

for reference, this is how the prometheus-plugin handles this:

  • from the browser the current timezone-offset is sent to the backend: https://github.com/grafana/grafana/blob/90e7791086b42902fcbc80a5dd1d1f74b8b0feb4/public/app/plugins/datasource/prometheus/datasource.ts#L363-L364
  • the backend applies it to its calculations: https://github.com/grafana/grafana/blob/90e7791086b42902fcbc80a5dd1d1f74b8b0feb4/pkg/tsdb/prometheus/models/query.go#L178-L179

gabor avatar Feb 28 '24 14:02 gabor

Any update on this?

PeterTing avatar Apr 24 '24 03:04 PeterTing

Any updates?

shwuhk avatar May 16 '24 19:05 shwuhk

hi, timezones with databases is a complex problem, and i'm not sure if we even identified the solution that would universally solve everyone's problems, so, for now i'd recommend to write custom SQL that handles this (you can start with the "result" of the $__timeGroup macro, and adjust it).

you can get access to the current grafana timezone using the $__timezone variable starting with grafana 10.1 (it will not work in alerts, but will work in dashboards).

(NOTE: the $__timezone macro returns a string like UTC or Asia/Tokyo. i understand that for math it would be more useful to get the current offset as a number, but that can't be done reliably (your timerange's start-point may have a different offset than the end-point of your timerange)... still, you can use your database to do the math for you, like, the following postgres snippet will calculate the current offset in seconds:

extract(epoch from CURRENT_TIMESTAMP AT TIME ZONE 'UTC')
-
extract(epoch from CURRENT_TIMESTAMP AT TIME ZONE '$__timezone')

)
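
for example, a day-bucket sketch using $__timezone (an illustration only; the table and column names are borrowed from the MySQL example at the top of this issue, translated to postgres):

SELECT
  (date_trunc('day', created_at AT TIME ZONE '$__timezone')
     AT TIME ZONE '$__timezone') AS "time",
  COUNT(*) AS "Created"
FROM sales_order_lines
WHERE $__timeFilter(created_at)
GROUP BY 1
ORDER BY 1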

if you find sql queries that fulfill your needs, please make sure to post them here, so others can learn from them too.

gabor avatar May 20 '24 07:05 gabor

~Ran into this today running Grafana 11.2, here's my BigQuery specific solution:~

~SELECT timestamp(Datetime($__timeGroup(time, $time_bucket)),"$__timezone") AS time~

Edit: disregard, this only worked for a specific use case for wanting day size groups to land on midnight. It's not applicable to most cases here

HbirdJ avatar Oct 24 '24 23:10 HbirdJ

@bminer this works perfectly unless you use an interval variable; I don't know if Grafana has a way to convert an interval variable to ms.

But you can create this MySQL function to convert the interval value to a millisecond value:

DELIMITER $$

CREATE FUNCTION interval_ms(interval_value VARCHAR(20))
RETURNS BIGINT
DETERMINISTIC
BEGIN
    DECLARE result BIGINT;

    SET result = CASE
        WHEN interval_value LIKE BINARY '%m' THEN CAST(SUBSTRING(interval_value, 1, LENGTH(interval_value)-1) AS UNSIGNED) * 60 * 1000  -- minutes to ms
        WHEN interval_value LIKE BINARY '%h' THEN CAST(SUBSTRING(interval_value, 1, LENGTH(interval_value)-1) AS UNSIGNED) * 60 * 60 * 1000  -- hours to ms
        WHEN interval_value LIKE BINARY '%d' THEN CAST(SUBSTRING(interval_value, 1, LENGTH(interval_value)-1) AS UNSIGNED) * 24 * 60 * 60 * 1000  -- days to ms
        WHEN interval_value LIKE BINARY '%w' THEN CAST(SUBSTRING(interval_value, 1, LENGTH(interval_value)-1) AS UNSIGNED) * 7 * 24 * 60 * 60 * 1000  -- weeks to ms
        WHEN interval_value LIKE BINARY '%M' THEN CAST(SUBSTRING(interval_value, 1, LENGTH(interval_value)-1) AS UNSIGNED) * 30 * 24 * 60 * 60 * 1000  -- months to ms (approximated as 30 days)
        WHEN interval_value LIKE BINARY '%y' THEN CAST(SUBSTRING(interval_value, 1, LENGTH(interval_value)-1) AS UNSIGNED) * 365 * 24 * 60 * 60 * 1000  -- years to ms (approximated as 365 days)
        ELSE 0  -- Default case for invalid intervals
    END;

    RETURN result;
END$$

DELIMITER ;
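
For instance, combined with the $__from-aligned workaround from earlier in this thread, the function could be used like this (a sketch; $interval here stands for whatever the dashboard's interval variable is actually named):

(UNIX_TIMESTAMP(created_at) - UNIX_TIMESTAMP('${__from:date}'))
  DIV (interval_ms('$interval') / 1000) * (interval_ms('$interval') / 1000)
  + UNIX_TIMESTAMP('${__from:date}') AS "time"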

BrianLeishman avatar Feb 06 '25 14:02 BrianLeishman