cube icon indicating copy to clipboard operation
cube copied to clipboard

Format all timestamps in query timezone

Open ricmatsui opened this issue 4 years ago • 2 comments

Is your feature request related to a problem? Please describe.

Results with timestamps are inconsistently converted to the query timezone depending on if it is used as a dimension, a time dimension, or a time dimension with granularity.

This makes presenting time data error-prone because it's unclear what timezone the data is formatted in since none of the results contain offsets.

Describe the solution you'd like

Always format timestamps in the query timezone to reduce timezone pitfalls.

Describe alternatives you've considered

  1. Another option would be to include the offset (or Z) when the time differs from the query timezone, making it unambiguous.
  2. Using SQL_UTILS.convertTz is a workaround, but requires defining additional dimensions and introduces pitfalls such as accidentally applying the offset twice in time dimensions:

Dimensions that use SQL_UTILS.convertTz() should not be used as timeDimensions in queries. Doing so will apply the conversion multiple times and yield wrong results.

Additional context

Example Schema for BigQuery:

cube('Test', {
    sql: `
        SELECT TIMESTAMP('2021-06-15T00:00:00', 'America/Los_Angeles') AS Created,
        TIMESTAMP('2021-06-15T00:00:00', 'America/Los_Angeles') AS Updated
    `,

    dimensions: {
        created: {
            sql: 'Created',
            type: 'time',
        },

        updated: {
            sql: 'Updated',
            type: 'time',
        },
    },
});

Query:

{
  "timeDimensions": [
    {
      "dimension": "Test.created",
      "dateRange": [
        "2021-06-15",
        "2021-06-15"
      ],
      "granularity": "second"
    }
  ],
  "dimensions": [
    "Test.updated"
  ],
  "timezone": "America/Los_Angeles"
}

Results:

[
  {
    // These are in the query timezone
    "Test.created.second": "2021-06-15T00:00:00.000",
    "Test.created": "2021-06-15T00:00:00.000"

    // This one is not in the query timezone
    // and it's ambiguous which timezone it is in.
    // Expected: "2021-06-15T00:00:00.000" or "2021-06-15T07:00:00.000Z"
    "Test.updated": "2021-06-15T07:00:00.000",
  }
]

Implementation:

A potential path could be to convert in the API Gateway as part of transformValue, but a better implementation could be in the drivers.

diff --git a/packages/cubejs-api-gateway/src/gateway.ts b/packages/cubejs-api-gateway/src/gateway.ts
index c4a00ab6..17901314 100644
--- a/packages/cubejs-api-gateway/src/gateway.ts
+++ b/packages/cubejs-api-gateway/src/gateway.ts
@@ -96,9 +96,9 @@ const prepareAnnotation = (metaConfig: MetaConfig[], query: any) => {
   };
 };
 
-const transformValue = (value, type) => {
+const transformValue = (value, type, timezone) => {
   if (value && (type === 'time' || value instanceof Date)) { // TODO support for max time
-    return (value instanceof Date ? moment(value) : moment.utc(value)).format(moment.HTML5_FMT.DATETIME_LOCAL_MS);
+    return (value instanceof Date ? moment(value) : moment.tz(value, timezone)).format(moment.HTML5_FMT.DATETIME_LOCAL_MS);
   }
   return value && value.value ? value.value : value; // TODO move to sql adapter
 };
@@ -117,7 +117,7 @@ const transformData = (aliasToMemberNameMap, annotation, data, query, queryType)
 
       const transformResult = [
         memberName,
-        transformValue(p[1], annotationForMember.type)
+        transformValue(p[1], annotationForMember.type, query.timezone)
       ];
 
       const path = memberName.split('.');

Related:

  • https://github.com/cube-js/cube.js/issues/443
  • https://github.com/cube-js/cube.js/issues/689
  • https://cube-js.slack.com/archives/CC0403RRR/p1612912462478500?thread_ts=1612839814.416800&cid=CC0403RRR

ricmatsui avatar Jun 16 '21 06:06 ricmatsui

Just wanted to second this issue. Any date dimensions should be converted to the given timezone. For instance, I have a query where I pull orders by the order date but also display the date the customer was created at. The order date converts to the correct timezone but the customer create at date is still in UTC (the timezone it's stored as in the database)

kmclaugh avatar Oct 20 '21 16:10 kmclaugh

We're also seeing this issue.

inventionlabsSydney avatar Sep 06 '22 08:09 inventionlabsSydney

this is still not resolved with latest version

stlsw avatar Dec 22 '22 06:12 stlsw

@paveltiunov, @hassankhan Any update to this issue? it would really helped out

Problem

I am applying "timezone": "Europe/Prague" in my query.

In myTable there is duration of some state in one hour

INSERT INTO MyTable (id, timestamp_column, duration_column)
VALUES
(1, '2021-11-30T00:00:00Z',3600),
(2, '2021-11-30T01:00:00Z', 3600),
(3, '2021-11-30T02:00:00Z', 3600),
...
(24, '2021-12-01T00:00:00Z', 3600),
...
(48, '2021-12-02T00:00:00Z', 3600),
...
(72, '2021-12-03T00:00:00Z', 3600),
...
(96, '2021-12-04T00:00:00Z', 3600),
...
(120, '2021-12-05T00:00:00Z', 3600);

And I need to get aggregated values with day granularity Cube.js Schema

Here is a simplified version of my Cube.js schema for reference:

cube(`Data`, {
  sql: `SELECT * FROM MyTable`,
  dimensions: {
    from: {
      sql: `${CUBE}.timestamp_column`,
      type: `time`
    },
  },
  measures: {
    totalTime: {
      type: `sum`,
      sql: `${CUBE}.duration_column`
    },
  }
});

Here's an example of a query I'm running:

{
  "query": {
    "measures": ["Data.totalTime"],
    "timeDimensions": [{
      "dimension": "Data.from",
      "granularity": "day",
      "dateRange": ["2021-12-01","2021-12-04"]
    }],
    "segments": ["Data.canSegment"],
    "timezone": "Europe/Prague"
  }
}

Expected behavior

I expect the UTC to be shifted to "Europe/Prague" TZ and data are aggregated according to this TZ.

data_day_from (in EuropeTZ)     totalTime_inSeconds
2021-12-01 00:00:00.000         86400
2021-12-02 00:00:00.000         86400
2021-12-03 00:00:00.000         86400
2021-12-04 00:00:00.000         86400

Actual behavior

The query returns the following results:

data_day_from (in UTC)        totalTime_inSeconds
2021-11-30 00:00:00.000         3600
2021-12-01 00:00:00.000         86400
2021-12-02 00:00:00.000         86400
2021-12-03 00:00:00.000         86400
2021-12-04 00:00:00.000         82800

The data_day_from values are not adjusted to the Europe/Prague timezone as expected, and the totalTime_inSeconds values are not consistently 86400 as they should be. There are leftovers on each side, so I guess I have no way calculate day aggregation in EuropeTZ

Thanks for the reply

prochj51 avatar Jun 23 '23 06:06 prochj51

I also tried to use

  • CUBEJS_SCHEDULED_REFRESH_TIMEZONES=Europe/Prague in docker-compose.yml
version: '2.2'

services:
  cube:
    image: cubejs/cube:v0.32.14
    ports:
      - 4001:4000
      - 15432:15432
    environment:
      - CUBEJS_DEV_MODE=true
      - CUBEJS_SCHEDULED_REFRESH_TIMEZONES=Europe/Prague
    volumes:
      - .:/cube/conf

and added preaggregation

preAggregations: {
    main: {
      measures: [Data.totalTime],
      dimensions: [DimAttribute.AssetId],
      timeDimension: Data.from,
      granularity: `hour`,
    }

but no effect whatsoever

I am not sure if I am doing this properly, the docs are not clear to me in this matter @igorlukanin :)

prochj51 avatar Jun 30 '23 10:06 prochj51

@prochj51 I'm having same issue just as you describe, here an example of query

{"measures":["myCube.lostOpportunities"],"timeDimensions":[{"dimension":"myCube.created","granularity":"hour","dateRange":["2023-07-04 00:00:00.000Z","2023-07-04 23:59:59.000Z"]}],"filters":[],"timezone":"Europe/Madrid"}

Response example:

{ "queryType": "regularQuery", "results": [ { "query": { "measures": [ "myCube.lostOpportunities" ], "timeDimensions": [ { "dimension": "myCube.created", "granularity": "hour", "dateRange": [ "2023-07-04T00:00:00.000", "2023-07-04T23:59:59.000" ] } ], "filters": [], "timezone": "Europe/Madrid", "limit": 10000, "order": [], "dimensions": [], "rowLimit": 10000, "queryType": "regularQuery" }, "data": [ { "myCube.created.hour": "2023-07-03T22:00:00.000", "myCube.created": "2023-07-03T22:00:00.000", "myCube.lostOpportunities": 0 }, { "myCube.created.hour": "2023-07-04T09:00:00.000", "myCube.created": "2023-07-04T09:00:00.000", "myCube.lostOpportunities": 2 }, { "myCube.created.hour": "2023-07-04T10:00:00.000", "myCube.created": "2023-07-04T10:00:00.000", "myCube.lostOpportunities": 4 }, { "myCube.created.hour": "2023-07-04T11:00:00.000", "myCube.created": "2023-07-04T11:00:00.000", "myCube.lostOpportunities": 1 }, { "myCube.created.hour": "2023-07-04T12:00:00.000", "myCube.created": "2023-07-04T12:00:00.000", "myCube.lostOpportunities": 1 }, { "myCube.created.hour": "2023-07-04T13:00:00.000", "myCube.created": "2023-07-04T13:00:00.000", "myCube.lostOpportunities": 0 }, { "myCube.created.hour": "2023-07-04T15:00:00.000", "myCube.created": "2023-07-04T15:00:00.000", "myCube.lostOpportunities": 1 }, { "myCube.created.hour": "2023-07-04T17:00:00.000", "myCube.created": "2023-07-04T17:00:00.000", "myCube.lostOpportunities": 0 }, { "myCube.created.hour": "2023-07-04T18:00:00.000", "myCube.created": "2023-07-04T18:00:00.000", "myCube.lostOpportunities": 0 }, { "myCube.created.hour": "2023-07-04T19:00:00.000", "myCube.created": "2023-07-04T19:00:00.000", "myCube.lostOpportunities": 0 } ], "lastRefreshTime": "2023-07-05T16:06:23.735Z", "transformedQuery": { "sortedDimensions": [], "sortedTimeDimensions": [ [ "myCube.created", "second" ] ], "timeDimensions": [ [ "myCube.created", "hour" ] ], "measures": [ "myCube.lostOpportunities" ], "leafMeasureAdditive": true, "leafMeasures": [ "myCube.lostOpportunities" ], "measureToLeafMeasures": { "myCube.lostOpportunities": [ { "measure": "myCube.lostOpportunities", "additive": true, "type": "count" } ] }, "hasNoTimeDimensionsWithoutGranularity": true, "allFiltersWithinSelectedDimensions": true, "isAdditive": true, "granularityHierarchies": { "year": [ "year", "quarter", "month", "month", "day", "hour", "minute", "second" ], "quarter": [ "quarter", "month", "day", "hour", "minute", "second" ], "month": [ "month", "day", "hour", "minute", "second" ], "week": [ "week", "day", "hour", "minute", "second" ], "day": [ "day", "hour", "minute", "second" ], "hour": [ "hour", "minute", "second" ], "minute": [ "minute", "second" ], "second": [ "second" ] }, "hasMultipliedMeasures": false, "hasCumulativeMeasures": false, "windowGranularity": null, "filterDimensionsSingleValueEqual": {}, "ownedDimensions": [], "ownedTimeDimensionsWithRollupGranularity": [ [ "myCube.created", "second" ] ], "ownedTimeDimensionsAsIs": [ [ "myCube.created", "hour" ] ] }, "requestId": "157d76af-f716-4fc2-8357-30fde05061c6-span-1", "annotation": { "measures": { "myCube.lostOpportunities": { "title": "My cube Lost Opportunities", "shortTitle": "Lost Opportunities", "type": "number", "drillMembers": [], "drillMembersGrouped": { "measures": [], "dimensions": [] } } }, "dimensions": {}, "segments": {}, "timeDimensions": { "myCube.created.hour": { "title": "my Cube Created", "shortTitle": "Created", "type": "time" }, "myCube.created": { "title": "my Cube Created", "shortTitle": "Created", "type": "time" } } }, "slowQuery": true, "total": null } ], "pivotQuery": { "measures": [ "myCube.lostOpportunities" ], "timeDimensions": [ { "dimension": "myCube.created", "granularity": "hour", "dateRange": [ "2023-07-04T00:00:00.000", "2023-07-04T23:59:59.000" ] } ], "filters": [], "timezone": "Europe/Madrid", "limit": 10000, "order": [], "dimensions": [], "rowLimit": 10000, "queryType": "regularQuery" }, "slowQuery": false }

My bigger problem is not when the granularity is hour because I can manage it in front but for day granularity the result count entries of a day in previous day(depending of timezone) and there is no way of manage in front.

cjfigueiras avatar Jul 06 '23 09:07 cjfigueiras

For those struggling with this, make sure you're setting process.env.TZ to be "Etc/UTC", per https://github.com/cube-js/cube/issues/4171#issuecomment-1079068709.

peterklingelhofer avatar Dec 05 '23 23:12 peterklingelhofer