Add support for GA4 pivot reports
Feature Description
As discussed on Slack, while we are initially implementing the Audience Tiles to use separate, per-audience reports to retrieve the cities and "top content" metrics (one report per audience per metric), we can reduce the number of reports needed by using pivot reports. This will allow us to retrieve metric data for all of the audiences in a single report (i.e., one report per metric).
We should add support for running pivot reports, and refactor the Audience Tiles to use them.
This is not critical to the release and can be done post-launch, hence the P2 priority.
Please also note this comment relating to the AudienceTile and AudienceTiles refactoring: https://github.com/google/site-kit-wp/issues/8484#issuecomment-2059400367
Because of the amount of work involved, this ticket has been split, only the GA4 pivot report infrastructure should be worked on in this ticket. Updating the Audience Tiles should be completed in #8726.
Do not alter or remove anything below. The following sections will be managed by moderators only.
Acceptance criteria
- REST and datastore APIs should be added to Site Kit to provide support for retrieving Analytics pivot reports.
- On the REST front, a new route:
GET /google-site-kit/v1/modules/analytics-4/data/pivot-report. - On the datastore side, a new selector/resolver pair:
getPivotReport().
- On the REST front, a new route:
- Update the mock data generator to generate valid responses to pivot reports.
Implementation Brief
Note that the implication of the second AC point is that each call to getReportForAllAudiences() should request a single pivot report rather than a report per audience. Please refer to the linked Slack thread for an example pivot report request structure.
Pivot Report Infrastructure
- [ ] Update
includes/Modules/Analytics_4/Report.php:- [ ] Import the class
Google\Site_Kit_Dependencies\Google\Service\AnalyticsData\PivotasGoogle_Service_AnalyticsData_Pivot. - [ ] Create a new protected method
parse_pivots, which takesData_Requestand returnsGoogle_Service_AnalyticsData_Pivot[], modelled on the parse_orderby method.- [ ] In the
array_map, create a newGoogle_Service_AnalyticsData_Pivotand usesetFieldNames,setLimitandsetOrderBysto build each pivot request value. - [ ] Before returning the generate pivots, check if
is_string( $data['compareStartDate'] ), and if so, create a newGoogle_Service_AnalyticsData_Pivotwith a fieldName ofdateRageand a limit of2, finally add aggregations usingsetMetricAggregations( array( 'TOTAL', 'MINIMUM', 'MAXIMUM' ) );so that we can use the TOTAL for each metric for each date range. This is required because, when you create a pivot report with multiple date ranges, the date ranges must be the final pivot column.
- [ ] In the
- [ ] Update the
parse_dimensionsmethod, adding thehostNamedimension using$dimensions[] = array('name' => 'hostName');, ifis_array($data['pivots'], after the$dimensionsvariable restructuring. This is needed because of the hostname dimension filter we add to all requests here to filter out domains other than the WordPress sites domain.
- [ ] Import the class
- [ ] Update the return statement in
includes/Modules/Analytics_4.php,GET:reportrequest, to call therunPivotReportmethod ifis_array( $data['pivots'] ), falling back to the existingrunReportcall. - [ ] Update
includes/Modules/Analytics_4/Report/Request.php:- [ ] Import the class
Google\Site_Kit_Dependencies\Google\Service\AnalyticsData\RunPivotReportRequestasGoogle_Service_AnalyticsData_RunPivotReportRequest. - [ ] In the
create_requestmethod:- [ ] Update the definition of
$requestto create a newGoogle_Service_AnalyticsData_RunPivotReportRequestifis_array( $data['pivots'] ), fall back toGoogle_Service_AnalyticsData_RunReportRequest. - [ ] Wrap the
setMetricAggregationscall with and if statement,! is_array( $data['pivots'] ), to prevent aggregations from being added at this level, as they must be added at the pivot level for a pivot report. - [ ] Before returning, get the pivots using
$pivots = $this->parse_pivots( $data );and add them to the request using:
if ( ! empty( $pivots ) ) { $request->setPivots( $pivots ); } - [ ] Update the definition of
- [ ] Import the class
- [ ] Update the code comments for the analytics
getReportsselector to show the pivots arguments is now supported.
Data Mock Updates
- [ ] Update
getAnalytics4MockResponseinassets/js/modules/analytics-4/utils/data-mock.jsto generate valid data when given a valid pivot report object. - [ ] Update the tests to cover new pivot report cases in
assets/js/modules/analytics-4/utils/data-mock.test.js
Test Coverage
- [ ] Update
tests/phpunit/integration/Modules/Analytics_4Test.phpadding comprehensive new tests of the pivot report functions including limits and ordering. Note: when making a pivot report you cannot pass limit or orderbys to the top level, they must be included for each pivot instead. - [ ] Confirm these changes to not cause and regressions in non pivot reports.
QA Brief
Changelog entry
As discussed here in 8136 there are a number of refactors required to the AudienceTiles and AudienceTile components which are best worked on here:
- Once the pivot reports are implemented, this data restructuring code should be removed and only the filtered pivot report rows for this audience segment should be passed, without restructuring, to the
AudienceTilecomponent.
https://github.com/google/site-kit-wp/blob/5dcc66aa50bf8847936476c792e8df28cfd49993/assets/js/modules/analytics-4/components/dashboard/AudienceSegmentation/AudienceTiles.js#L211-L271
-
The
AudienceTilecomponent props and internals should be updated to expect the raw report rows instead of the current data structure.
https://github.com/google/site-kit-wp/blob/5dcc66aa50bf8847936476c792e8df28cfd49993/assets/js/modules/analytics-4/components/dashboard/AudienceSegmentation/AudienceTile/index.js#L50-L63
-
The
AudienceTilestories can then be updated to use thedata-mockfunction instead of the hard coded props:
https://github.com/google/site-kit-wp/blob/5dcc66aa50bf8847936476c792e8df28cfd49993/assets/js/modules/analytics-4/components/dashboard/AudienceSegmentation/AudienceTile/index.js#L50-L63
For reference, here is an example pivot report and it's response based on my testing:
Report Options
const reportOptions = {
startDate: '2024-04-18',
endDate: '2024-05-15',
compareStartDate: '2024-03-21',
compareEndDate: '2024-04-17',
dimensions: [ { name: 'operatingSystem' } ],
dimensionFilters: {
operatingSystem: [ 'Windows', 'Macintosh' ],
},
metrics: [
{ name: 'totalUsers' },
{ name: 'sessionsPerUser' },
{ name: 'screenPageViewsPerSession' },
{ name: 'screenPageViews' },
],
pivots: [
{
fieldNames: [ 'operatingSystem' ],
limit: 3,
},
],
};
Report Response
{
"kind": "analyticsData#runPivotReport",
"pivotHeaders": [
{
"rowCount": 2,
"pivotDimensionHeaders": [
{
"dimensionValues": [
{
"value": "Windows"
}
]
},
{
"dimensionValues": [
{
"value": "Macintosh"
}
]
}
]
},
{
"rowCount": 2,
"pivotDimensionHeaders": [
{
"dimensionValues": [
{
"value": "date_range_0"
}
]
},
{
"dimensionValues": [
{
"value": "date_range_1"
}
]
}
]
}
],
"dimensionHeaders": [
{
"name": "operatingSystem"
},
{
"name": "dateRange"
}
],
"metricHeaders": [
{
"name": "totalUsers",
"type": "TYPE_INTEGER"
},
{
"name": "sessionsPerUser",
"type": "TYPE_FLOAT"
},
{
"name": "screenPageViewsPerSession",
"type": "TYPE_FLOAT"
},
{
"name": "screenPageViews",
"type": "TYPE_INTEGER"
}
],
"rows": [
{
"dimensionValues": [
{
"value": "Windows"
},
{
"value": "date_range_0"
}
],
"metricValues": [
{
"value": "102"
},
{
"value": "1.0392156862745099"
},
{
"value": "1.5"
},
{
"value": "159"
}
]
},
{
"dimensionValues": [
{
"value": "Macintosh"
},
{
"value": "date_range_1"
}
],
"metricValues": [
{
"value": "51"
},
{
"value": "1.0588235294117647"
},
{
"value": "1.6666666666666667"
},
{
"value": "90"
}
]
},
{
"dimensionValues": [
{
"value": "Windows"
},
{
"value": "date_range_1"
}
],
"metricValues": [
{
"value": "46"
},
{
"value": "1.0434782608695652"
},
{
"value": "1.75"
},
{
"value": "84"
}
]
},
{
"dimensionValues": [
{
"value": "Macintosh"
},
{
"value": "date_range_0"
}
],
"metricValues": [
{
"value": "41"
},
{
"value": "1.0487804878048781"
},
{
"value": "1.3953488372093024"
},
{
"value": "60"
}
]
}
],
"aggregates": [
{
"dimensionValues": [
{
"value": "RESERVED_TOTAL"
},
{
"value": "date_range_0"
}
],
"metricValues": [
{
"value": "143"
},
{
"value": "1.0419580419580419"
},
{
"value": "1.4697986577181208"
},
{
"value": "219"
}
]
},
{
"dimensionValues": [
{
"value": "RESERVED_MAX"
},
{
"value": "date_range_0"
}
],
"metricValues": [
{
"value": "102"
},
{
"value": "1.0487804878048781"
},
{
"value": "1.5"
},
{
"value": "159"
}
]
},
{
"dimensionValues": [
{
"value": "RESERVED_TOTAL"
},
{
"value": "date_range_1"
}
],
"metricValues": [
{
"value": "97"
},
{
"value": "1.0515463917525774"
},
{
"value": "1.7058823529411764"
},
{
"value": "174"
}
]
},
{
"dimensionValues": [
{
"value": "RESERVED_MAX"
},
{
"value": "date_range_1"
}
],
"metricValues": [
{
"value": "51"
},
{
"value": "1.0588235294117647"
},
{
"value": "1.75"
},
{
"value": "90"
}
]
},
{
"dimensionValues": [
{
"value": "RESERVED_MIN"
},
{
"value": "date_range_1"
}
],
"metricValues": [
{
"value": "46"
},
{
"value": "1.0434782608695652"
},
{
"value": "1.6666666666666667"
},
{
"value": "84"
}
]
},
{
"dimensionValues": [
{
"value": "RESERVED_MIN"
},
{
"value": "date_range_0"
}
],
"metricValues": [
{
"value": "41"
},
{
"value": "1.0392156862745099"
},
{
"value": "1.3953488372093024"
},
{
"value": "60"
}
]
}
],
"metadata": {
"currencyCode": "USD",
"dataLossFromOtherRow": null,
"emptyReason": null,
"subjectToThresholding": null,
"timeZone": "Etc/GMT"
}
}
I often substitute audienceResourceName with operatingSystem as I don't have two audience with data on my Analytics properties currently but the dimensions are interchangeable.
I also have an Insomnia playground export with some example queries which I can share with anyone who picks this up.
@techanvil I split this ticket into two, moving work specific to Audience Tiles to #8726 as there is a lot of work involved in both parts of this ticket.
Thanks @benbowler, that sounds sensible!
Regarding the IB, it's a good first take. However with the runReport and runPivotReport GA4 endpoints being two distinct entities, with similar yet crucially different payloads, I would rather see Site Kit provide a similar separation for handling pivot reports, providing a new GET:pivot-report datapoint and runPivotReport() selector/resolver.
The commonalities between the two can be shared either via existing helpers or extracting new ones where appropriate.
This approach should be cleaner and easier to maintain going forward, and we won't have to maintain a mental model of the differences between the two GA4 endpoints to make sense of a single SK endpoint.
I realise the AC was a bit open to interpretation, so I've updated it to be more explicit. Please can you iterate on the IB, taking this into consideration?
Thanks @techanvil, I did consider this split originally, however I thought it would lead to lots of duplicated code as the report structures are so similar, also our report infra already does not directly map to the GA Data API structure. That's not a reason not to improve things though.
I've updated the IB to have distinct pivot selector and REST endpoint and pivot-reports store. I've increased the estimate as there will be lots of additional changes and requirements for additional tests to complete this work.
Hey @benbowler! Indeed, I imagine it will result in a bit of duplicate boilerplate, but I'd hope we can avoid too much duplication of logic. Thanks for updating the IB, here are a few further points:
- The update is heading in the right direction, but I would still like to see more separation of the logic. Essentially this has pushed a chunk of the conditional parsing back to
Request::create_request(), which still has a mix ofrunReportandrunPivotReportlogic. Really what I would prefer is to avoid having to do any conditional logic based on the presence of$data['pivots'], and instead have sayRunReportRequestandRunPivotReportRequestclasses each of which has their owncreate_request()method (maintaining the single responsibility principle, one endpoint per class). Common logic could be extracted to a utility class (or a shared base class, although I'd personally prefer not to add too much depth to the class heirarchy if we can avoid it). I realise this is a bit more of a refactor but it should give a cleaner end result where each class only contains the logic relevant to it. - As per the above I'd also like to keep the common logic cleaner and avoid pivot-based conditionality. This applies to
parse_dimensions()too - I think we can avoid the conditional change you've proposed by instead appendinghostNameto the list of dimensions we pass toparse_dimensions()from the pivot-specificcreate_request()method. We might want to refactorparse_dimensions()a bit to facilitate this but I would rather keep the common method as simple as it can be, with the pivot-specific stuff in the caller. - It's commendable you've done the research on how to handle multiple date ranges, but it's not actually clear we're going to need to support them for pivot reports. The initial reports we'll use pivots for that are in
AudienceTilesdon't have multiple date ranges so I would suggest we leave that part out for now to keep things simpler and drop support for thecompare.*Dateargs ingetPivotReport().
It's commendable you've done the research on how to handle multiple date ranges, but it's not actually clear we're going to need to support them for pivot reports. The initial reports we'll use pivots for that are in AudienceTiles don't have multiple date ranges so I would suggest we leave that part out for now to keep things simpler and drop support for the compare.*Date args in getPivotReport().
@techanvil to this point, I think we do need the data range comparisons to be able to show the % change metrics in the Audience Tiles, we need to be able to get the aggregate totals from the previous period.
I'll work through your other key points ASAP, thanks so much for taking a detailed look at this and working through this with me.
Thanks @benbowler!
Just dropping a quick reply re. the date range comparisons. From what I can see, the reports we use getReportForAllAudiences() for, which are the ones we'll rewrite as pivot reports, don't use the compare.*Date args:
https://github.com/google/site-kit-wp/blob/1706414f4a1c8e4e1cd9aca2723a1adafdb47c7c/assets/js/modules/analytics-4/components/audience-segmentation/dashboard/AudienceTiles.js#L109-L162
The one report that does use these args uses getReport() and should be fine to leave as it is:
https://github.com/google/site-kit-wp/blob/1706414f4a1c8e4e1cd9aca2723a1adafdb47c7c/assets/js/modules/analytics-4/components/audience-segmentation/dashboard/AudienceTiles.js#L76-L90
So, from what I can see we should be able to skip the date range comparison related logic for pivot reports and add it if/when we do need it. Of course if I have missed something please do let me know!
Thanks @techanvil, yes this clears it up. I thought we were using the pivot reports for the other metrics but I can see that we're able to get these with a dimensionFilter instead.
Thanks @benbowler, this is looking good. Once last thing - it would be good to give a bit of direction to encourage sharing more logic between the two Request classes. Thinking about all the bits of create_request() it would be nice not to duplicate.
It doesn't need to go into too much detail, just the high level would be enough. Maybe we create a new SharedRequestCreators class or suchlike and say "extract common logic from the Request classes into this class". WDYT?
Sounds good, added that line.
Thanks Ben! IB LGTM :white_check_mark:
QA Brief run as part of the code review, and since it's very straightforward moving directly to Approval, since there isn't much to QA. 😄