site-kit-wp Add support for GA4 pivot reports

Feature Description

As discussed on Slack, while we are initially implementing the Audience Tiles to use separate, per-audience reports to retrieve the cities and "top content" metrics (one report per audience per metric), we can reduce the number of reports needed by using pivot reports. This will allow us to retrieve metric data for all of the audiences in a single report (i.e., one report per metric).

We should add support for running pivot reports, and refactor the Audience Tiles to use them.

This is not critical to the release and can be done post-launch, hence the P2 priority.

Please also note this comment relating to the AudienceTile and AudienceTiles refactoring: https://github.com/google/site-kit-wp/issues/8484#issuecomment-2059400367

Because of the amount of work involved, this ticket has been split, only the GA4 pivot report infrastructure should be worked on in this ticket. Updating the Audience Tiles should be completed in #8726.

Do not alter or remove anything below. The following sections will be managed by moderators only.

Acceptance criteria

REST and datastore APIs should be added to Site Kit to provide support for retrieving Analytics pivot reports.
- On the REST front, a new route: GET /google-site-kit/v1/modules/analytics-4/data/pivot-report.
- On the datastore side, a new selector/resolver pair: getPivotReport().
Update the mock data generator to generate valid responses to pivot reports.

Implementation Brief

Note that the implication of the second AC point is that each call to getReportForAllAudiences() should request a single pivot report rather than a report per audience. Please refer to the linked Slack thread for an example pivot report request structure.

Pivot Report Infrastructure

[ ] Update includes/Modules/Analytics_4/Report.php:
- [ ] Import the class Google\Site_Kit_Dependencies\Google\Service\AnalyticsData\Pivot as Google_Service_AnalyticsData_Pivot.
- [ ] Create a new protected method parse_pivots, which takes Data_Request and returns Google_Service_AnalyticsData_Pivot[], modelled on the parse_orderby method.
  - [ ] In the array_map, create a new Google_Service_AnalyticsData_Pivot and use setFieldNames, setLimit and setOrderBys to build each pivot request value.
  - [ ] Before returning the generate pivots, check if is_string( $data['compareStartDate'] ), and if so, create a new Google_Service_AnalyticsData_Pivot with a fieldName of dateRage and a limit of 2, finally add aggregations using setMetricAggregations( array( 'TOTAL', 'MINIMUM', 'MAXIMUM' ) ); so that we can use the TOTAL for each metric for each date range. This is required because, when you create a pivot report with multiple date ranges, the date ranges must be the final pivot column.
- [ ] Update the parse_dimensions method, adding the hostName dimension using $dimensions[] = array('name' => 'hostName');, if is_array($data['pivots'], after the $dimensions variable restructuring. This is needed because of the hostname dimension filter we add to all requests here to filter out domains other than the WordPress sites domain.
[ ] Update the return statement in includes/Modules/Analytics_4.php, GET:report request, to call the runPivotReport method if is_array( $data['pivots'] ), falling back to the existing runReport call.
[ ] Update includes/Modules/Analytics_4/Report/Request.php:
- [ ] Import the class Google\Site_Kit_Dependencies\Google\Service\AnalyticsData\RunPivotReportRequest as Google_Service_AnalyticsData_RunPivotReportRequest.
- [ ] In the create_request method:
  - [ ] Update the definition of $request to create a new Google_Service_AnalyticsData_RunPivotReportRequest if is_array( $data['pivots'] ), fall back to Google_Service_AnalyticsData_RunReportRequest.
  - [ ] Wrap the setMetricAggregations call with and if statement, ! is_array( $data['pivots'] ), to prevent aggregations from being added at this level, as they must be added at the pivot level for a pivot report.
  - [ ] Before returning, get the pivots using $pivots = $this->parse_pivots( $data ); and add them to the request using:
```
if ( ! empty( $pivots ) ) {
	$request->setPivots( $pivots );
}
```
[ ] Update the code comments for the analytics getReports selector to show the pivots arguments is now supported.

Data Mock Updates

[ ] Update getAnalytics4MockResponse in assets/js/modules/analytics-4/utils/data-mock.js to generate valid data when given a valid pivot report object.
[ ] Update the tests to cover new pivot report cases in assets/js/modules/analytics-4/utils/data-mock.test.js

Test Coverage

[ ] Update tests/phpunit/integration/Modules/Analytics_4Test.php adding comprehensive new tests of the pivot report functions including limits and ordering. Note: when making a pivot report you cannot pass limit or orderbys to the top level, they must be included for each pivot instead.
[ ] Confirm these changes to not cause and regressions in non pivot reports.

QA Brief

Changelog entry

Apr 05 '24 10:04 techanvil

As discussed here in 8136 there are a number of refactors required to the AudienceTiles and AudienceTile components which are best worked on here:

Once the pivot reports are implemented, this data restructuring code should be removed and only the filtered pivot report rows for this audience segment should be passed, without restructuring, to the AudienceTile component.

https://github.com/google/site-kit-wp/blob/5dcc66aa50bf8847936476c792e8df28cfd49993/assets/js/modules/analytics-4/components/dashboard/AudienceSegmentation/AudienceTiles.js#L211-L271

The AudienceTile component props and internals should be updated to expect the raw report rows instead of the current data structure.

https://github.com/google/site-kit-wp/blob/5dcc66aa50bf8847936476c792e8df28cfd49993/assets/js/modules/analytics-4/components/dashboard/AudienceSegmentation/AudienceTile/index.js#L50-L63

The AudienceTile stories can then be updated to use the data-mock function instead of the hard coded props:

https://github.com/google/site-kit-wp/blob/5dcc66aa50bf8847936476c792e8df28cfd49993/assets/js/modules/analytics-4/components/dashboard/AudienceSegmentation/AudienceTile/index.js#L50-L63

Apr 16 '24 15:04 benbowler

For reference, here is an example pivot report and it's response based on my testing:

Report Options

	const reportOptions = {
		startDate: '2024-04-18',
		endDate: '2024-05-15',
		compareStartDate: '2024-03-21',
		compareEndDate: '2024-04-17',
		dimensions: [ { name: 'operatingSystem' } ],
		dimensionFilters: {
			operatingSystem: [ 'Windows', 'Macintosh' ],
		},
		metrics: [
			{ name: 'totalUsers' },
			{ name: 'sessionsPerUser' },
			{ name: 'screenPageViewsPerSession' },
			{ name: 'screenPageViews' },
		],
		pivots: [
			{
				fieldNames: [ 'operatingSystem' ],
				limit: 3,
			},
		],
	};

Report Response

{
  "kind": "analyticsData#runPivotReport",
  "pivotHeaders": [
    {
      "rowCount": 2,
      "pivotDimensionHeaders": [
        {
          "dimensionValues": [
            {
              "value": "Windows"
            }
          ]
        },
        {
          "dimensionValues": [
            {
              "value": "Macintosh"
            }
          ]
        }
      ]
    },
    {
      "rowCount": 2,
      "pivotDimensionHeaders": [
        {
          "dimensionValues": [
            {
              "value": "date_range_0"
            }
          ]
        },
        {
          "dimensionValues": [
            {
              "value": "date_range_1"
            }
          ]
        }
      ]
    }
  ],
  "dimensionHeaders": [
    {
      "name": "operatingSystem"
    },
    {
      "name": "dateRange"
    }
  ],
  "metricHeaders": [
    {
      "name": "totalUsers",
      "type": "TYPE_INTEGER"
    },
    {
      "name": "sessionsPerUser",
      "type": "TYPE_FLOAT"
    },
    {
      "name": "screenPageViewsPerSession",
      "type": "TYPE_FLOAT"
    },
    {
      "name": "screenPageViews",
      "type": "TYPE_INTEGER"
    }
  ],
  "rows": [
    {
      "dimensionValues": [
        {
          "value": "Windows"
        },
        {
          "value": "date_range_0"
        }
      ],
      "metricValues": [
        {
          "value": "102"
        },
        {
          "value": "1.0392156862745099"
        },
        {
          "value": "1.5"
        },
        {
          "value": "159"
        }
      ]
    },
    {
      "dimensionValues": [
        {
          "value": "Macintosh"
        },
        {
          "value": "date_range_1"
        }
      ],
      "metricValues": [
        {
          "value": "51"
        },
        {
          "value": "1.0588235294117647"
        },
        {
          "value": "1.6666666666666667"
        },
        {
          "value": "90"
        }
      ]
    },
    {
      "dimensionValues": [
        {
          "value": "Windows"
        },
        {
          "value": "date_range_1"
        }
      ],
      "metricValues": [
        {
          "value": "46"
        },
        {
          "value": "1.0434782608695652"
        },
        {
          "value": "1.75"
        },
        {
          "value": "84"
        }
      ]
    },
    {
      "dimensionValues": [
        {
          "value": "Macintosh"
        },
        {
          "value": "date_range_0"
        }
      ],
      "metricValues": [
        {
          "value": "41"
        },
        {
          "value": "1.0487804878048781"
        },
        {
          "value": "1.3953488372093024"
        },
        {
          "value": "60"
        }
      ]
    }
  ],
  "aggregates": [
    {
      "dimensionValues": [
        {
          "value": "RESERVED_TOTAL"
        },
        {
          "value": "date_range_0"
        }
      ],
      "metricValues": [
        {
          "value": "143"
        },
        {
          "value": "1.0419580419580419"
        },
        {
          "value": "1.4697986577181208"
        },
        {
          "value": "219"
        }
      ]
    },
    {
      "dimensionValues": [
        {
          "value": "RESERVED_MAX"
        },
        {
          "value": "date_range_0"
        }
      ],
      "metricValues": [
        {
          "value": "102"
        },
        {
          "value": "1.0487804878048781"
        },
        {
          "value": "1.5"
        },
        {
          "value": "159"
        }
      ]
    },
    {
      "dimensionValues": [
        {
          "value": "RESERVED_TOTAL"
        },
        {
          "value": "date_range_1"
        }
      ],
      "metricValues": [
        {
          "value": "97"
        },
        {
          "value": "1.0515463917525774"
        },
        {
          "value": "1.7058823529411764"
        },
        {
          "value": "174"
        }
      ]
    },
    {
      "dimensionValues": [
        {
          "value": "RESERVED_MAX"
        },
        {
          "value": "date_range_1"
        }
      ],
      "metricValues": [
        {
          "value": "51"
        },
        {
          "value": "1.0588235294117647"
        },
        {
          "value": "1.75"
        },
        {
          "value": "90"
        }
      ]
    },
    {
      "dimensionValues": [
        {
          "value": "RESERVED_MIN"
        },
        {
          "value": "date_range_1"
        }
      ],
      "metricValues": [
        {
          "value": "46"
        },
        {
          "value": "1.0434782608695652"
        },
        {
          "value": "1.6666666666666667"
        },
        {
          "value": "84"
        }
      ]
    },
    {
      "dimensionValues": [
        {
          "value": "RESERVED_MIN"
        },
        {
          "value": "date_range_0"
        }
      ],
      "metricValues": [
        {
          "value": "41"
        },
        {
          "value": "1.0392156862745099"
        },
        {
          "value": "1.3953488372093024"
        },
        {
          "value": "60"
        }
      ]
    }
  ],
  "metadata": {
    "currencyCode": "USD",
    "dataLossFromOtherRow": null,
    "emptyReason": null,
    "subjectToThresholding": null,
    "timeZone": "Etc/GMT"
  }
}

I often substitute audienceResourceName with operatingSystem as I don't have two audience with data on my Analytics properties currently but the dimensions are interchangeable.

I also have an Insomnia playground export with some example queries which I can share with anyone who picks this up.

May 16 '24 15:05 benbowler

@techanvil I split this ticket into two, moving work specific to Audience Tiles to #8726 as there is a lot of work involved in both parts of this ticket.

May 17 '24 14:05 benbowler

Thanks @benbowler, that sounds sensible!

Regarding the IB, it's a good first take. However with the runReport and runPivotReport GA4 endpoints being two distinct entities, with similar yet crucially different payloads, I would rather see Site Kit provide a similar separation for handling pivot reports, providing a new GET:pivot-report datapoint and runPivotReport() selector/resolver.

The commonalities between the two can be shared either via existing helpers or extracting new ones where appropriate.

This approach should be cleaner and easier to maintain going forward, and we won't have to maintain a mental model of the differences between the two GA4 endpoints to make sense of a single SK endpoint.

I realise the AC was a bit open to interpretation, so I've updated it to be more explicit. Please can you iterate on the IB, taking this into consideration?

May 17 '24 15:05 techanvil

Thanks @techanvil, I did consider this split originally, however I thought it would lead to lots of duplicated code as the report structures are so similar, also our report infra already does not directly map to the GA Data API structure. That's not a reason not to improve things though.

I've updated the IB to have distinct pivot selector and REST endpoint and pivot-reports store. I've increased the estimate as there will be lots of additional changes and requirements for additional tests to complete this work.

May 22 '24 13:05 benbowler

Hey @benbowler! Indeed, I imagine it will result in a bit of duplicate boilerplate, but I'd hope we can avoid too much duplication of logic. Thanks for updating the IB, here are a few further points:

The update is heading in the right direction, but I would still like to see more separation of the logic. Essentially this has pushed a chunk of the conditional parsing back to Request::create_request(), which still has a mix of runReport and runPivotReport logic. Really what I would prefer is to avoid having to do any conditional logic based on the presence of $data['pivots'], and instead have say RunReportRequest and RunPivotReportRequest classes each of which has their own create_request() method (maintaining the single responsibility principle, one endpoint per class). Common logic could be extracted to a utility class (or a shared base class, although I'd personally prefer not to add too much depth to the class heirarchy if we can avoid it). I realise this is a bit more of a refactor but it should give a cleaner end result where each class only contains the logic relevant to it.
As per the above I'd also like to keep the common logic cleaner and avoid pivot-based conditionality. This applies to parse_dimensions() too - I think we can avoid the conditional change you've proposed by instead appending hostName to the list of dimensions we pass to parse_dimensions() from the pivot-specific create_request() method. We might want to refactor parse_dimensions() a bit to facilitate this but I would rather keep the common method as simple as it can be, with the pivot-specific stuff in the caller.
It's commendable you've done the research on how to handle multiple date ranges, but it's not actually clear we're going to need to support them for pivot reports. The initial reports we'll use pivots for that are in AudienceTiles don't have multiple date ranges so I would suggest we leave that part out for now to keep things simpler and drop support for the compare.*Date args in getPivotReport().

May 22 '24 17:05 techanvil

It's commendable you've done the research on how to handle multiple date ranges, but it's not actually clear we're going to need to support them for pivot reports. The initial reports we'll use pivots for that are in AudienceTiles don't have multiple date ranges so I would suggest we leave that part out for now to keep things simpler and drop support for the compare.*Date args in getPivotReport().

@techanvil to this point, I think we do need the data range comparisons to be able to show the % change metrics in the Audience Tiles, we need to be able to get the aggregate totals from the previous period.

I'll work through your other key points ASAP, thanks so much for taking a detailed look at this and working through this with me.

May 23 '24 11:05 benbowler

Thanks @benbowler!

Just dropping a quick reply re. the date range comparisons. From what I can see, the reports we use getReportForAllAudiences() for, which are the ones we'll rewrite as pivot reports, don't use the compare.*Date args:

https://github.com/google/site-kit-wp/blob/1706414f4a1c8e4e1cd9aca2723a1adafdb47c7c/assets/js/modules/analytics-4/components/audience-segmentation/dashboard/AudienceTiles.js#L109-L162

The one report that does use these args uses getReport() and should be fine to leave as it is:

https://github.com/google/site-kit-wp/blob/1706414f4a1c8e4e1cd9aca2723a1adafdb47c7c/assets/js/modules/analytics-4/components/audience-segmentation/dashboard/AudienceTiles.js#L76-L90

So, from what I can see we should be able to skip the date range comparison related logic for pivot reports and add it if/when we do need it. Of course if I have missed something please do let me know!

May 23 '24 11:05 techanvil

Thanks @techanvil, yes this clears it up. I thought we were using the pivot reports for the other metrics but I can see that we're able to get these with a dimensionFilter instead.

May 23 '24 12:05 benbowler

Thanks @benbowler, this is looking good. Once last thing - it would be good to give a bit of direction to encourage sharing more logic between the two Request classes. Thinking about all the bits of create_request() it would be nice not to duplicate.

It doesn't need to go into too much detail, just the high level would be enough. Maybe we create a new SharedRequestCreators class or suchlike and say "extract common logic from the Request classes into this class". WDYT?

May 30 '24 12:05 techanvil

Sounds good, added that line.

May 30 '24 13:05 benbowler

Thanks Ben! IB LGTM :white_check_mark:

May 30 '24 16:05 techanvil

QA Brief run as part of the code review, and since it's very straightforward moving directly to Approval, since there isn't much to QA. 😄

Jun 20 '24 00:06 tofumatt