Add “partial data” states infrastructure
Feature Description
Add the full infrastructure for determining and exposing the "partial data" states for audiences, custom dimensions and properties.
See partial data states in the design doc.
Do not alter or remove anything below. The following sections will be managed by moderators only.
Acceptance criteria
- The Analytics module should have new selectors for detecting whether an audience, custom dimension or Analytics property (referred to as a resource in the following points) is in the "partial data" state.
- A resource is considered to be in the "partial data" state until it has been active for the full duration of the currently selected date range.
- A resource is also considered to be in the "partial data" state if GA4 itself is in the gathering data state.
- The partial data state should be determined by retrieving a report for the given resource and comparing the date of the earliest event with the start date of the current date range.
- Similar to how it is done for the gathering data states, the date of the earliest event, once determined, should be persisted on the server and made available in the client on page load.
- The persisted date for a given resource, whenever available, should be used in the resolvers of the partial data selectors instead of making a report request to determine the partial data state.
- Persisted dates for all resources should be reset whenever the Analytics property or measurement ID changes, the Analytics module is deactivated or Site Kit is reset.
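
The rule described by these criteria can be sketched as a small predicate. This is only an illustration: the function name, the argument names and the `YYYY-MM-DD` string format are assumptions, not existing Site Kit code.

```javascript
// Hypothetical helper for illustration only, not part of Site Kit.
// Dates are assumed to be 'YYYY-MM-DD' strings.
function isResourcePartialData( {
	isGatheringData,
	dataAvailabilityDate,
	startDate,
} ) {
	// While GA4 itself is gathering data, every resource counts as partial.
	if ( isGatheringData ) {
		return true;
	}

	// If the earliest event date is unknown, err on the side of partial data.
	if ( ! dataAvailabilityDate ) {
		return true;
	}

	// Dates in the same ISO format compare correctly as plain strings.
	// The resource is partial if its first event is after the range start.
	return dataAvailabilityDate > startDate;
}

// First event on March 10, range starts on March 1: partial data.
console.log(
	isResourcePartialData( {
		isGatheringData: false,
		dataAvailabilityDate: '2024-03-10',
		startDate: '2024-03-01',
	} )
); // true
```

Treating an unknown date as partial is a conservative choice: it avoids presenting data as complete when that cannot be confirmed.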
Implementation Brief
Note: the following IB is heavily based on and inspired by Data_Available state for modules and custom dimensions. Any gap in the IB may be filled in by reassessing the implementation and comparing with the aforementioned infrastructure.
PHP
- [ ] Create class `Google\Site_Kit\Modules\Analytics_4\Resource_Data_Availability_Date`.
  - [ ] Take `Transients $transients` in the constructor and initialize as a field.
  - [ ] Use consts `VALID_CUSTOM_DIMENSION_SLUGS` and `VALID_AUDIENCE_SLUGS` to store the valid and allowed custom dimension and audience slugs.
  - [ ] Have `RESOURCE_TYPE_**` consts for audience, custom dimension and property resources.
  - [ ] Method `get_resource_transient_name` takes resource name and resource type parameters and returns the computed transient name, i.e. `return "googlesitekit_{$resource_type}_{$resource_name}_data_availability_date";`
  - [ ] Method `get_resource_dates` should return an associative array of the data availability dates of resources. This can be a multi-dimensional array, or the resources can be prefixed with the resource type.
  - [ ] Other methods `get_resource_date`, `set_resource_date`, `reset_resource_date` etc. should be implemented similarly to how it is done in the `Google\Site_Kit\Modules\Analytics_4\Custom_Dimensions_Data_Available` class.
- [ ] In `Google\Site_Kit\Modules\Analytics_4` class:
  - [ ] Add `$resource_data_available_date` field and instantiate it with `Resource_Data_Availability_Date` in the constructor.
  - [ ] Create a new REST endpoint `POST:save-resource-data-availability-date` in the `Analytics_4` module.
    - [ ] It should check if the passed resource(s) in the `$data` (`audience`, `customDimension` or `property`) are valid, and then persist the date values as a timestamp in the DB using the `$this->resource_data_available_date->set_resource_date` method.
  - [ ] Expose the persisted dates of resource data availability to the client using the `googlesitekit_inline_modules_data` filter in the `register` method.
  - [ ] Call `$resource_data_available_date->reset_resource_date()` in the `on_deactivation` method to reset all persisted dates on module deactivation.
  - [ ] Call `$resource_data_available_date->reset_resource_date()` in the `$this->get_settings()->on_change()` when the property ID or measurement ID is different, similarly to how it's done with `$this->custom_dimensions_data_available->reset_data_available()`, to reset the persisted dates when the Analytics property/measurement ID changes.
JS
- [ ] Create `assets/js/modules/analytics-4/datastore/partial-data.js` file.
  - [ ] Create a fetch store for the aforementioned POST API.
  - [ ] Actions:
    - [ ] `saveResourceDataAvailabilityDate` takes an array of objects (`{ resource name, resource type, date }`) and saves it to the server using the fetch store.
  - [ ] Selectors:
    - [ ] `getResourceDataAvailabilityDate( resourceName, resourceType )`: returns the date associated with the given resource if available, otherwise resolves to the first date in the last 90 days that the report data became available using the associated resolver (described below). The 90 days is chosen because that's the longest date range available in Site Kit.
    - [ ] `is{audience|customDimension|Property}PartialData( resourceName )`:
      - [ ] Return `true` when GA4 is in the gathering data state.
      - [ ] Return `false` when the `dataAvailabilityDate` for the resource is the same as or earlier than the `startDate` of the currently selected date range.
      - [ ] Otherwise, return `true`. This also handles the case where the `dataAvailabilityDate` for a given resource cannot be determined due to errors or being in the shared dashboard.
  - [ ] Resolvers:
    - [ ] `getResourceDataAvailabilityDate`:
      - [ ] Get `reportArgs` for the given resource.
        - [ ] For a property, the `reportArgs` is similar to the one returned by `getSampleReportArgs` from `assets/js/modules/analytics-4/utils/report-args.js`, with the following changes:
          - [ ] Start date: creation date of the current GA property.
          - [ ] End date: the reference date.
        - [ ] For an audience, the `reportArgs` will include `audienceResourceName` as an additional `dimension`.
          - [ ] This will allow for a single report for all audience resources, and filtering the resulting report for a specific resource in JS to get the earliest date for a given audience resource.
        - [ ] For a custom dimension, the report args should be the following:
          - [ ] Start date: creation date of the current GA property.
          - [ ] End date: the reference date.
          - [ ] The dimension: `date` for a property resource, and `customEvent:${ resourceName }` for a custom dimension resource.
          - [ ] Metric: `eventCount`.
          - [ ] See the `getDataAvailabilityReportOptions` selector in `assets/js/modules/analytics-4/datastore/custom-dimensions-gathering-data.js` and `getSampleReportArgs` in `assets/js/modules/analytics-4/datastore/report.js` for a more complete example. The implementation can largely be followed.
      - [ ] Make a simple report request for the given resource using the above report args.
      - [ ] Find the date of the first available report.
      - [ ] If there is any error or the user doesn't have permission (i.e. the property creation date cannot be accessed in the shared dashboard), return `null` and do not persist anything.
      - [ ] Otherwise, persist the date for the given resource using `saveResourceDataAvailabilityDate` and return the date.
- [ ] Add the newly added store partial to `assets/js/modules/analytics-4/datastore/index.js`.
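
The "find the date of the first available report" step could look roughly like the sketch below. It assumes a GA4 Data API style response (`rows` with `dimensionValues` and `metricValues`), that `date` is the first requested dimension and `audienceResourceName` the second, and that the `date` dimension is formatted as `YYYYMMDD`. The function name is hypothetical.

```javascript
// Hypothetical helper for illustration only, not part of Site Kit.
// Assumes dimension order [ 'date', 'audienceResourceName' ] for audience
// reports and a single metric (e.g. eventCount) per row.
function getEarliestEventDate( report, audienceResourceName ) {
	const rows = report?.rows ?? [];

	const dates = rows
		// For audience reports, keep only the rows of the requested audience.
		.filter(
			( row ) =>
				audienceResourceName === undefined ||
				row.dimensionValues[ 1 ]?.value === audienceResourceName
		)
		// Ignore rows that carry no events.
		.filter( ( row ) => Number( row.metricValues[ 0 ].value ) > 0 )
		.map( ( row ) => row.dimensionValues[ 0 ].value )
		// 'YYYYMMDD' strings sort chronologically.
		.sort();

	if ( dates.length === 0 ) {
		// Nothing to persist when no event date can be determined.
		return null;
	}

	const [ earliest ] = dates;
	return `${ earliest.slice( 0, 4 ) }-${ earliest.slice( 4, 6 ) }-${ earliest.slice( 6, 8 ) }`;
}

const sampleReport = {
	rows: [
		{
			dimensionValues: [ { value: '20240312' }, { value: 'properties/1/audiences/2' } ],
			metricValues: [ { value: '4' } ],
		},
		{
			dimensionValues: [ { value: '20240305' }, { value: 'properties/1/audiences/3' } ],
			metricValues: [ { value: '9' } ],
		},
		{
			dimensionValues: [ { value: '20240308' }, { value: 'properties/1/audiences/2' } ],
			metricValues: [ { value: '1' } ],
		},
	],
};

console.log( getEarliestEventDate( sampleReport ) ); // '2024-03-05'
console.log( getEarliestEventDate( sampleReport, 'properties/1/audiences/2' ) ); // '2024-03-08'
```

The sketch returns a date string for readability; the IB asks for the value to be persisted as a timestamp, so a conversion step would still be needed before saving.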
Test Coverage
- Add PHP Unit test for the newly added infrastructure.
- Add Jest test for the newly added selectors and actions.
QA Brief
Changelog entry
AC ✔️
> - Create `assets/js/modules/analytics-4/datastore/custom-dimensions-partial-data.js` file.
I think the file should be renamed to be more generic, something like partial-data.js because custom-dimensions- prefix refers to the custom dimensions matter which is just one out of three matters of the task.
> Add the full infrastructure for determining and exposing the "partial data" states for audiences, custom dimensions and properties.
The "determining" part is missing in IB. We need to add instructions how to detect and save partial data information for all three matters.
Thank you @eugene-manuilov for the review!
> I think the file should be renamed to be more generic, something like partial-data.js because custom-dimensions- prefix refers to the custom dimensions matter which is just one out of three matters of the task.
Correct! I've updated the file name accordingly.
> The "determining" part is missing in IB. We need to add instructions how to detect and save partial data information for all three matters.
The getResourceDataAvailabilityDate will either determine the first available date with data using a getReport request to the given resource with a 90-day report window (in resolver) or return the persisted date. We then use this date for the current date range in the is{audience|customDimension|Property}PartialData(resourceName) selectors to determine the partial data state. We can't persist the boolean value of this without needlessly complicating this, as this can be different based on the currently selected date range.
My thinking here is that something can be in the partial data state for a 28-day range but still have all the data it needs for a 7-day range, and thus not be in partial data. So by saving the first available date for a 90-day report instead, we can recompute the partial data state for all our supported date ranges.
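
To illustrate the point, here is a small sketch that recomputes the state for several ranges from one stored date. The helper names are made up, and it assumes that a range of N days ends on the reference date (inclusive) and that dates are `YYYY-MM-DD` strings.

```javascript
// Hypothetical helpers for illustration only, not part of Site Kit.

// Start date of a range of `days` days ending on `referenceDate`, inclusive.
function getStartDate( referenceDate, days ) {
	const date = new Date( `${ referenceDate }T00:00:00Z` );
	date.setUTCDate( date.getUTCDate() - ( days - 1 ) );
	return date.toISOString().slice( 0, 10 );
}

// Partial data state per range, derived from a single persisted date.
function getPartialDataByRange(
	dataAvailabilityDate,
	referenceDate,
	rangesInDays = [ 7, 28, 90 ]
) {
	return Object.fromEntries(
		rangesInDays.map( ( days ) => [
			days,
			// Partial when the first event is after the start of the range.
			dataAvailabilityDate > getStartDate( referenceDate, days ),
		] )
	);
}

// First event on March 15, reference date March 31.
console.log( getPartialDataByRange( '2024-03-15', '2024-03-31' ) );
// { '7': false, '28': true, '90': true }
```

One stored date is enough to answer the question for every range, whereas a stored boolean would only be valid for the range it was computed with.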
Let me know what you think!
> My thinking here is that something can be in the partial data state for a 28-day range but still have all the data it needs for a 7-day range, and thus not be in partial data. So by saving the first available date for a 90-day report instead, we can recompute the partial data state for all our supported date ranges.
Hey @kuasha420 @eugene-manuilov, just chipping in here as I had imagined we'd probably want to take the approach of requesting a report with a start date of the property creation time. That way we could get a definitive first-event date and not keep requesting reports if, say, a property's events are all prior to the current 90-day window. WDYT?
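
A rough sketch of what such report options might look like follows. The option shape, the function name and the use of `eventCount` for every resource type are assumptions for illustration; the property creation time is assumed to be an ISO 8601 timestamp, and the date is taken in UTC, ignoring the property's own time zone.

```javascript
// Hypothetical helper for illustration only, not part of Site Kit.
function buildFirstEventReportOptions( {
	propertyCreateTime,
	referenceDate,
	resourceType,
	resourceName,
} ) {
	const dimensions = [ 'date' ];

	if ( resourceType === 'audience' ) {
		// One report for all audiences; rows are filtered per audience later.
		dimensions.push( 'audienceResourceName' );
	} else if ( resourceType === 'customDimension' ) {
		dimensions.push( `customEvent:${ resourceName }` );
	}

	return {
		// Start from the creation of the property rather than a fixed window,
		// so the first event is found even if it is older than 90 days.
		startDate: new Date( propertyCreateTime ).toISOString().slice( 0, 10 ),
		endDate: referenceDate,
		dimensions,
		metrics: [ { name: 'eventCount' } ],
	};
}

console.log(
	buildFirstEventReportOptions( {
		propertyCreateTime: '2023-11-02T10:15:30Z',
		referenceDate: '2024-03-31',
		resourceType: 'audience',
	} )
);
```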
> My thinking here is that something can be in the partial data state for a 28-day range but still have all the data it needs for a 7-day range, and thus not be in partial data. So by saving the first available date for a 90-day report instead, we can recompute the partial data state for all our supported date ranges.
>
> Hey @kuasha420 @eugene-manuilov, just chipping in here as I had imagined we'd probably want to take the approach of requesting a report with a start date of the property creation time. That way we could get a definitive first-event date and not keep requesting reports if, say, a property's events are all prior to the current 90-day window. WDYT?
Thanks, @techanvil. I think this is a good idea. @kuasha420, could you please update your IB to use what Tom suggests? We also need to make sure that this information is reset when the user changes Analytics settings.
@eugene-manuilov Thank you for the review and @techanvil for the pointers. I've updated the IB accordingly and added an additional point in the AC about resetting the persisted dates (also reflected the addition in IB as well) based on the review and some internal discussion. Let me know what you think. Cheers.
Thanks, @kuasha420. Mostly looks good to me. Added a few pretty minor comments for you:
> - Method `get_resource_data_availability_date_transient_name` takes resource name ...
> - Method `get_resource_data_availability_dates` should return ...
> - Other methods `get_resource_data_availability_date`, `set_resource_data_availability_date`, `reset_resource_data_availability_date` etc ...
There is no need to duplicate `data_availability_date` in method names unless having it makes a big difference for the method. In other words, if we name the method `get_resource_transient_name`, it will keep the same meaning and be more concise.
- `get_resource_data_availability_dates` -> `get_resource_dates`
- `get_resource_data_availability_date` -> `get_resource_date`
- `set_resource_data_availability_date` -> `set_resource_date`
- `reset_resource_data_availability_date` -> `reset_resource_date`
> ... and resource type parameters and returns the computed transient name. ie. `return "googlesitekit_custom_dimension_{$resource_type}_{$resource_name}_data_availability_date";`
I believe the `_custom_dimension_` part is not needed and the template should be `googlesitekit_{$resource_type}_{$resource_name}_data_availability_date`, right?
Thanks @eugene-manuilov ! Your suggested names are shorter and while a little ambiguous (what date?), makes sense in the broader context, because the methods will be called from the class instance (ie. $this->resource_data_available_date->set_resource_date) so the meaning can be inferred. I've updated the method names and their references accordingly.
> I believe the `_custom_dimension_` part is not needed and the template should be `googlesitekit_{$resource_type}_{$resource_name}_data_availability_date`, right?
Yep, that's correct. It was ~~skill issue~~ copy paste error on my part, fixed!
Cheers.
Thanks, @kuasha420. IB ✔️
QA Update ❌
Great work, @kuasha420. The functionalities work as expected except for an issue regarding removing the transient when disconnecting Analytics.
- Verified: The test environment was set up successfully, and connections were established to two different Analytics properties:
  - A property with existing data (`oi.ie`), active for over 7 days.
  - A property recently created without data.
- `isAudiencePartialData` Selector
  - Verified: Works as expected on both properties. It correctly identified partial data states based on audience data availability relative to the selected date range.
- `isCustomDimensionPartialData` Selector
  - Verified: Works correctly, showing partial data states when the data for the custom dimensions is insufficient.
- `isPropertyPartialData` Selector
  - Verified: Accurately reflects the partial data state when GA4 is still in the gathering data state.
- `getResourceDataAvailabilityDate` Selector
  - Verified: The selector successfully retrieves the earliest event dates, and the data is persisted through the `POST:save-resource-data-availability-date` endpoint to WordPress Transients.
  - Verified: The persistence of data availability dates is independent of the partial data state.
- Resetting Behavior
  - Verified: All related transients are reset upon Site Kit reset.
  - Verified: The following transients related to data availability dates are correctly reset when changing the account or property or measurement ID.
    - `_transient_googlesitekit_audience_**_data_availability_date`
    - `_transient_googlesitekit_customDimension_**_data_availability_date`
    - `_transient_googlesitekit_property_**_data_availability`
- Issue Found: `_transient_googlesitekit_audience_**_data_availability_date` is not being deleted upon disconnecting the Analytics module. However, the other two transients are correctly removed. ❌
Excellent catch, thank you @hussain-t! The follow-up PR has been merged and this is now back with you for another QA:Eng round.
QA Verified ✅
> Issue Found: `_transient_googlesitekit_audience_**_data_availability_date` is not being deleted upon disconnecting the Analytics module. However, the other two transients are correctly removed. ❌

- Verified: `_transient_googlesitekit_audience_**_data_availability_date` transient and other transients are removed upon disconnecting the Analytics module. ✅