GA not recording `org` and `publisher` correctly
After implementing CSS Selectors to collect organization and publisher (when present), GA shows that the implementation only partly worked: it records those variables for some pageviews but not all.
How to reproduce
- Log into datagovGA4
- Create a report to show datagov_dataset_organization
- Filter by a specific dataset URL
- Observe that the organization varies between a real value and (not set)
Expected behavior
100% of dataset pageviews will attribute a datagov_dataset_organization and a datagov_dataset_publisher. Each dataset page will report exactly one value per variable, never a real value for some pageviews and (not set) for others.
Actual behavior
~60% of pageviews record an org and publisher, and ~40% record (not set)
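The split can be checked from an exported report. Below is a minimal sketch, assuming hypothetical rows with an `organization` value and a `pageviews` count; the row shape and the counts are made up for illustration and are not the GA4 export format.

```javascript
// Hypothetical report rows: one entry per (organization value, pageview count).
// The row shape and field names are assumptions, not the GA4 export format.
function notSetShare(rows) {
  const total = rows.reduce((sum, row) => sum + row.pageviews, 0);
  if (total === 0) return 0; // avoid dividing by zero on an empty report
  const notSet = rows
    .filter((row) => row.organization === "(not set)")
    .reduce((sum, row) => sum + row.pageviews, 0);
  return notSet / total;
}

// Made-up counts chosen to mirror the roughly 60/40 split described above.
const sampleRows = [
  { organization: "State of Washington", pageviews: 60 },
  { organization: "(not set)", pageviews: 40 },
];
console.log(notSetShare(sampleRows)); // 0.4
```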
Sketch
- [ ] Research the activity to determine the cause, including whether it is related to bots
- [ ] Make necessary changes
- [ ] Validate changes work in GA
New goal: draw the organization and publisher directly from CKAN to populate a dataLayer array on page load, as usa.gov does.
This would apply on all pages related to a dataset, for example:
- https://catalog.data.gov/dataset/electric-vehicle-population-data
- https://catalog.data.gov/dataset/electric-vehicle-population-data/resource/8abbca4e-2f1c-4249-989f-b96201dd4710
- https://catalog.data.gov/dataset/electric-vehicle-population-data/resource/cb5a9adb-b9f1-44f5-aa3f-7f3c768da5d3
These should all have the same array, with organization set to State of Washington and publisher set to data.wa.gov.
This tutorial should help: https://www.analyticsmania.com/post/ultimate-google-tag-manager-data-layer-tutorial/
From there, I can use GTM dataLayer variables to collect the data and send it with pageviews and file downloads to GA, to associate those events with the org.
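A rough sketch of what the page-load push could look like. The `dataset` object shape and the use of the GA variable names as dataLayer keys are assumptions for illustration; in practice the values would come from CKAN when the page renders, and the actual change is in the PR, not here.

```javascript
// Sketch only: the dataset object shape and the dataLayer key names are
// assumptions; the real values would come from CKAN when the page renders.
function buildDatasetLayerEntry(dataset) {
  return {
    datagov_dataset_organization: dataset.organization,
    datagov_dataset_publisher: dataset.publisher,
  };
}

// Push the same entry regardless of which dataset-related page is viewed,
// so the dataset page and its resource pages report identical values.
function pushDatasetMetadata(dataLayer, dataset) {
  dataLayer.push(buildDatasetLayerEntry(dataset));
  return dataLayer;
}

// Example values taken from the dataset named in this issue.
const exampleDataset = {
  organization: "State of Washington",
  publisher: "data.wa.gov",
};
const exampleLayer = pushDatasetMetadata([], exampleDataset);
console.log(exampleLayer[0]);
```

GTM dataLayer variables keyed on those two names could then read the values for pageview and file download tags.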
@robert-bryson any updates here?
@robert-bryson, please update the ticket.
The current approach (the JS Google Tag Manager script) is the one recommended by Tag Manager, but it is obviously not working for us. There is a way to generate it via an official server-side tagging option. The docs describe a very different scenario from the one presented above, so I am working on achieving the above with the official functionality if possible.
Draft PR at https://github.com/GSA/ckanext-datagovtheme/pull/204.
FYI:
https://github.com/GSA/data.gov/issues/4783#issuecomment-2214240705
Thanks to @jbrown-xentity for a quick fix PR
@robert-bryson added `window.dataLayer = window.dataLayer || [];` ahead of the `dataLayer.push({` call, which resolved the GTM issue. Moving to blocked until pentesting is done so we can push to prod.
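For context, a small sketch of what the `window.dataLayer = window.dataLayer || [];` guard does. Here `root` is a stand-in for the browser's `window` and `safePush` is a hypothetical helper; whether an undefined dataLayer was the exact failure on catalog.data.gov is an assumption.

```javascript
// `root` stands in for the browser's `window`, so the sketch can run anywhere.
function safePush(root, entry) {
  // Reuse an existing dataLayer if present, otherwise create an empty one.
  root.dataLayer = root.dataLayer || [];
  root.dataLayer.push(entry);
  return root.dataLayer;
}

// Case 1: nothing has defined dataLayer yet, so an unguarded push throws.
const freshRoot = {};
let unguardedFailed = false;
try {
  freshRoot.dataLayer.push({ example: true });
} catch (err) {
  unguardedFailed = err instanceof TypeError;
}
safePush(freshRoot, { example: true }); // the guarded push succeeds

// Case 2: dataLayer already holds entries, and the guard keeps them.
const existingRoot = { dataLayer: [{ earlier: true }] };
safePush(existingRoot, { example: true });
```

Because the guard reuses any existing array, it does not matter whether the push runs before or after another script has created the dataLayer.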
Pushed to prod. Subject to data populating, will QA Wednesday.
As of 7/17, we have 98% accuracy. Not perfect, but well within a margin of error for bots, so I feel comfortable making the data public.