kuma
U - Back-end telemetry in GA creates bogus traffic
Summary The event "auth-started" has gone up 3600% in the last two weeks, while every other event has changed by less than 15%. It's already problematic that back-end generated events don't send a client ID to correlate with front-end usage, but as long as the volume is low it doesn't skew the numbers much. In this case, though, we are counting an additional 250k uses per week that don't actually exist, which distorts our reporting.
Steps To Reproduce (STR)
- check numbers on GA. Direct Link
Actual behavior A large number of bogus events was logged (maybe from bots?)
Expected behavior Every event should send a client ID where possible. Additionally, log-in links should have a rel="nofollow" attribute.
Every event without a GA client ID counts as a new user.
Solutions? @tobinmori
- Inject client_id into links?
- new task: drop event (for now), p1, 0.5
- next sprint: next task: look at mitigation
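For the client ID part, one option might be to read the ID from the visitor's `_ga` cookie on the back end instead of injecting it into links. This is only a sketch under assumptions: it assumes the Universal Analytics cookie format `GA1.2.<random>.<timestamp>`, where the last two dot-separated fields form the client ID, and `client_id_from_ga_cookie` is a hypothetical helper name, not something that exists in kuma.

```python
from typing import Optional


def client_id_from_ga_cookie(cookie_value: Optional[str]) -> Optional[str]:
    """Return the GA client ID from a `_ga` cookie value, or None.

    Assumes the cookie looks like "GA1.2.<random>.<timestamp>"; the
    client ID is taken to be the last two dot-separated fields.
    """
    if not cookie_value:
        return None
    parts = cookie_value.split(".")
    if len(parts) < 4:
        # Not in the expected format, so don't guess.
        return None
    return ".".join(parts[-2:])
```

In a Django view this could be called with `request.COOKIES.get("_ga")`. When it returns None (curl, bots, cookies disabled), the event could be dropped rather than sent without a client ID.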
Another "mitigation" is to check the HTTP referer. If you use curl http://localhost.org:8000/users/github/login/ then request.META.get('HTTP_REFERER') is None.
If you actually click the GitHub/Google button in a browser, HTTP_REFERER is a URL whose hostname is the same as ours.
Also, I tested that HTTP_REFERER still gets set if I click to open the modal and then either ⌘-click the link or right-click it and select "Open link in a new tab".
I'll make a PR to demonstrate.
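Roughly, the referer check described above could look like the sketch below. It is not the code from the PR; `is_same_host_referer` and the `our_host` parameter are hypothetical names used for illustration only.

```python
from typing import Optional
from urllib.parse import urlparse


def is_same_host_referer(referer: Optional[str], our_host: str) -> bool:
    """Return True only if the referer's hostname matches our own host.

    A missing referer (e.g. a plain curl request) returns False.
    """
    if not referer:
        return False
    hostname = urlparse(referer).hostname  # port is not part of .hostname
    return hostname is not None and hostname.lower() == our_host.lower()
```

In the login view, this might be called as `is_same_host_referer(request.META.get('HTTP_REFERER'), request.get_host().split(':')[0])`, with the "auth-started" event sent only when it returns True.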
Can we please just comment out the event collection for this already? It's massively messing up our numbers.
Never mind, it's in production already, so I should hopefully see the effects tomorrow.
@tobinmori , @atopal - Wanted to confirm - what epic is this user story part of?
It is already an epic.
@chinikes This is a P1.