koko-analytics

Aggregate certain referrer hosts

Open dannyvankooten opened this issue 5 years ago • 7 comments

  • google.de | google.de/url | google.de/search
  • bing.com | bing.com/search
  • facebook.com | l.facebook.com | m.facebook.com
  • pinterest.com | pinterest.com/pin/xxx (the xxx standing for an individual pin url)

This is not an exhaustive list, so suggestions are welcome.

dannyvankooten avatar Jan 22 '20 09:01 dannyvankooten

So after some more thinking, there are two possible options for each referrer URL coming in.

1. Multiple URLs with the same meaning

In case of google.com/search, bing.com/search and any URL that has multiple versions pointing to what is essentially the same thing, we want to store them in the simplest form possible.

So google.com/search becomes google.com.

In my opinion, google.de/search becomes google.de instead of google.com as the TLD does hold some information that may be valuable.
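As a rough illustration of option 1, here is a minimal Python sketch (the plugin itself is not written in Python, and this is not the code from the commit). The rule list `SEARCH_RULES` and the function `simplify_referrer` are hypothetical names; the sketch only assumes that a referrer whose path is a known search endpoint should collapse to its host, with the TLD left untouched.

```python
import re
from urllib.parse import urlsplit

# Hypothetical rules: each pattern captures the host of a known search URL.
# The TLD is part of the captured group, so google.de stays google.de.
SEARCH_RULES = [
    re.compile(r"^((?:www\.)?google\.[a-z.]+)/(?:url|search)\b"),
    re.compile(r"^((?:www\.)?bing\.com)/search\b"),
]

def simplify_referrer(url):
    """Collapse known search URLs to their host; return host + path otherwise."""
    # Referrers without a scheme need a leading "//" so urlsplit finds the host.
    parts = urlsplit(url if "://" in url else "//" + url)
    key = parts.netloc.lower() + parts.path
    for rule in SEARCH_RULES:
        match = rule.match(key)
        if match:
            return match.group(1)
    return key
```

Extending the list with another search engine would then only mean appending one more pattern.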

2. All other URLs with a path component

In most other cases, we want to keep the full URL but also offer an aggregated total in the dashboard so that we can see the total amount of traffic coming from that domain while still being able to zoom in and see which pages actually generated that traffic.
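One way to picture option 2 is a read-time grouping step that leaves the stored URLs alone. The Python sketch below is only an assumption about how such a view could be built; `aggregate_by_host` and the input shape (a mapping of full referrer URL to pageview count) are made up for illustration.

```python
from collections import defaultdict
from urllib.parse import urlsplit

def aggregate_by_host(pageviews_by_url):
    """Group full referrer URLs by host, keeping the per-URL breakdown."""
    grouped = defaultdict(lambda: {"total": 0, "urls": {}})
    for url, count in pageviews_by_url.items():
        host = urlsplit(url).netloc.lower()
        grouped[host]["total"] += count   # total shown in the dashboard row
        grouped[host]["urls"][url] = count  # detail shown when zooming in
    return dict(grouped)
```

Because the totals are computed from the rows that are already stored, this approach would not add anything to storage.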

Option 1 is taken care of in cd4a743c2b65f6eb2da227c3052273633684f8b7, which allows us to easily extend that list with other mainstream options. A filter hook may be useful for users.

I haven't yet gone over the details for solving option 2 but am reasonably confident it can be done without massively inflating storage requirements.

dannyvankooten avatar Jan 23 '20 10:01 dannyvankooten

Two questions regarding this issue:

First, Twitter's t.co link shortener really adds up quickly as individual entries. I don't know how challenging this would be, but a great way to avoid this would be to aggregate them and offer a "plus button" beside the main t.co entry that, when clicked, expands a list of all individual links.

The other one is related to Feedly, the RSS aggregator. I noticed that clicks from Feedly collections show up in the stats as a complete, private URL (not accessible to anyone else). Real example: https://feedly.com/i/collection/content/user/323ef75e-3ae1-4b9a-9f90-05eefc034813/category/global.all Since this kind of direct link is useless, maybe aggregate all of them under a single "Feedly Collection" label?

An RSS entry in Feedly that isn't in a collection is perfectly clickable, hence it would benefit from a solution similar to the one suggested above for Twitter's t.co links. Example: https://feedly.com/i/entry/00nexjUMDjWDfmBvVM9H1PdsUJLyPJmkdIH23dKer+c=_16ff7d4bfd1:5f6e6d4:bb2cd839
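The Feedly suggestion could be sketched roughly as follows in Python. The function name `label_feedly` is hypothetical, and the sketch assumes that every private collection URL starts with the `/i/collection/` path seen in the example above, while entry URLs are left as they are.

```python
from urllib.parse import urlsplit

def label_feedly(url):
    """Replace private Feedly collection URLs with one shared label."""
    parts = urlsplit(url)
    is_feedly = parts.netloc.lower() == "feedly.com"
    # Assumption: collection links always live under /i/collection/.
    if is_feedly and parts.path.startswith("/i/collection/"):
        return "Feedly Collection"
    # Entry links (and everything else) stay clickable and unchanged.
    return url
```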

rghedin avatar Feb 01 '20 21:02 rghedin

Hey Danny, I have combed through the referrer stats from the time since the last plugin update, and want to share the remaining duplicates that I have found. I hope this helps!

Facebook:

  • https://www.facebook.com
  • https://facebook.com
  • https://l.facebook.com
  • https://l.facebook.com/l.php
  • l.facebook.com/l.php
  • https://de-de.facebook.com
  • https://lfacebook.com (don't know if this is a mistake, but we had 17 hits from this)
  • Android app: facebook.com (I think clicks from the Facebook app should not be counted separately)

Instagram:

  • instagram.com
  • https://l.instagram.com

Google:

  • https://www.google.com
  • www.google.com

In addition to this, I see the following referrers that seem to indicate the google app as the source. In my opinion, it would make sense to count those as "normal" google.com searches:

  • Android app: com.google.android.googlequicksearchbox
  • Android app: com.google.android.googlequicksearchbox/https/www.google.com

Ecosia:

  • https://www.ecosia.org
  • https://www.ecosia.org/search

Bing:

  • https://www.bing.com
  • https://bing.com
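The duplicates listed above differ mostly in scheme, a `www.`, `l.`, `m.` or locale subdomain, or an "Android app:" prefix. A minimal Python sketch of collapsing them to one host might look like this. All names (`NOISE_PREFIX`, `ANDROID_APPS`, `canonical_host`) are hypothetical, the set of prefixes is an assumption, and the bare host chosen as the canonical form is just one possible choice.

```python
import re
from urllib.parse import urlsplit

APP_PREFIX = "android app:"
# Assumption: these subdomain prefixes carry no useful information.
NOISE_PREFIX = re.compile(r"^(?:www|l|m|[a-z]{2}-[a-z]{2})\.")
# Hypothetical mapping from Android package names to a website host.
ANDROID_APPS = {"com.google.android.googlequicksearchbox": "google.com"}

def canonical_host(referrer):
    """Map referrer variants such as l.facebook.com to a single host."""
    value = referrer.strip()
    if value.lower().startswith(APP_PREFIX):
        # Keep only the package name, dropping anything after the first "/".
        package = value[len(APP_PREFIX):].strip().split("/")[0]
        return ANDROID_APPS.get(package, package)
    host = urlsplit(value if "://" in value else "//" + value).netloc.lower()
    stripped = NOISE_PREFIX.sub("", host)
    # Only accept the stripped form if it is still a full domain.
    return stripped if "." in stripped else host
```

A host like lfacebook.com is left untouched here, since it is unclear whether it is a typo or a separate site.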

danielrunvegan avatar Feb 13 '20 09:02 danielrunvegan

Awesome @danielrunvegan, that is super helpful indeed! Thank you so much.

dannyvankooten avatar Feb 14 '20 08:02 dannyvankooten

@dannyvankooten the last update cleaned up almost all of the duplicates for me! Here are the remaining candidates I see for the last 7 days (with my suggestion as to where they should be aggregated to):

  • facebook.com --> https://facebook.com
  • www.google.com --> https://www.google.com
  • www.instagram.com --> https://www.instagram.com
  • www.google.de --> https://www.google.de
  • https://play.google.com/store/apps/details?id=facebook.com --> https://www.facebook.com

And the following could all be aggregated to --> https://www.google.com

  • https://play.google.com/store/apps/details?id=com.www.google.android.googlequicksearchbox/https/www.google.com
  • https://play.google.com/store/apps/details?id=com.www.google.android.googlequicksearchbox
  • https://play.google.com/store/apps/details?id=com.www.google.android.gm
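The Play Store referrers above carry the app identifier in the `id` query parameter, so the suggested mapping could be sketched like this in Python. The table `APP_TARGETS` and the function `resolve_play_store` are hypothetical; the identifiers are copied verbatim from the list above and the targets are the ones suggested there.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical lookup table, filled with the suggestions from this comment.
APP_TARGETS = {
    "facebook.com": "https://www.facebook.com",
    "com.www.google.android.googlequicksearchbox": "https://www.google.com",
    "com.www.google.android.gm": "https://www.google.com",
}

def resolve_play_store(url):
    """Map a Play Store details URL to the site its app belongs to."""
    parts = urlsplit(url)
    if parts.netloc != "play.google.com" or parts.path != "/store/apps/details":
        return url
    ids = parse_qs(parts.query).get("id", [])
    if not ids:
        return url
    # Some ids have a trailing "/https/www.google.com"; keep the first segment.
    package = ids[0].split("/")[0]
    return APP_TARGETS.get(package, url)
```

Unknown app identifiers fall through unchanged, so nothing is merged by accident.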

danielrunvegan avatar Feb 24 '20 16:02 danielrunvegan

@dannyvankooten I've just noticed that the referrer aggregation for Pinterest seems to be broken. I get separate results for the following variations (all are shown as pinterest.com):

  • pinterest.com
  • www.pinterest.com
  • https://pinterest.com
  • https://www.pinterest.com
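For rows that are already stored as separate variants, the counts could also be merged when the stats are read. The Python sketch below assumes the rows arrive as `(referrer, count)` pairs and that variants differ only in scheme, a leading `www.`, or a trailing slash; `merge_variants` is a hypothetical helper, not part of the plugin.

```python
from collections import Counter
from urllib.parse import urlsplit

def merge_variants(rows):
    """Sum counts of referrers that differ only by scheme or a www. prefix."""
    merged = Counter()
    for referrer, count in rows:
        parts = urlsplit(referrer if "://" in referrer else "//" + referrer)
        host = parts.netloc.lower()
        if host.startswith("www."):
            host = host[4:]
        # A trailing slash should not create a separate row either.
        merged[host + parts.path.rstrip("/")] += count
    return dict(merged)
```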

danielrunvegan avatar Mar 04 '21 17:03 danielrunvegan

Email newsletter services that use unique links, e.g. [image]

arnelap avatar Jul 17 '21 05:07 arnelap