matomo-log-analytics
Proposal: option to provide a static url->title mapping for the "Page titles" report
I want to use Matomo to track visits on my site, and I decided to use only log analytics. I'm aware of the limitations, that some things will be missing compared to JS tracking - like resolution info, outgoing links, plugin support etc., and I'm fine with that.
However, I've realized that on a site that rarely changes, like mine - a blog with a few articles posted per year at most - I can solve one of these things, namely the "Page titles" report. Titles are normally read in JS from the <title> tag, but I can supply them instead as a static list of url -> title mappings in a file passed as a parameter to the import script.
The file looks like this (the first = on each line is treated as the separator and whitespace is trimmed from both sides, but of course we can change the format to e.g. CSV or something else):
https://mackuba.eu/ = mackuba.eu
https://mackuba.eu/2018/09/07/new-stuff-from-wwdc-2018/ = New stuff from WWDC 2018 – mackuba.eu
https://mackuba.eu/2018/07/10/dark-side-mac-2/ = Dark Side of the Mac: Updating Your App – mackuba.eu
https://mackuba.eu/2018/06/11/notifications-in-ios-12/ = What's new in notifications in iOS 12 – mackuba.eu
...
Now, when I call import_logs.py with --page-titles-from=page_titles.txt, whenever the parser sees a hit with URL e.g. https://mackuba.eu/2018/07/10/dark-side-mac-2/, it will set the action_name to "Dark Side of the Mac: Updating Your App – mackuba.eu", and so on. My "Page titles" report looks just like with the JS tracker version, and I only need to remember to update the file whenever I post a new article (or better, automate it).
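To illustrate the idea, here is a minimal sketch of the parsing and lookup described above. It is not the actual patch to import_logs.py: the function names `load_page_titles` and `action_name_for` are hypothetical, and the sketch assumes exact URL matching with no normalization.

```python
def load_page_titles(lines):
    """Parse 'url = title' lines into a dict.

    Each line is split on the first '=' only, and both sides are
    stripped of whitespace. Blank lines and lines without '=' are skipped.
    """
    titles = {}
    for line in lines:
        line = line.strip()
        if not line or "=" not in line:
            continue
        url, title = line.split("=", 1)
        titles[url.strip()] = title.strip()
    return titles


def action_name_for(url, titles):
    """Return the mapped title for a hit URL, or None if it is not listed."""
    return titles.get(url)


# Example with two lines in the format shown above.
sample = [
    "https://mackuba.eu/ = mackuba.eu",
    "https://mackuba.eu/2018/07/10/dark-side-mac-2/ = Dark Side of the Mac: Updating Your App – mackuba.eu",
]
titles = load_page_titles(sample)
print(action_name_for("https://mackuba.eu/2018/07/10/dark-side-mac-2/", titles))
```

To read a real file, an open file object can be passed directly, for example `load_page_titles(open("page_titles.txt", encoding="utf-8"))`. One consequence of splitting on the first = is that a URL containing = (such as a query string) would be split in the wrong place, which is one reason a format like CSV might be preferable.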
I believe quite a lot of sites using log analytics could use something like this. Depending on the size and type of the site, the file can be maintained manually, or built from a database of articles/pages using a script or an action on the server. In my case, I wrote a small script that loads my sitemap.xml file, goes through all the links listed there, fetches the HTML of each page and extracts the <title> from it.
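A script along those lines could look roughly like the sketch below. This is not my exact script: it assumes a standard sitemaps.org file with `<loc>` elements, uses only the Python standard library, and all function names (`sitemap_urls`, `extract_title`, `fetch`, `build_mapping`) are made up for illustration.

```python
import urllib.request
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Namespace used by standard sitemaps.org sitemap files (assumed here).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_urls(sitemap_xml):
    """Return the text of every <loc> element in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]


class TitleParser(HTMLParser):
    """Collects the text of the first <title> element in a page."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)


def extract_title(html):
    """Return the page title with whitespace collapsed, or '' if none."""
    parser = TitleParser()
    parser.feed(html)
    return " ".join("".join(parser.parts).split())


def fetch(url):
    """Download a URL and return (raw bytes, declared charset or 'utf-8')."""
    with urllib.request.urlopen(url) as resp:
        return resp.read(), resp.headers.get_content_charset() or "utf-8"


def build_mapping(sitemap_url):
    """Return 'url = title' lines for every page listed in the sitemap."""
    sitemap_bytes, _ = fetch(sitemap_url)
    lines = []
    for url in sitemap_urls(sitemap_bytes):
        body, charset = fetch(url)
        title = extract_title(body.decode(charset, errors="replace"))
        if title:
            lines.append(f"{url} = {title}")
    return lines


# Usage (performs network requests, so it is left commented out):
# print("\n".join(build_mapping("https://mackuba.eu/sitemap.xml")))
```

The output can be redirected to page_titles.txt and regenerated whenever a new article is published. The sitemap URL in the usage comment is only a guess at where the file lives.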