Harmonize Readers' User-Agent
While looking for per-reader download stats, we realized (back in 2017! https://github.com/kiwix/container-images/issues/82) that our stats analytics tool (matomo) doesn't identify most of our readers. While fixing that is a matomo/operations change, it highlighted the fact that our downloaders' User-Agents are missing or poorly chosen.
Here's the current situation. The "Identified as" column was produced with matomo's own detection library.
| Reader | User-Agent | Sample | Identified as |
|---|---|---|---|
| kiwix-desktop | aria2/{aria-version} | aria2/1.36.0 | Client Type=library, Name=Aria2, Version=1.36.0 |
| kiwix-android | kiwix-android-version:{VersionCode} | kiwix-android-version:231101, kiwix-android-version:-1 | OS Name=Android, ShortName=AND, Platform=, Family=Android, Version= |
| kiwix macOS | Kiwix/{ProjectVersion} CFNetwork/{CFNetworkVersion} Darwin/{DarwinVersion} | Kiwix/173 CFNetwork/1568.100.1.1.1 Darwin/24.0.0 | OS Name=iOS, ShortName=IOS, Platform=, Family=iOS, Version=18.0 |
| kiwix iOS | Kiwix/{ProjectVersion} CFNetwork/{CFNetworkVersion} Darwin/{DarwinVersion} | Kiwix/173 CFNetwork/1568.100.1.2.1 Darwin/24.0.0 | OS Name=iOS, ShortName=IOS, Platform=, Family=iOS, Version=18.0 |
| Kiwix JS Electron | xxx KiwixJSElectron/{nwVersion}-E xxx (used in UA built by Electron) | Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) KiwixJSElectron/3.4.1-E Chrome/122.0.6261.156 Electron/29.3.1 Safari/537.36 | Client & OS Type=mobile app, Name=KiwixJSElectron, Version=3.4.1, Name=Windows, ShortName=WIN, Platform=x64, Family=Windows, Version=10 |
- Kiwix JS extensions and PWA use the browser, so there's no specific U-A.
- It looks like Kiwix JS Electron is the only one that's correct, and I believe it's not specifically set but built by Electron.
- The Apple User-Agents for macOS and iOS are almost identical; matomo even detects iOS for the macOS version.
- User-Agent convention is pretty clear.
- kiwix-desktop should set `--user-agent` when configuring aria2.
- Apple should manually set the `User-Agent` header of the `URLRequest`.
- Build versions as used in Android and Apple are of little help for our use cases. We want human version numbers. U-A allows us to specify both in the same string though.
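For the kiwix-desktop point, here is a minimal sketch of passing a custom User-Agent to aria2. It assumes aria2 is started as an `aria2c` subprocess; `aria2_command` is a hypothetical helper, and the User-Agent value is only a placeholder. `--user-agent` and `--enable-rpc` are existing aria2c options.

```python
def aria2_command(user_agent, enable_rpc=True):
    """Build an aria2c argument list that sets a custom User-Agent.

    Hypothetical helper: kiwix-desktop's real launch code may differ.
    """
    args = ["aria2c"]
    if enable_rpc:
        args.append("--enable-rpc")
    # aria2c sends this string as the User-Agent header of its HTTP(S) downloads.
    args.append(f"--user-agent={user_agent}")
    return args


if __name__ == "__main__":
    # Placeholder value; the actual string is what this discussion needs to settle.
    print(aria2_command("Kiwix-desktop/2.3.1-4 (Windows/11)"))
```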
I suggest we use the following: `Kiwix-{flavor}/{humanVersion} ({platform}/{platformVersion})`. I believe the build number is not useful, but this is debatable, and it could be added as a trailing comment (after the parenthesis).
Which would translate as follows:
Kiwix-desktop/2.3.1-4 (Windows/11)
Kiwix-android/3.11.1 (Android/12)
Kiwix-ios/3.6.0 (iOS/18.0)
Kiwix-macos/3.6.0 (macOS/15.0)
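A minimal sketch of how a reader could assemble such a string. `build_user_agent` and its parameter names are hypothetical, and the optional `build` comment format is an assumption, since that part is still up for debate.

```python
def build_user_agent(flavor, human_version, platform, platform_version, build=None):
    """Format a User-Agent as Kiwix-{flavor}/{humanVersion} ({platform}/{platformVersion}).

    `build` is optional; when given, it is appended as a trailing
    parenthesized comment (assumed format, not an agreed convention).
    """
    user_agent = f"Kiwix-{flavor}/{human_version} ({platform}/{platform_version})"
    if build is not None:
        user_agent += f" (build {build})"
    return user_agent


if __name__ == "__main__":
    print(build_user_agent("desktop", "2.3.1-4", "Windows", "11"))
    print(build_user_agent("ios", "3.6.0", "iOS", "18.0", build=173))
```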
The most important question is: do we want to treat all readers as a single Kiwix product, or should each be its own product? With a single product, matomo for instance would group them all under Kiwix, and we'd only be able to distinguish reader-originated downloads from others, not compare readers with each other:
Kiwix/2.3.1-4 (Windows/11)
Kiwix/3.11.1 (Android/12)
Kiwix/3.6.0 (iOS/18.0)
Kiwix/3.6.0 (macOS/15.0)
Once this is settled, we can both patch our matomo image and make a PR for matomo's repo:
628a629,653
>
> - regex: 'Kiwix-desktop/(\d+[\.\d]+)'
> name: 'Kiwix Desktop'
> version: '$1'
> url: 'https://github.com/kiwix/Kiwix-desktop'
>
> - regex: 'Kiwix-android/(\d+[\.\d]+)'
> name: 'Kiwix Android'
> version: '$1'
> url: 'https://github.com/kiwix/kiwix-android'
>
> - regex: 'Kiwix-ios/(\d+[\.\d]+)'
> name: 'Kiwix iOS'
> version: '$1'
> url: 'https://github.com/kiwix/kiwix-apple'
>
> - regex: 'Kiwix-ipados/(\d+[\.\d]+)'
> name: 'Kiwix iPadOS'
> version: '$1'
> url: 'https://github.com/kiwix/kiwix-apple'
>
> - regex: 'Kiwix-macos/(\d+[\.\d]+)'
> name: 'Kiwix macOS'
> version: '$1'
> url: 'https://github.com/kiwix/kiwix-apple'
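As a rough check of the proposed rules, the sketch below runs the same regular expressions against the suggested User-Agent strings. It uses Python's `re` module with a case-insensitive search, which is only an approximation of how matomo's detector applies its rules; `RULES` and `identify` are hypothetical names.

```python
import re

# (regex, name) pairs copied from the proposed patch above.
RULES = [
    (r"Kiwix-desktop/(\d+[\.\d]+)", "Kiwix Desktop"),
    (r"Kiwix-android/(\d+[\.\d]+)", "Kiwix Android"),
    (r"Kiwix-ios/(\d+[\.\d]+)", "Kiwix iOS"),
    (r"Kiwix-ipados/(\d+[\.\d]+)", "Kiwix iPadOS"),
    (r"Kiwix-macos/(\d+[\.\d]+)", "Kiwix macOS"),
]


def identify(user_agent):
    """Return (name, version) for the first matching rule, or None."""
    for pattern, name in RULES:
        match = re.search(pattern, user_agent, re.IGNORECASE)
        if match:
            return name, match.group(1)
    return None


if __name__ == "__main__":
    for ua in ("Kiwix-desktop/2.3.1-4 (Windows/11)",
               "Kiwix-ios/3.6.0 (iOS/18.0)",
               "aria2/1.36.0"):
        print(ua, "->", identify(ua))
```

Under these assumptions, `Kiwix-desktop/2.3.1-4 (Windows/11)` yields version `2.3.1`: the `-4` suffix is not captured by `(\d+[\.\d]+)`, which may or may not be what we want.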
There's also the question of CustomApps. In the logs, I've seen AndroidDownloadManager (when deferred to the system?), QtWebEngine (kiwix-desktop ??), WikivoyagebyKiwix (another branding hell).
That's interesting. Can I assume you are referring to running analytics on the download library's server?
> It looks like Kiwix JS Electron is the only one that's correct and I believe it's not specifically set but built by Electron.
This is derived from package.json and Electron populates its headers based on the data set there. As package.json is only valid with a version number, I don't think I'd be able to disable that easily (well, you can do most things in Electron, so there would be a way, but it would need specific coding, which it would be a shame to have to do if the version number is the only thing you don't want).
> Kiwix JS extensions and PWA uses the browser so there's no specific U-A
This is more complex. I cannot directly modify the User-Agent header through JavaScript for security reasons. This is a protected header that only browsers control. For cross-origin requests, I can, however, set other headers, such as the CORS-safelisted ones or a custom one:
- `Accept`
- `Accept-Language`
- `Content-Language`
- `X-Requested-With`
- `X-App-Info`
For this to work, say if I were to use X-App-Info, the server must be configured to accept a custom header through CORS by including it in the Access-Control-Allow-Headers response header.
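To illustrate the server-side condition, here is a simplified sketch of the check a browser effectively performs on the preflight response. `preflight_allows` and `SAFELISTED` are hypothetical names, and the logic deliberately ignores details of the real CORS rules (value restrictions on safelisted headers, `Content-Type`, and the credentialed-request case where `*` is not a wildcard).

```python
# Request headers that need no entry in Access-Control-Allow-Headers
# (simplified subset; the real safelist also restricts their values).
SAFELISTED = {"accept", "accept-language", "content-language"}


def preflight_allows(header_name, allow_headers):
    """Return True if `header_name` may be sent cross-origin.

    `allow_headers` is the raw value of the server's
    Access-Control-Allow-Headers response header ("" if absent).
    """
    name = header_name.strip().lower()
    if name in SAFELISTED:
        return True
    # Header names are case-insensitive and comma-separated.
    allowed = {h.strip().lower() for h in allow_headers.split(",") if h.strip()}
    return "*" in allowed or name in allowed


if __name__ == "__main__":
    print(preflight_allows("X-App-Info", "Content-Type, X-App-Info"))
    print(preflight_allows("X-App-Info", "Content-Type"))
```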
Please note that the PWA still accesses download.kiwix.org (which was CORS-enabled at my request several years ago), while the Browser Extension's in-app library uses an iframe to display library.kiwix.org to the user in a basic way, given that we haven't implemented (lack of time!) API access via application/json. Since library.kiwix.org does not (or didn't at the time of development) enable cross-origin requests in its Response header, we cannot modify anything about those requests currently. I'm not sure if you'd be willing to enable CORS for library.kiwix.org given that the proper way to access the library is via the API... (and I do intend to explore that sometime).
One further issue is the necessary inconsistency in handling of the ZIM download (if this is being monitored):
- Browser Extension: download is handed over to the Browser. It is not currently enabled in app.
- Electron: all downloads are managed in-app using Electron APIs.
- PWA: if the browser has the File System Access API and/or Origin Private File System, the app manages direct downloads from Kiwix into the OPFS or access-enabled directory, but it hands off downloads from mirrors to the browser (due to CORS). If those APIs are not available, download is handed over to the browser.
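The three cases above can be summarised as a small decision function. This is only a sketch of the behaviour as described in this comment, not the actual Kiwix JS code; `download_handler`, its parameters, and the `"browser"` / `"in-app"` labels are hypothetical.

```python
def download_handler(variant, has_fs_api=False, from_mirror=False):
    """Return who performs a ZIM download: "browser" or "in-app".

    variant:     "extension", "electron" or "pwa"
    has_fs_api:  File System Access API and/or OPFS is available (PWA only)
    from_mirror: the file is served by a mirror rather than directly by Kiwix
    """
    if variant == "extension":
        return "browser"      # always handed over to the browser
    if variant == "electron":
        return "in-app"       # managed with Electron APIs
    if variant == "pwa":
        # Only direct downloads can be managed in-app; mirrors go to
        # the browser because of CORS.
        if has_fs_api and not from_mirror:
            return "in-app"
        return "browser"
    raise ValueError(f"unknown variant: {variant!r}")


if __name__ == "__main__":
    print(download_handler("pwa", has_fs_api=True, from_mirror=False))
    print(download_handler("pwa", has_fs_api=True, from_mirror=True))
```

If downloads are being monitored, only the `"in-app"` cases are ones where the app itself issues the request; in the `"browser"` cases the request carries the browser's own User-Agent.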
Thanks for those details @Jaifroid; the data was indeed extracted from the logs of download.kiwix.org.