device-detector
Commercial model returned when parsing User Agent string
Dear Matomo Team,
I have run into some strange behaviour with version 0.10 of device_detector, the Python port of the library.
When I parse the following user-agent string: Mozilla/5.0 (Linux; Android 6.0.1; SM-G532G Build/MMB29T) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.137 Mobile Safari/537.36
- When I retrieve the model field, I get the commercial model name: GALAXY J2 Prime
- I expected the raw model found in the user-agent string: SM-G532G
The following is the code I run to perform the test:


Thank you very much in advance!
Kind regards.
It's currently not possible to return the raw model if a commercial model is defined.
Dear @sgiehl and @mattab, would the Matomo team consider adding to the roadmap a change that introduces another field named key_model, containing the raw value from the user agent (SM-G532G)? In our use case we need to identify models by this raw value. If so, could I contribute a pull request to the original Matomo project to add this feature?
That wouldn't be easy to implement. Many devices use their "commercial model" in the user agent. Some have multiple model versions that we group together under the same "commercial model"; see for example https://github.com/matomo-org/device-detector/blob/e012536928a9632efafebc01eeac0da4258b4468/regexes/device/mobiles.yml#L9759-L9760 . Splitting those up would result in a lot more detection rules.
What exactly do you need those raw model names for?
Dear @sgiehl ,
We need the raw models because we are trying to correlate device data from the TAC databases used by telecom carriers (like http://tacdb.osmocom.org/ or https://imeidb.gsma.com/imei/index#) with the information extracted from the user-agent string. We cannot correlate and compare the two, because these databases use a different model nomenclature.
In the example you provided, I certainly see a mix of raw and commercial models in user-agent strings. So one problem is that we are dealing with unstructured data. Another is that the user agent is not standardized, so some vendors, like Apple, make it difficult to identify the exact iPhone model.
Regarding the performance issue you mention: if we want to return the raw model, you say that splitting these regex rules one-to-one would lead to many more detection rules. But what is the overhead in terms of performance? I understand that with a single pattern the string is scanned only once, but I am not sure about the order of magnitude of the extra cost, because even with a single pattern the input string still has to be compared against every raw model:
regex: '(?:SAMSUNG-)?(?:GT-I9500|GT-I9502|GT-I9505|SCH-I545|SCH-I959|SCH-R970|GALAXY-S4|SGH-M919N?)'
Could this substantially affect the performance of the application?
Thank you in advance Stefan!
Kind regards.
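To make the trade-off in this question concrete, here is a small Python sketch (Python because the reporter uses the Python port; the rule itself is language-agnostic). It compares the quoted rule as one alternation with a hypothetical split into one rule per raw code. The commercial name "Galaxy S4" is an assumption, and the sketch ignores everything else the real parser does: it only shows that N split rules can mean up to N regex searches per user agent where one alternation needs one. Timings vary by machine, so none are quoted here.

```python
import re
import timeit

# Raw model codes from the Galaxy S4 rule quoted above.
RAW_MODELS = ["GT-I9500", "GT-I9502", "GT-I9505", "SCH-I545",
              "SCH-I959", "SCH-R970", "GALAXY-S4", "SGH-M919N?"]
COMMERCIAL = "Galaxy S4"  # assumed commercial name for this rule

# Current approach: one rule, one alternation, one commercial model.
GROUPED = re.compile(r"(?:SAMSUNG-)?(?:" + "|".join(RAW_MODELS) + r")")

# Hypothetical split approach: one compiled rule per raw model code.
SPLIT = [re.compile(r"(?:SAMSUNG-)?" + code) for code in RAW_MODELS]


def detect_grouped(user_agent):
    """A single regex search over the user agent."""
    return COMMERCIAL if GROUPED.search(user_agent) else None


def detect_split(user_agent):
    """Up to len(SPLIT) searches; a non-matching UA pays for all of them."""
    for rule in SPLIT:
        if rule.search(user_agent):
            return COMMERCIAL
    return None


if __name__ == "__main__":
    # A desktop UA matches none of the rules: the worst case for SPLIT.
    miss = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
    for name, detect in (("grouped", detect_grouped), ("split", detect_split)):
        seconds = timeit.timeit(lambda d=detect: d(miss), number=10_000)
        print(f"{name}: {seconds:.4f}s for 10,000 calls")
```

Both functions return the same answers; only the number of searches differs, and that difference scales with how many codes a grouped rule would be split into.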
> regex: '(?:SAMSUNG-)?(?:GT-I9500|GT-I9502|GT-I9505|SCH-I545|SCH-I959|SCH-R970|GALAXY-S4|SGH-M919N?)'
If you rewrote this rule as (?:SAMSUNG-)?(GT-I9500|GT-I9502|GT-I9505|SCH-I545|SCH-I959|SCH-R970|GALAXY-S4|SGH-M919N?), you could use $1 to capture the raw model without splitting the rule.
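A minimal Python sketch of the capture-group idea, assuming a hypothetical raw_model field next to the existing model (the commercial name "Galaxy S4" is also an assumption). In Python, match.group(1) plays the role of $1.

```python
import re

# The rewritten rule: the alternation is now a capturing group, so $1
# (match.group(1) in Python) holds whichever raw code matched.
RULE = re.compile(
    r"(?:SAMSUNG-)?(GT-I9500|GT-I9502|GT-I9505|SCH-I545|SCH-I959|"
    r"SCH-R970|GALAXY-S4|SGH-M919N?)"
)
COMMERCIAL_MODEL = "Galaxy S4"  # assumed commercial name for this rule


def parse_model(user_agent):
    """Return the existing 'model' plus the proposed 'raw_model' field,
    or None when the rule does not match. 'raw_model' is hypothetical."""
    match = RULE.search(user_agent)
    if match is None:
        return None
    return {"model": COMMERCIAL_MODEL, "raw_model": match.group(1)}


print(parse_model("Mozilla/5.0 (Linux; Android 4.4.2; SAMSUNG-SGH-M919N Build/TEST)"))
# {'model': 'Galaxy S4', 'raw_model': 'SGH-M919N'}
```

The optional SAMSUNG- prefix sits outside the group, so it is not part of raw_model.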
Exactly, @mimmi20: it is not necessary to split the rule, because the raw model is available as $1. We could add a new field named 'raw_model', assign $1 to it, and of course keep the current 'model' field generated from the regular expression.
The question now is: does the Matomo team think this feature would be useful for other users of the library, so that we could add 'raw_model' to the official Matomo repository? Or is it better for me to develop this feature in a fork for now?
Thank you!
I'm not against returning an additional field that contains the actual match. But I think it would be a lot of work to go through all detections and adjust them so that $1 always contains the raw model.
Also, there might be cases where the regex doesn't match the full raw model, because the user agent contains additional characters that are not in the regex. For example, GT-I9500 also matches GT-I9500A.
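The GT-I9500A case can be reproduced in Python with the capturing form of the rule. The TOKEN variant below is a hypothetical workaround, not something the library does: it lets $1 run on to the end of the model token.

```python
import re

CODES = r"GT-I9500|GT-I9502|GT-I9505|SCH-I545|SCH-I959|SCH-R970|GALAXY-S4|SGH-M919N?"

# Capturing version of the rule: $1 is exactly the alternative that matched.
EXACT = re.compile(r"(?:SAMSUNG-)?(" + CODES + r")")

# Hypothetical variant: extend $1 up to whitespace, ';' or ')' so that
# suffixes the rule does not know about are kept.
TOKEN = re.compile(r"(?:SAMSUNG-)?((?:" + CODES + r")[^\s;)]*)")


def raw_model(rule, user_agent):
    match = rule.search(user_agent)
    return match.group(1) if match else None


UA = "Mozilla/5.0 (Linux; Android 4.4.2; GT-I9500A Build/TEST)"
print(raw_model(EXACT, UA))  # GT-I9500  (the trailing "A" is lost)
print(raw_model(TOKEN, UA))  # GT-I9500A
```

The token heuristic assumes model codes are delimited by whitespace, ';' or ')', which will not hold for every user agent, so each rule would still need to be reviewed individually.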
OK @sgiehl and @mimmi20, if I understood correctly, the actual match ($1) is not currently returned by the library (it is only used internally), but it would be possible to modify the code in a fork to return $1 without modifying any rule or creating any new field? Is that correct?
Thank you!
I have another suggestion that avoids editing thousands of regular expressions and tests:
you could create an additional method for browsers, but you would have to write your own regular expressions for all browsers.
Example (the pattern skips the Android version and an optional language segment, then captures the hardware name):
Android (?:[\d.]+;)\s?(?:[^;]+;)?\s?([^\.\)]+)(?: Build.+|\)) AppleWebKit
Test UA: Mozilla/5.0 (Linux; U; Android 8.1.0; zh-cn; PBCT10 Build/OPM1.171019.011) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/57.0.2987.132 MQQBrowser/8.9 Mobile Safari/537.36
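Running the suggested regex in Python against both user agents from this thread shows how it behaves. TIGHTENED is a hypothetical variant added only for comparison; it is not part of the suggestion.

```python
import re

# The regex suggested above, copied verbatim.
SUGGESTED = re.compile(
    r"Android (?:[\d.]+;)\s?(?:[^;]+;)?\s?([^\.\)]+)(?: Build.+|\)) AppleWebKit"
)

# Hypothetical tightened variant: the capture is lazy and may not contain
# ';' or ')', and an optional " Build/..." suffix is skipped before ')'.
TIGHTENED = re.compile(r"Android [\d.]+;\s?(?:[^;)]+;\s?)?([^;)]+?)(?: Build/[^)]*)?\)")


def hardware_name(rule, user_agent):
    match = rule.search(user_agent)
    return match.group(1) if match else None


MQQ_UA = ("Mozilla/5.0 (Linux; U; Android 8.1.0; zh-cn; PBCT10 Build/OPM1.171019.011) "
          "AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/57.0.2987.132 "
          "MQQBrowser/8.9 Mobile Safari/537.36")
ISSUE_UA = ("Mozilla/5.0 (Linux; Android 6.0.1; SM-G532G Build/MMB29T) "
            "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.137 "
            "Mobile Safari/537.36")

for ua in (MQQ_UA, ISSUE_UA):
    print(hardware_name(SUGGESTED, ua), "|", hardware_name(TIGHTENED, ua))
# PBCT10 | PBCT10
# SM-G532G Build/MMB29T | SM-G532G
```

With the suggested pattern the capture stops at the first '.' or ')'. The PBCT10 build ID contains a dot, but MMB29T does not, so " Build/MMB29T" ends up in the capture. A lazy capture followed by an optional " Build/..." group avoids that for these two inputs, though it has not been checked against a wider set of user agents.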
@csanclop No. You would need to go through all the regexes and check them: there are many that do not contain any capture group, or where only part of the model is captured and then extended to build the model name, ...
See https://github.com/matomo-org/device-detector/blob/e012536928a9632efafebc01eeac0da4258b4468/regexes/device/mobiles.yml#L26-L29
https://github.com/matomo-org/device-detector/blob/e012536928a9632efafebc01eeac0da4258b4468/regexes/device/mobiles.yml#L424-L426
https://github.com/matomo-org/device-detector/blob/e012536928a9632efafebc01eeac0da4258b4468/regexes/device/mobiles.yml#L4867-L4868
Those, and many more, would need to be adjusted so that $1 returns the raw model and the other captures are used to build the model name.
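To illustrate the two cases described above, here is a Python sketch with two made-up rules (not copied from mobiles.yml): one with no capture group, and one that captures only part of the raw code and extends it with fixed text in the model template. fill_template is a simplified, hypothetical stand-in for filling $1, $2, ... into a model name.

```python
import re

# Made-up rules illustrating the two cases (NOT taken from mobiles.yml).
RULES = [
    (re.compile(r"GT-I950[025]"), "Galaxy S4"),      # no capture group at all
    (re.compile(r"SM-([A-Z]\d{3})"), "Galaxy $1"),   # captures only part of the code
]


def fill_template(template, match):
    """Substitute $1, $2, ... in a model template with the regex captures."""
    def replace(ref):
        index = int(ref.group(1))
        if index > match.re.groups:
            return ""
        return match.group(index) or ""
    return re.sub(r"\$(\d)", replace, template).strip()


def parse(user_agent):
    for rule, template in RULES:
        match = rule.search(user_agent)
        if match:
            return {
                "model": fill_template(template, match),
                # What "$1" would hold for this rule (None if there is no group).
                "dollar_1": match.group(1) if rule.groups else None,
            }
    return None


print(parse("Mozilla/5.0 (Linux; Android 5.0.1; GT-I9505 Build/TEST)"))
# {'model': 'Galaxy S4', 'dollar_1': None}
print(parse("Mozilla/5.0 (Linux; Android 6.0.1; SM-G532G Build/MMB29T)"))
# {'model': 'Galaxy G532', 'dollar_1': 'G532'}
```

In neither case is $1 the raw code (GT-I9505 and SM-G532G respectively), which is why every such rule would need rewriting before $1 could be returned as the raw model.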
Dear @sgiehl ,
Could you consider @sanchezzzhak's suggestion, to avoid redoing thousands of regular expressions and tests?
Kind regards.
That would mean writing additional regexes that match specific browser user agents. That could be done in an additional parser.
I created a small prototype, though not all possible cases are covered yet: https://github.com/sanchezzzhak/device-detector/blob/6267/Parser/AliasDevice.php
fixture file https://github.com/sanchezzzhak/device-detector/blob/6267/regexes/alias_devices.yml
test class https://github.com/sanchezzzhak/device-detector/blob/6267/Tests/Parser/AliasDeviceTest.php
test fixture file https://github.com/sanchezzzhak/device-detector/blob/6267/regexes/alias_devices.yml
I'd be happy to hear any ideas for a better name for the class.
Thank you very much @sanchezzzhak !
This prototype is excellent! I think the class name (alias_devices) is fine. Now I have an idea of how these Matomo parsers work; this is the first time I am dealing with this library.
This is a working solution, right?
Kind regards.
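You can try it and report any problems.
```php
<?php
// Assumes the prototype fork is installed via Composer so its classes autoload.
require_once __DIR__ . '/vendor/autoload.php';

use DeviceDetector\Parser\AliasDevice;

$userAgent = $_SERVER['HTTP_USER_AGENT'] ?? '';
$parser = new AliasDevice();
$parser->setUserAgent($userAgent);
$result = $parser->parse();
var_dump($result);
// $result is either an empty array or ['name' => '<raw model name>']
```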
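For readers on the Python port, the prototype's result contract (an empty result, or a single 'name' key holding the raw model) can be mirrored in a few lines. AliasDeviceSketch, its method names, and the single Android rule below are all hypothetical; the PHP prototype appears to keep its rules in regexes/alias_devices.yml, and this sketch does not reproduce them.

```python
from __future__ import annotations

import re


class AliasDeviceSketch:
    """Hypothetical Python analogue of the prototype's contract:
    parse() returns {} or {'name': <raw model name>}. Rules are passed in
    directly instead of being loaded from a YAML file."""

    def __init__(self, rules: list[str]) -> None:
        self._rules: list[re.Pattern[str]] = [re.compile(r) for r in rules]
        self._user_agent = ""

    def set_user_agent(self, user_agent: str) -> None:
        self._user_agent = user_agent

    def parse(self) -> dict[str, str]:
        # First matching rule wins; group 1 is taken as the raw model name.
        for rule in self._rules:
            match = rule.search(self._user_agent)
            if match:
                return {"name": match.group(1)}
        return {}


# One hypothetical rule: a generic Android "hardware name" pattern.
parser = AliasDeviceSketch([r"Android [\d.]+;\s?(?:[^;)]+;\s?)?([^;)]+?)(?: Build/[^)]*)?\)"])
parser.set_user_agent("Mozilla/5.0 (Linux; Android 6.0.1; SM-G532G Build/MMB29T) AppleWebKit/537.36")
print(parser.parse())  # {'name': 'SM-G532G'}
```

Returning an empty dict rather than None keeps the "empty array" behaviour of the PHP version, so callers can test the result for truthiness in both cases.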
This works fine, @sanchezzzhak. Thank you so much!
Kind regards.
Thank you @sgiehl and @mimmi20 for helping me understand the problem and the regexes and parsers behind the scenes.
Kind regards.