[Bug] Matomo API segmentation on action Actions.getPageUrls gives inconsistent results and device type switches from upper case to lower case
What happened?
We are calling
/?module=API&method=Actions.getPageUrls&idSite=1&format=JSON&date=2024-09-17,2024-09-18&period=day&segment=deviceType==Phablet
or
/?module=API&method=Actions.getPageUrls&idSite=1&format=JSON&date=2024-09-17,2024-09-18&period=day&segment=deviceType==phablet
One day we get results for phablet but none for Phablet, and the next day we get results for Phablet but none for phablet.
My post on the community forum:
https://forum.matomo.org/t/when-calling-the-api-with-smartphone-or-smartphone-in-segement-getting-different-results/59396/4
This happens for all devices and we cannot figure out what is happening here.
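A minimal sketch of how the two calls above could be compared side by side. The host name is hypothetical, the helper names are made up for illustration, and the response shape (a JSON object keyed by date, each holding a list of rows with a "label" field) is an assumption about what a multi-day request returns. A real instance will usually also need a token_auth parameter.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://matomo.example.org/"  # hypothetical host


def build_url(device_type, base_url=BASE_URL):
    """Build the Actions.getPageUrls request from the report for one device type."""
    params = {
        "module": "API",
        "method": "Actions.getPageUrls",
        "idSite": 1,
        "format": "JSON",
        "date": "2024-09-17,2024-09-18",
        "period": "day",
        "segment": f"deviceType=={device_type}",
    }
    # Keep "=" and "," unescaped so the URL matches the ones quoted in the report.
    return base_url + "?" + urlencode(params, safe="=,")


def fetch(device_type):
    """Fetch and decode one response (performs a network call)."""
    with urlopen(build_url(device_type)) as response:
        return json.load(response)


def diff_by_day(upper, lower, metric="nb_visits"):
    """Return {date: {label: (upper_value, lower_value)}} for rows that differ."""
    diffs = {}
    for day in sorted(set(upper) | set(lower)):
        a = {row["label"]: row.get(metric) for row in upper.get(day, [])}
        b = {row["label"]: row.get(metric) for row in lower.get(day, [])}
        changed = {
            label: (a.get(label), b.get(label))
            for label in sorted(set(a) | set(b))
            if a.get(label) != b.get(label)
        }
        if changed:
            diffs[day] = changed
    return diffs
```

Calling diff_by_day(fetch("Phablet"), fetch("phablet")) would then list only the days and page labels where the two spellings disagree.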
** EDIT
With the "starts with" operator (=^) and values such as "phablet" or "smartphone", the results seemed consistent at first, but on some days I get the same results for =^phablet and for =^smartphone.
It looks like the newest data is not available with "phablet" but is available with "Phablet".
The difference of nb_unique_visits is in those files: P_hablet.json phablet.json
With segment=deviceType=@smartphone I get more nb_uniq_visitors than without segmentation.
What should happen?
- We should get the same data for "Phablet" or "phablet".
or
- One of those two should be considered an invalid parameter and throw an error
How can this be reproduced?
Import data into Matomo and call the method Actions.getPageUrls with a segment.
Matomo version
5.1.1
PHP version
PHP 8.2.21 (cli) (built: Jul 19 2024 10:33:10) (NTS)
Server operating system
Debian GNU/Linux 12 (bookworm)
What browsers are you seeing the problem on?
Not applicable (e.g. an API call etc.)
Computer operating system
No response
Relevant log output
No response
Validations
- [X] Read our Contributing Guidelines.
- [X] Follow our Security Policy.
- [X] Check that there isn't already an issue that reports the same bug to avoid creating duplicates.
- [X] The provided steps to reproduce is a minimal reproducible of the Bug.
This issue has been mentioned on Matomo forums. There might be relevant details there:
https://forum.matomo.org/t/when-calling-the-api-with-smartphone-or-smartphone-in-segement-getting-different-results/59396/6
Hi @fabianboerner, thanks for reporting this issue. Unfortunately, we're unable to reproduce the problem. Are you able to provide any other info that may help us reproduce the issue?
@randy-innocraft I don't know how. I only saw that some visits from desktop are causing issues; if I exclude them in the segmentation, it looks almost right.
The only thing I could provide is a database export. If there is a way to provide this in a secure environment, maybe that would help?
Hey @fabianboerner,
From a technical point of view the problem you reported can't have anything to do with the lower or upper case writing of the device type in the segment.
Matomo internally stores the device type as an integer. The device type you provide in the segment is mapped (case-insensitively) to the corresponding integer value, so there can't be any difference between Phablet and phablet. It is possible, though, that archiving for those segments ran at different times, causing different data to be included for the current period.
This is most likely an issue with how archiving is set up. Are you also seeing differences if you look at data for, e.g., yesterday or the day before?
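To illustrate the case-insensitive mapping described above, here is a minimal sketch. The function name and the integer IDs are made up for illustration and are not taken from Matomo's source; only the idea (normalise the name, then look up an integer) comes from the comment.

```python
# Illustrative IDs only; these are NOT Matomo's actual values.
DEVICE_TYPE_IDS = {"desktop": 0, "smartphone": 1, "tablet": 2, "phablet": 3}


def device_type_to_id(name):
    """Map a device type name to its integer ID, ignoring case."""
    key = name.strip().lower()
    if key not in DEVICE_TYPE_IDS:
        raise ValueError(f"unknown device type: {name!r}")
    return DEVICE_TYPE_IDS[key]
```

With a lookup like this, "Phablet" and "phablet" resolve to the same integer before any query runs, which is why the spelling alone should not change the result.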
@sgiehl the JSON files I provided are directly from the API. Sometimes new data doesn't show up for "phablet" but does for "Phablet". When I invalidate and archive again it shows up, but on some days it's still different.
What can I say, I'm not making this up. There was one new issue: we got Desktop entries without nb_unique_visits, and if we added them to the segment, the numbers for smartphone together with phablet were higher than with no segmentation.
What do you mean by how archiving is set up? We use the standard installation and only changed the admin credentials. There is no configuration change. The cron runs the standard archiving command.
So you have set up an archiving cron and disabled browser archiving? If so, how often is the cron triggered? Data in the reports only becomes available after archiving has run. So if a report for today is processed at, e.g., 9 am and then again at 3 pm, the data in between will only become available after the archiving at 3 pm has finished. And if you have two segments that are processed for today, they might contain different data, depending on when they are processed.
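The timing effect described above can be sketched with a toy model. This is not how Matomo archives internally; the function and the visit times are hypothetical and only show why two archive runs for the same day can report different counts.

```python
from datetime import datetime


def archive(visit_times, archived_at):
    """Count the visits that had already been tracked when archiving ran."""
    return sum(1 for t in visit_times if t <= archived_at)


day = "2024-09-18"
# Hypothetical visits at 08:00, 10:00, 12:00 and 14:00.
visits = [datetime.fromisoformat(f"{day}T{hour:02d}:00") for hour in (8, 10, 12, 14)]

# One segment archived at 9 am, the other at 3 pm, over the same raw data.
count_9am = archive(visits, datetime.fromisoformat(f"{day}T09:00"))
count_3pm = archive(visits, datetime.fromisoformat(f"{day}T15:00"))
```

Here the 9 am run sees one visit and the 3 pm run sees all four, even though both are reports for the same day.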
The cron job for archiving runs every 5 minutes. Can browser archiving cause such issues if it runs while the console command is running?
Unless the job is able to finish within those 5 minutes, this might be the root cause of your problem. There are a couple of issues around concurrent archiving that will be fixed with the next release (Matomo 5.2.0). Until then, running archiving too often in parallel can make things worse.
I will extend the window of the cron and then look at the results. Normally archiving finishes in seconds, but I will try it.
Closing this as there hasn't been any further activity. We have implemented a couple of improvements around (concurrent) archiving, so hopefully that helped sort out your issue. If not, feel free to reach out for help on our forum, or comment back here, so we can investigate further and reopen the issue if needed. Thanks.
Resolution: It was a mistake, and not an obvious one. The cause was the automatic grouping into "others": since we do not really have an order, it always resulted in something different. I only discovered this setting while reading up on how to ship the data to an external BI tool. I extended the grouping to 10000 entries and then got all the data.
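A toy sketch of why the grouping can produce changing results. This is not Matomo's implementation; the function name, the row data and the limit of 2 are made up. It only shows that when rows beyond a limit are summed into one "Others" row and the row order is not stable, different labels survive on different runs.

```python
def truncate_with_others(rows, max_rows):
    """Keep the first max_rows rows and sum all remaining rows into 'Others'."""
    if len(rows) <= max_rows:
        return list(rows)
    kept = list(rows[:max_rows])
    others_total = sum(value for _, value in rows[max_rows:])
    return kept + [("Others", others_total)]


# Three pages with equal counts, so no ordering by value is well defined.
rows = [("/a", 5), ("/b", 5), ("/c", 5)]

run_1 = truncate_with_others(rows, 2)
run_2 = truncate_with_others(list(reversed(rows)), 2)

# With a limit above the row count, nothing is grouped and order no longer matters.
full_1 = truncate_with_others(rows, 10000)
full_2 = truncate_with_others(list(reversed(rows)), 10000)
```

In run_1 the page "/c" ends up inside "Others", in run_2 it is "/a", while the totals stay equal. With the larger limit both runs contain the same set of rows.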