[Bug] Matomo API segmentation on action Actions.getPageUrls gives inconsistent results and device type switches from upper case to lower case

Open fabianboerner opened this issue 1 year ago • 9 comments

What happened?

We are calling

/?module=API&method=Actions.getPageUrls&idSite=1&format=JSON&date=2024-09-17,2024-09-18&period=day&segment=deviceType==Phablet

or

/?module=API&method=Actions.getPageUrls&idSite=1&format=JSON&date=2024-09-17,2024-09-18&period=day&segment=deviceType==phablet

On one day we get results for phablet but none for Phablet, and on the next day we get results for Phablet but none for phablet.
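
To make the comparison repeatable, here is a minimal Python sketch that builds both request URLs and lists the days on which two decoded responses differ. The base URL `https://matomo.example.org/` and the helper names are placeholders for illustration, not part of Matomo, and the sketch assumes the JSON response for a date range is an object keyed by day. No request is sent here.

```python
from urllib.parse import urlencode


def build_page_urls_query(device_type, base_url="https://matomo.example.org/"):
    """Build the Actions.getPageUrls URL for one deviceType spelling.

    base_url is a placeholder; the other parameters are the ones from the report.
    """
    params = {
        "module": "API",
        "method": "Actions.getPageUrls",
        "idSite": 1,
        "format": "JSON",
        "date": "2024-09-17,2024-09-18",
        "period": "day",
        "segment": f"deviceType=={device_type}",
    }
    # urlencode percent-encodes "==" and "," so the segment survives as one value.
    return base_url + "?" + urlencode(params)


def diff_by_day(result_a, result_b):
    """Return the days whose rows differ between two decoded responses.

    Assumes each response is a dict mapping a day string to a list of rows.
    """
    days = sorted(set(result_a) | set(result_b))
    return [day for day in days if result_a.get(day) != result_b.get(day)]


url_upper = build_page_urls_query("Phablet")
url_lower = build_page_urls_query("phablet")
```

Fetching both URLs with any HTTP client (plus a `token_auth` parameter if the instance requires one) and passing the decoded JSON to `diff_by_day` shows exactly which days disagree.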

My post on the community forum:

https://forum.matomo.org/t/when-calling-the-api-with-smartphone-or-smartphone-in-segement-getting-different-results/59396/4

This happens for all devices and we cannot figure out what is happening here.

** EDIT

With the operator =^ (starts with) and "phablet", "smartphone", etc., the results seemed consistent at first, but on some days I get the same results for =^phablet and for =^smartphone.

It looks like the newest data is not available with "phablet" but is available with "Phablet".

The difference of nb_unique_visits is in those files: P_hablet.json phablet.json

With segment=deviceType=@smartphone I get more nb_uniq_visitors than without segmentation.

What should happen?

  1. We should get the same data for "Phablet" or "phablet".

or

  1. One of those two should be considered an invalid parameter and throw an error

How can this be reproduced?

Import data into Matomo and call the method Actions.getPageUrls with segmentation.

Matomo version

5.1.1

PHP version

PHP 8.2.21 (cli) (built: Jul 19 2024 10:33:10) (NTS)

Server operating system

Debian GNU/Linux 12 (bookworm)

What browsers are you seeing the problem on?

Not applicable (e.g. an API call etc.)

Computer operating system

No response

Relevant log output

No response

fabianboerner avatar Sep 18 '24 14:09 fabianboerner

This issue has been mentioned on Matomo forums. There might be relevant details there:

https://forum.matomo.org/t/when-calling-the-api-with-smartphone-or-smartphone-in-segement-getting-different-results/59396/6

Hi @fabianboerner, thanks for reporting this issue. Unfortunately, we're unable to reproduce the problem. Are you able to provide any other info that may help us reproduce the issue?

randy-innocraft avatar Sep 20 '24 17:09 randy-innocraft

@randy-innocraft I don't know how. I only saw that some visits from desktop are causing issues; if I exclude them in the segmentation, it looks almost right.

The only thing I could provide is a database export. If there is a way to provide this in a secure environment, maybe that would help?

fabianboerner avatar Sep 20 '24 17:09 fabianboerner

Hey @fabianboerner, from a technical point of view the problem you reported can't have anything to do with the lower or upper case spelling of the device type in the segment. Matomo internally stores the device type as an integer. The device type you provide in the segment is mapped (case-insensitively) to the corresponding integer value, so there can't be any difference between Phablet and phablet. It is possible, though, that archiving for those segments ran at different times, causing different data to be included for a current period.

This is most likely an issue with how archiving is set up. Are you also seeing differences if you look at data for, e.g., yesterday or the day before?
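
The case-insensitive mapping described above can be sketched in Python roughly as follows. The table and the function name are hypothetical and the integer ids are illustrative only; this is not Matomo's actual code, just a picture of why both spellings end up as the same stored value.

```python
# Hypothetical lookup table: names and ids are illustrative, not Matomo's real ones.
DEVICE_TYPE_IDS = {
    "desktop": 0,
    "smartphone": 1,
    "tablet": 2,
    "phablet": 10,
}


def device_type_to_id(name):
    """Map a device type name to its integer id, ignoring case and outer spaces."""
    key = name.strip().lower()
    if key not in DEVICE_TYPE_IDS:
        raise ValueError(f"unknown device type: {name!r}")
    return DEVICE_TYPE_IDS[key]


# Both spellings resolve to the same integer, so the segment condition is identical.
same = device_type_to_id("Phablet") == device_type_to_id("phablet")
```

With a mapping like this, any difference between the two API results has to come from somewhere other than the spelling, for example from when each segment was archived.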

sgiehl avatar Sep 24 '24 09:09 sgiehl

@sgiehl the JSON files I provided come directly from the API. Sometimes new data doesn't show up for "phablet" but does for "Phablet". If I invalidate and archive again, it shows up, but on some days it's still different.

What can I say, I'm not making this up. There was one new issue: we got Desktop entries without nb_unique_visits, and if we added them to the segment, the number for smartphone with phablet was higher than with no segmentation.

What do you mean by how archiving is set up? We use the standard installation and only changed the admin credentials. There is no configuration change. The cron is running with the standard archiving command.

fabianboerner avatar Sep 24 '24 11:09 fabianboerner

So you have set up an archiving cron and disabled browser archiving? If so, how often is the cron triggered? Data in the reports only becomes available after archiving has run. So if a report for today is, e.g., processed at 9 am and then again at 3 pm, the data in between only becomes available after the archiving at 3 pm has finished. And if you have two segments that are processed for today, they might contain different data, depending on when they are processed.
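
As a toy illustration of this timing effect (made-up visit times, not Matomo code): an archive only contains the visits that existed when it was processed, so two segments archived at different times report different counts for the same day.

```python
from datetime import datetime

# Made-up visit timestamps for a single day, purely for illustration.
visits = [datetime(2024, 9, 18, hour) for hour in (8, 10, 12, 14)]


def archive_count(visit_times, archived_at):
    """Count the visits that already existed when the archive was processed."""
    return sum(1 for t in visit_times if t <= archived_at)


# The same day, archived at 9 am versus 3 pm.
nine_am = archive_count(visits, datetime(2024, 9, 18, 9))
three_pm = archive_count(visits, datetime(2024, 9, 18, 15))
print(nine_am, three_pm)  # prints "1 4"
```

If one segment's report for today was last processed at 9 am and the other's at 3 pm, they would disagree like this until both are archived again.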

sgiehl avatar Sep 24 '24 12:09 sgiehl

The cron job for archiving runs every 5 minutes. Can browser archiving cause such issues if it runs while the console command is running?

fabianboerner avatar Sep 24 '24 12:09 fabianboerner

Unless the job is able to finish within those 5 minutes, this might be the root cause of your problem. There are a couple of issues around concurrent archiving that will be fixed with the next release (Matomo 5.2.0). Until then, running archiving too often in parallel can make things worse.

sgiehl avatar Sep 24 '24 12:09 sgiehl

I will extend the cron interval and then look at the results. But normally archiving finishes in seconds. I will just try it.

fabianboerner avatar Sep 24 '24 12:09 fabianboerner

Closing this as there hasn't been any further activity. We have implemented a couple of improvements around (concurrent) archiving, so hopefully that helped sort out your issue. If not, feel free to reach out for help on our forum, or comment back here, so we can investigate further and reopen the issue if needed. Thanks.

sgiehl avatar Apr 15 '25 14:04 sgiehl

Resolution: It was a mistake and it was not obvious. The cause was the automatic grouping into "others": since we don't really have an order, it always resulted in something different. I only discovered this setting while reading up on how to ship the data to an external BI tool. So I extended the grouping to 10000 entries, and then I got all the data.
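
For anyone hitting the same thing, here is a simplified Python sketch of the effect. It is not Matomo's implementation; the function name and the `nb_visits` metric are just for illustration. Rows beyond a limit are folded into one "Others" row, so if the rows arrive in no stable order, which pages survive changes from run to run.

```python
def truncate_with_others(rows, limit, metric="nb_visits"):
    """Keep the first `limit` rows and fold the rest into one "Others" row.

    Simplified illustration of row-limit grouping, not Matomo's actual filter.
    """
    if len(rows) <= limit:
        return list(rows)
    kept = list(rows[:limit])
    others_total = sum(row[metric] for row in rows[limit:])
    return kept + [{"label": "Others", metric: others_total}]


rows = [
    {"label": "/a", "nb_visits": 5},
    {"label": "/b", "nb_visits": 3},
    {"label": "/c", "nb_visits": 2},
]

# Same data, different input order: different pages end up inside "Others".
first = truncate_with_others(rows, 2)
second = truncate_with_others(list(reversed(rows)), 2)

# With a limit larger than the row count, nothing is grouped away.
full = truncate_with_others(rows, 10000)
```

Here `first` keeps /a and /b, while `second` keeps /c and /b, even though the underlying data is identical. Raising the limit above the number of rows, as I did with 10000, removes the grouping entirely.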

fabianboerner avatar Apr 15 '25 17:04 fabianboerner