plugin-GoogleAnalyticsImporter

Missing page views due to page size limit defaulting to 1000

Open shanerutter-kempston opened this issue 1 year ago • 11 comments

I'm finding that Matomo is massively under-reporting compared to GA. If I export the page URLs from Matomo, there always seems to be a maximum of 1000 unique pages, but in Analytics we have 20k unique page URLs for that same day.
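One way to check for this kind of cap is to count the distinct page URLs in an exported file. This is a minimal sketch, not part of the plugin: it assumes a simple CSV export (hypothetical name `pages.csv`) with a header row, the URL in the first column, and no quoted commas inside fields.

```shell
# count_unique_urls FILE
# Prints the number of distinct values in the first CSV column,
# skipping the header row. Assumes no quoted commas in the data.
count_unique_urls() {
    tail -n +2 "$1" | cut -d, -f1 | sort -u | wc -l | tr -d ' '
}

# Usage (pages.csv is a hypothetical export file):
#   count_unique_urls pages.csv
```

A result that sits at exactly 1000 for every day is a hint that a page-size limit is truncating the data.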

I checked the Analytics API and did some quick testing, and it appears the Reporting API defaults to a page size of 1000 results. I made a quick modification in Google\GoogleQueryObjectFactory.php: after line 58 I added `$request->setPageSize(100000);`. After a quick import I can now see it pulling all unique URL page views through.

However, it only gets through a couple of days, maybe a month, of data before it crashes.

shanerutter-kempston avatar Jan 24 '23 15:01 shanerutter-kempston

@shanerutter-kempston Are you using the latest version of the plugin, v4.4.6? We fixed this issue in #329. What error do you get?

AltamashShaikh avatar Jan 24 '23 15:01 AltamashShaikh

I see. I've updated to that version now, but I'm still getting the same error after it has processed a couple of days of data. I found that if I wait an hour or so and then continue the import, it works fine for another couple of days, then fails again with the same issue.

Error message: Error on day 2023-01-18, { "error": { "code": 401, "message": "Request had invalid authentication credentials. Expected OAuth 2 access token, login cookie or other valid authentication credential. See https://developers.google.com/identity/sign-in/web/devconsole-project.", "errors": [ { "message": "Invalid Credentials", "domain": "global", "reason": "authError", "location": "Authorization", "locationType": "header" } ], "status": "UNAUTHENTICATED" } } These errors are unexpected and will likely continue every time you run the import on this day. To resolve this issue, please [ask on the forums](https://forum.matomo.org/). If you can provide access to your GA account to a member of Matomo's support team it will provide a quicker resolution.

shanerutter-kempston avatar Jan 25 '23 09:01 shanerutter-kempston

@shanerutter-kempston Can you check the token grant rate graph? (Screenshot from 2023-01-26 11-51-28)

It will be in your OAuth consent screen

AltamashShaikh avatar Jan 26 '23 06:01 AltamashShaikh

I am assuming the rate limits are the cause of this issue

AltamashShaikh avatar Jan 26 '23 06:01 AltamashShaikh

@shanerutter-kempston Can you confirm whether your OAuth app is internal or external? If it is external, can you try publishing it by following this doc, then reauthorizing and checking again?

AltamashShaikh avatar Jan 26 '23 06:01 AltamashShaikh

@AltamashShaikh It's an external app. I have just published it and left it running for a couple of hours. It pulled through more data but eventually ended with the same error. Pictures of the requested screens are attached.

Error message: Error on day 2022-11-15, { "error": { "code": 401, "message": "Request had invalid authentication credentials. Expected OAuth 2 access token, login cookie or other valid authentication credential. See https://developers.google.com/identity/sign-in/web/devconsole-project.", "errors": [ { "message": "Invalid Credentials", "domain": "global", "reason": "authError", "location": "Authorization", "locationType": "header" } ], "status": "UNAUTHENTICATED" } } These errors are unexpected and will likely continue every time you run the import on this day. To resolve this issue, please [ask on the forums](https://forum.matomo.org/). If you can provide access to your GA account to a member of Matomo's support team it will provide a quicker resolution.

(three images attached)

shanerutter-kempston avatar Jan 26 '23 11:01 shanerutter-kempston

@shanerutter-kempston I am still checking why we would get this error after the import has run for a few hours. Any idea how much data it imports before throwing the error?

I am trying to reproduce this but have been unable to. Could you run the import with verbose logging and share the log file here? `./console googleanalyticsimporter:import-reports --idsite={YOUR_IMPORT_ID_SITE} -vvv`
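To get a log file that can be shared, both standard output and standard error need to be captured. A minimal sketch, assuming a POSIX shell; the helper name `run_logged` and the log file name are hypothetical, and only the console command itself comes from this thread:

```shell
# run_logged LOGFILE CMD [ARGS...]
# Runs CMD, writes its stdout and stderr to LOGFILE,
# and returns CMD's exit status.
run_logged() {
    logfile=$1
    shift
    "$@" >"$logfile" 2>&1
}

# Usage (replace 1 with your import site ID):
#   run_logged "ga-import-$(date +%Y%m%d-%H%M%S).log" \
#       ./console googleanalyticsimporter:import-reports --idsite=1 -vvv
```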

AltamashShaikh avatar Jan 27 '23 02:01 AltamashShaikh

@AltamashShaikh There is nothing that indicates a problem, other than the Google API responding with invalid credentials. I have gotten around the issue by setting up a cron job to run the CLI import command each hour; it has managed to import a year's worth of data so far.
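For anyone wanting to try the same workaround, an hourly job like this could be sketched as a small wrapper that skips a run when the previous one is still going. This is an illustration under assumptions, not the exact setup used here: the function name `run_import`, the lock path, and the Matomo install path are all hypothetical.

```shell
# Hypothetical lock location; override by setting LOCK_DIR.
LOCK_DIR="${LOCK_DIR:-/tmp/ga-import.lock}"

# run_import CMD [ARGS...]
# Runs CMD unless another run holds the lock. mkdir is atomic,
# so the directory doubles as a simple lock.
run_import() {
    if ! mkdir "$LOCK_DIR" 2>/dev/null; then
        echo "previous import still running, skipping"
        return 0
    fi
    "$@"
    status=$?
    rmdir "$LOCK_DIR"
    return "$status"
}

# Usage (the path and site ID are placeholders):
#   run_import /var/www/matomo/console \
#       googleanalyticsimporter:import-reports --idsite=1
```

Saved as a script that ends with the usage call, it could be scheduled with a crontab entry such as `0 * * * * /path/to/ga-import.sh` (path hypothetical). If a run is killed before it finishes, the lock directory has to be removed by hand.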

shanerutter-kempston avatar Jan 30 '23 09:01 shanerutter-kempston

@shanerutter-kempston Have you set `./console googleanalyticsimporter:import-reports --idsite={YOUR_IMPORT_ID_SITE} -vvv` to run every hour like this, and does it work without any error?

AltamashShaikh avatar Jan 31 '23 03:01 AltamashShaikh

Without the -vvv, but yes. It appears the Google API rejects the credentials every now and again, but if you set up the CLI to run every hour, it will continue the import and Google will accept the same credentials again without issue. It's a strange issue. It's not the best way, but it's at least getting my data downloaded now.

shanerutter-kempston avatar Jan 31 '23 09:01 shanerutter-kempston

@shanerutter-kempston Strange. If you have already set up an archiving cron, then there is already a task that runs every hour, and you don't need to do this separately.

This is the guide to set up the auto-archiving cron, which will trigger this task.

AltamashShaikh avatar Feb 01 '23 02:02 AltamashShaikh