organic-search-analytics
GoogleAPI startRow/paging support and Data capture loop to import multiple dates
GoogleAPI startRow/paging support: the Google API supports startRow to page through more than 5,000 results. That's implemented and can be configured in config/config.php (class Config, const downloadGoogleMaxPages = 99;).
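A minimal sketch of the startRow paging loop, written in Python for illustration (the project itself is PHP). `fetch_page` is a hypothetical stand-in for one Search Analytics query with the given startRow and rowLimit, and `max_pages` plays the role of downloadGoogleMaxPages. The sketch assumes that a page shorter than rowLimit means there are no more rows.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]

ROW_LIMIT = 5000  # per-request maximum discussed in this thread

def download_all_rows(fetch_page: Callable[[int, int], List[Row]],
                      max_pages: int = 99,
                      row_limit: int = ROW_LIMIT) -> List[Row]:
    """Page through results with startRow, issuing at most max_pages requests."""
    rows: List[Row] = []
    for page in range(max_pages):
        start_row = page * row_limit          # 0, 5000, 10000, ...
        batch = fetch_page(start_row, row_limit)
        rows.extend(batch)
        if len(batch) < row_limit:
            break  # assumption: a short (or empty) page means no more data
    return rows
```

With `max_pages` reached before a short page arrives, the loop stops anyway, which is exactly the capping behaviour the config option controls.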
Data capture via CLI/cron job was a little difficult because the dates were not clear. Now it's possible to loop the import through multiple dates from the CLI. Use the config option const maxDaysBatchImport = 99; to allow/limit that.
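A sketch of the batch date loop, again in Python and with hypothetical names: `dates_to_import` builds the list of single days between two dates and caps it at `max_days`, the role maxDaysBatchImport plays in the config. Each returned day would then be imported with its own query.

```python
from datetime import date, timedelta
from typing import List

def dates_to_import(start: date, end: date, max_days: int = 99) -> List[date]:
    """Return each day from start to end (inclusive), capped at max_days."""
    if end < start:
        return []
    days: List[date] = []
    current = start
    while current <= end and len(days) < max_days:
        days.append(current)
        current += timedelta(days=1)
    return days
```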
Thank you @pfauenauge for this commit. But more so, thank you for bringing to light the fact that Google has modified their API with the startRow feature. I was unaware of this.
I want to review your changes and how they will impact this tool overall before merging this pull request. This is a significant change (for the better) and thus should be reviewed properly.
For reference - here is the Google API documentation regarding this feature. https://developers.google.com/webmaster-tools/search-console-api-original/v3/how-tos/search_analytics#getting-more-than-5000-rows
Hi,
thanks for providing the tool anyway!
I changed the Google API Webmasters class directly instead of extending it in a new class, which might not be the best way. So on the next update the changes would be overwritten, right?
No hurry, I just wanted to share my adjustments because they might be helpful for others, too.
Stefan
(In reply to PromInc, 19.05.2017 at 15:43.)
Yes, you are right that updating the Google API class would overwrite your change.
The version of the Google API in this project (1.1.5, versus the current 2.1.3) is quite out of date, so it may be worth upgrading anyway. The newest version includes startRow.
I'm exploring what updating the Google API will look like right now, which would then pave a solid foundation to layer your solution on top of.
@pfauenauge, can you help me to understand the reason for the downloadGoogleMaxPages config option?
I'm half guessing that this is to prevent a page from running too long and PHP then timing out. That could be adjusted through max_execution_time, either in the ini file or from PHP itself.
My concern, though, is that this option could still clip a dataset. If Google returns 20,000 results and this variable is set to 3, that date would be cut short.
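To make the clipping concern concrete: with 5,000 rows per page and a cap of 3 pages, at most 3 × 5,000 = 15,000 of the 20,000 rows would be imported. A hypothetical helper like the one below (Python, not part of the project) could at least flag a date whose import hit the page cap while the last page was still full, which is the case where rows may have been left behind.

```python
def max_importable_rows(max_pages: int, row_limit: int = 5000) -> int:
    """Upper bound on rows imported per date for a given page cap."""
    return max_pages * row_limit

def possibly_truncated(pages_fetched: int, last_page_size: int,
                       max_pages: int, row_limit: int = 5000) -> bool:
    """True when the page cap was reached and the last page was full.

    In that case Google may have had more rows than were downloaded.
    """
    return pages_fetched >= max_pages and last_page_size == row_limit
```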
I've tested your code and it seems to work well; however, it really only modifies the cron portion. I'd like to integrate this into the GUI as well, if you're OK with me expanding on it.
Also, regarding the modification to the Google API class: I've looked into the updated version of the API. It works and already has startRow integrated. The downside is that it's a major structural change and will require all users to change their integration through the Google Developer Console. So while it's wise to make that change, I'm not sure it's worth clearing that extra hurdle right now. Skipping that update for now in favor of your patch to the API would give users access to all their data today, and they could move to a newer version with the updated API in a future release.
Brian Prom (320) 250-3830
(In reply to pfauenauge, May 20, 2017 at 10:29 AM.)
Hello Brian,
the downloadGoogleMaxPages option is about paging through the GSC results, since they are limited to 5,000 per page. A setting of 10 loops through at most 10 pages, which makes up to 50,000 results. So it's not about execution time, but simply about capping the resulting volume. Another option would be a minimum number of impressions required for import (e.g. take only results that have at least 2 impressions per day), but I found that even less transparent. That's up to you, though.
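The minimum-impressions alternative could look roughly like this Python sketch. It assumes each row is a dict carrying an `impressions` value, as in the Search Analytics response rows, and that the query is run per day so the threshold reads as "impressions per day"; `filter_min_impressions` is a hypothetical name. In this sketch the filter runs client-side after download, so it reduces what gets stored but not how many pages must be fetched.

```python
from typing import Any, Dict, List

def filter_min_impressions(rows: List[Dict[str, Any]],
                           min_impressions: float = 2) -> List[Dict[str, Any]]:
    """Keep only rows whose impressions meet the threshold."""
    return [row for row in rows
            if float(row.get("impressions", 0)) >= min_impressions]
```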
Great to hear that you would like to integrate this!
I agree with you that switching the API version is too big a step. What I think is still possible is to extend the class Google_Service_Webmasters_SearchAnalyticsQueryRequest, which is instantiated in downloadGoogleSearchAnalytics() in DataCapture.php, with a new class. That would even be compatible with the new version of the API, preventing conflicts in case someone has already upgraded it.
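A rough illustration of the "extend rather than edit" idea, sketched in Python because the pattern is language-neutral (in the project it would be a PHP subclass of Google_Service_Webmasters_SearchAnalyticsQueryRequest). Both classes below are hypothetical stand-ins; only the JSON field names startDate, endDate, dimensions, rowLimit and startRow come from the Search Analytics query request body. The vendored base class stays untouched, so updating the library does not overwrite the change.

```python
from typing import Any, Dict, List

class SearchAnalyticsQueryRequest:
    """Hypothetical stand-in for a vendored request class without startRow."""

    def __init__(self, start_date: str, end_date: str,
                 dimensions: List[str], row_limit: int = 5000) -> None:
        self.start_date = start_date
        self.end_date = end_date
        self.dimensions = dimensions
        self.row_limit = row_limit

    def to_body(self) -> Dict[str, Any]:
        return {
            "startDate": self.start_date,
            "endDate": self.end_date,
            "dimensions": list(self.dimensions),
            "rowLimit": self.row_limit,
        }

class PagedSearchAnalyticsQueryRequest(SearchAnalyticsQueryRequest):
    """Adds startRow in a subclass instead of editing the vendored class."""

    def __init__(self, *args: Any, start_row: int = 0, **kwargs: Any) -> None:
        super().__init__(*args, **kwargs)
        self.start_row = start_row

    def to_body(self) -> Dict[str, Any]:
        body = super().to_body()
        body["startRow"] = self.start_row
        return body
```

Code that already builds the request keeps working; only the place that instantiates it switches to the subclass.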
So go ahead either way (waiting or extending); both are fine with me.
Stefan
Stefan Pfau
[email protected] Mobile 0176 / 42 60 30 81
(In reply to PromInc, 22.05.2017 at 04:11.)
Thank you for the explanation.
The 50,000 limit is just a personal limit you set, then; is that correct? I can't find any documentation saying the API imposes this as a hard limit (only the 5,000 rows per request).
Google Webmaster Tools API Quotas & Limits
Per-site limit (calls querying the same site) | Per-user limit (calls made by the same user) | Per-project limit (calls made using the same Developer Console key) |
---|---|---|
5 queries per second; 200 queries per minute | 5 queries per second; 200 queries per minute | 100,000,000 queries per day |
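Because paging multiplies the number of calls (up to downloadGoogleMaxPages requests per date, times the number of dates in a batch), the per-second and per-minute limits are the ones a large batch could plausibly run into, while the daily project limit is far away. Below is a minimal Python sketch of a conservative pacer, assuming a fixed minimum gap between calls is acceptable: with 5 queries per second and 200 per minute, the stricter of 1/5 s and 60/200 s gives 0.3 s between requests. `Pacer` is a hypothetical name; the clock and sleep functions are injectable so the logic can be tested without waiting.

```python
import time
from typing import Callable, Optional

class Pacer:
    """Enforces a fixed minimum interval between consecutive API calls."""

    def __init__(self, per_second: int = 5, per_minute: int = 200,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        # The stricter of the two limits wins: max(0.2 s, 0.3 s) = 0.3 s.
        self.interval = max(1.0 / per_second, 60.0 / per_minute)
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> None:
        """Block until the next call is allowed, then record its time."""
        now = self._clock()
        if self._last is not None:
            remaining = self._last + self.interval - now
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `pacer.wait()` immediately before each Search Analytics request keeps the rate at or below roughly 200 requests per minute. A sliding-window limiter would allow short bursts, at the cost of more bookkeeping.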