
Frequent silent failures after GitHub updates

Open dirkhh opened this issue 8 years ago • 5 comments

I have a reasonably complex, translated WordPress site (subsurface-divelog.org), which currently results in 290 files under _pages and _posts (github.com/Subsurface-divelog/subsurface-website).

Every time I push changes to the GitHub repo (or merge a pull request), GitHub calls the web hook and sends that new commit to WordPress. This commit contains the information about which files have changed. Yet wpghs will traverse the whole tree and retrieve ALL files from GitHub. And because of the way git traverses the tree, the new changes tend to be the very last files to be "updated" (well, the first two hundred some odd calls to wp_update_post() actually don't result in any changes being saved - those files are all unchanged, after all).

This would be just a waste of time and resources if it weren't for the fact that, unfortunately for me, at least nine times out of ten the update process dies before reaching the end of the tree, without any error in my error log. As you know from my earlier posts, I have the little filter that helps match files in the git repo with languages set in WPML - and for debugging I have that filter print the slug of the current post and the language detected to the error log. So I can see the progress of the sync unfold. And most of the time, the sync just stops somewhere in the middle (I would guess it's usually around 200 files into the sync, but I haven't done a careful study of the number). GitHub then tells me that there was a timeout and that the web hook call wasn't successful (i.e., we didn't get to the end of the loop in save_posts where we return success to GitHub).

This of course leads to two different questions:

  1. why does the sync fail silently, without any error visible in the Apache error.log? I tried turning on debugging and followed some of the suggestions you can find online - but I'm anything but a PHP developer and so far have been drawing a blank.

  2. why doesn't the import.php/payload() function look at the head commit that it gets as part of the payload argument and then only update those posts that are marked in there as modified? Right now all it does is grab the commit ID of the head commit and hand that off to import.php/commit() where we walk the whole tree. Or in case you fear that previous pushes may have been lost and therefore want to walk the whole tree, why doesn't it at least stage the modified files first in the $posts array, so that a) the updates are shown sooner (a successful sync of my site takes more than a minute) and more importantly (to me) b) even if the update for some reason fails (see (1) above), at least it fails while pointlessly updating unmodified posts instead of before it ever gets to the posts that actually have changed.

BTW: I can see that you might be tempted to discount (2) and focus on (1)... but I think that especially as larger, translated sites may easily contain thousands of posts, (2) becomes a real issue - it's just incredibly wasteful and slow to always walk the whole tree.
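To make (2) concrete, here is a minimal sketch of the idea. It is written in Python purely for illustration (the plugin itself is PHP), and the function names `changed_paths` and `modified_first` are hypothetical, not part of wpghs. It assumes the payload has the shape of a GitHub push event, where each entry in `commits` (and `head_commit`) carries `added`, `modified`, and `removed` path lists. A push can contain several commits, so the sketch takes the union over all of them and only falls back to `head_commit` when the list is missing.

```python
def changed_paths(payload):
    """Return (changed, removed) sets of file paths touched by a push payload."""
    commits = payload.get("commits") or []
    if not commits and payload.get("head_commit"):
        # Fall back to the head commit when no commit list is present.
        commits = [payload["head_commit"]]

    changed, removed = set(), set()
    # Commits are processed in the order given, so a later commit wins:
    # a file added in one commit and removed in the next ends up removed.
    for commit in commits:
        for path in commit.get("added", []) + commit.get("modified", []):
            changed.add(path)
            removed.discard(path)
        for path in commit.get("removed", []):
            removed.add(path)
            changed.discard(path)
    return changed, removed


def modified_first(tree_paths, changed):
    """Order a full tree walk so that changed files are handled first.

    sorted() is stable, so the original order is kept within each group.
    """
    return sorted(tree_paths, key=lambda path: path not in changed)
```

With something like this, the import could either touch only the paths in `changed` and `removed`, or still walk the whole tree but in the order given by `modified_first`, so that a timeout late in the walk only affects files that did not change anyway. Very large pushes may not list every commit in the payload, so keeping the full walk as a fallback seems sensible.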

I know, this is a long and somewhat complicated / confused sounding bug report - please ask questions if any of this doesn't make sense. I have spent way too much time trying to track this down, but simply lack the PHP programming skills to fix the bug myself (not that I haven't tried).

Thanks

dirkhh avatar Feb 25 '17 18:02 dirkhh

The idea at the time was that a lot of this would be resolved out of the cache rather than actually making the API/DB calls. Do you have an object cache installed on this site? If you don't have an object cache, then it'll do the API call every time and likely time out.

Feasibly, we can use the commit payloads to determine which posts to update. I may have run into issues at the time fetching the post that way out of the database, although now that I'm looking at it, I can't remember.

I also believed that WordPress wouldn't update a post that hadn't changed, but that may not be the case. We should be able to add a check to make sure a post isn't updated if it hasn't changed in the database at all, if that helps.

The idea of pausing and being able to pick up where we left off sounds intriguing, but I think we should be able to mitigate the issue without resorting to something that complex just yet.
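The "don't update a post that hasn't changed" check mentioned above could look roughly like the following. This is only a sketch under stated assumptions, in Python rather than the plugin's PHP: `content_digest` and `posts_to_update` are hypothetical names, and it assumes a digest of each post's last imported content is stored somewhere (for example alongside the post), which may not match how the plugin actually tracks state.

```python
import hashlib


def content_digest(text):
    """Return a stable digest of a post's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def posts_to_update(incoming, stored_digests):
    """Return the paths whose incoming content differs from what is stored.

    incoming:       {path: content fetched from the repository}
    stored_digests: {path: digest recorded at the last import} (assumed to exist)
    Paths with no stored digest are treated as new and are included.
    """
    return [
        path
        for path, text in incoming.items()
        if stored_digests.get(path) != content_digest(text)
    ]
```

Only the paths returned here would then go through the expensive update call; everything else is skipped without touching the database.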

mAAdhaTTah avatar Feb 25 '17 22:02 mAAdhaTTah

I don't have an object cache installed. Any suggestions / pointers where to start looking? WordPress performance hasn't really ever been a concern until now...

dirkhh avatar Feb 25 '17 22:02 dirkhh

Batcache (WP plugin) + memcache (PHP extension + software).

mAAdhaTTah avatar Feb 25 '17 23:02 mAAdhaTTah

So much fun. I should just stop trying to use WordPress. Following the instructions (such as they are) to install Batcache made my WordPress site simply stop: I couldn't log in, and it didn't show a single page. No errors, nothing. I did test that memcached was installed and working (using telnet from the command line). I removed the files from wp-content and thankfully all was back to normal. But I guess this isn't the direction that I will go.

dirkhh avatar Feb 26 '17 01:02 dirkhh

I just opened a pull request that implements what I suggested above - look at the payload and only update posts that are marked as modified.

dirkhh avatar Feb 28 '17 17:02 dirkhh