goread
Feed updates are scheduled multiple times
It seems like feed updates are scheduled multiple times: UpdateFeeds selects feeds with n <= Time.now() and adds a task to the update-feed queue for each one, even though the same feed might already be scheduled.
This leads to feeds being updated several times instead of just once. I often see the same feed in my logs three or four times, one minute apart.
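A minimal sketch of the scheduling pattern described above, assuming the scheduler runs once a minute and that a feed's next-update field is only advanced when its task actually runs. `Feed`, `Queued`, `schedule` and `simulate` are hypothetical names, not goread code. Without a marker, every pass re-selects the still-due feed. A hypothetical "queued" marker written at enqueue time stops that.

```go
package main

import (
	"fmt"
	"time"
)

// Feed is a hypothetical stand-in for the stored feed entity; NextUpdate
// plays the role of the "n" field that UpdateFeeds compares against now.
type Feed struct {
	URL        string
	NextUpdate time.Time
	Queued     bool // hypothetical "already enqueued" marker
}

// schedule mimics one scheduler pass: every feed with NextUpdate <= now gets a
// task. If markQueued is false, the feed is unchanged until its task runs, so
// a later pass selects it again.
func schedule(feeds []*Feed, now time.Time, markQueued bool) []string {
	var tasks []string
	for _, f := range feeds {
		if f.NextUpdate.After(now) || (markQueued && f.Queued) {
			continue
		}
		tasks = append(tasks, f.URL)
		if markQueued {
			f.Queued = true
		}
	}
	return tasks
}

// simulate runs several passes one minute apart before any task executes and
// returns how many tasks were enqueued for a single due feed.
func simulate(passes int, markQueued bool) int {
	start := time.Date(2013, 8, 22, 21, 30, 0, 0, time.UTC)
	feeds := []*Feed{{URL: "http://kerbaldevteam.tumblr.com/rss", NextUpdate: start}}
	total := 0
	for i := 0; i < passes; i++ {
		total += len(schedule(feeds, start.Add(time.Duration(i)*time.Minute), markQueued))
	}
	return total
}

func main() {
	fmt.Println("without marker:", simulate(3, false))
	fmt.Println("with marker:", simulate(3, true))
}
```

With three passes this prints 3 tasks without the marker and 1 with it. In a real deployment the marker would also need clearing (or an expiry) in case a task is lost.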
Feeds updating more than once is a known problem. There is a check that prevents a second update from happening before the next update time. However, there is a race condition: a second update could start before the first has finished and marked the feed as updated. I haven't personally seen this, but it is possible. I need to rearchitect how the feed updater works to get rid of this problem.
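One common way to close this kind of race is to check and advance the next-update time in one atomic step *before* fetching, rather than marking the feed after the update finishes. The sketch below is illustrative only: `feedStore`, `claim` and `runDuplicates` are hypothetical, and a `sync.Mutex` stands in for whatever atomicity the real store provides (on App Engine that would presumably be a datastore transaction).

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// feedStore is a hypothetical in-memory stand-in for the datastore.
type feedStore struct {
	mu         sync.Mutex
	nextUpdate map[string]time.Time
}

// claim atomically checks whether url is due and, if so, pushes its next
// update time forward before returning. Because the check and the write
// happen under one lock, two concurrent callers cannot both see it as due.
func (s *feedStore) claim(url string, now time.Time, interval time.Duration) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.nextUpdate[url].After(now) { // missing key yields zero time, i.e. due
		return false // another task already claimed this update
	}
	s.nextUpdate[url] = now.Add(interval)
	return true
}

// runDuplicates starts n concurrent tasks for the same feed and returns how
// many of them actually got to perform the update.
func runDuplicates(n int) int {
	s := &feedStore{nextUpdate: map[string]time.Time{}}
	now := time.Now()
	var wg sync.WaitGroup
	var mu sync.Mutex
	updates := 0
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if s.claim("http://kerbaldevteam.tumblr.com/rss", now, time.Hour) {
				mu.Lock()
				updates++
				mu.Unlock()
				// fetching and parsing the feed would happen here
			}
		}()
	}
	wg.Wait()
	return updates
}

func main() {
	fmt.Println("updates performed:", runDuplicates(4))
}
```

The trade-off is that a task which claims a feed and then fails leaves it un-updated until the next interval, which is usually preferable to duplicate updates.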
Are you sure this race condition is the only possible problem? For example, my logs contain:
2013-08-22 21:32:56.670 /tasks/update-feed 200 924ms
D 2013-08-22 21:32:55.748 update feed http://kerbaldevteam.tumblr.com/rss
W 2013-08-22 21:32:55.755 subscribe feed: http://kerbaldevteam.tumblr.com/rss
W 2013-08-22 21:32:55.908 no rss feed date: http://kerbaldevteam.tumblr.com/
D 2013-08-22 21:32:56.129 hasUpdate: false, isFeedUpdated: false, storyDate: 2013-08-21 19:17:09.862962 +0000 UTC, stories: 20
D 2013-08-22 21:32:56.163 0 update stories
D 2013-08-22 21:32:56.163 putting 1 entities
I 2013-08-22 21:32:56.163 next update scheduled for 1h11m25s from now
2013-08-22 21:34:56.784 /tasks/update-feed 200 431ms
D 2013-08-22 21:34:56.356 update feed http://kerbaldevteam.tumblr.com/rss
W 2013-08-22 21:34:56.476 no rss feed date: http://kerbaldevteam.tumblr.com/
D 2013-08-22 21:34:56.727 hasUpdate: false, isFeedUpdated: false, storyDate: 2013-08-21 19:17:09.862962 +0000 UTC, stories: 20
D 2013-08-22 21:34:56.740 0 update stories
D 2013-08-22 21:34:56.740 putting 1 entities
I 2013-08-22 21:34:56.740 next update scheduled for 1h13m21s from now
If I read that correctly, both updates actually run, each takes less than a second, and they are two minutes apart.
Your analysis is correct: there's some other problem, and I'm not sure what it is. The whole updater needs to be rethought. My current idea involves using backends to manage the list. This may work, but it would be bad for self-hosters, since it means an additional instance and their hour count would exceed the free quota. So it will have to be something else. I'll think about it.
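The appeal of a single process managing the list is that one owner can keep each feed in its schedule exactly once. The sketch below shows that idea with a min-heap from Go's `container/heap`, ordered by next update time. It is a hypothetical illustration of the general pattern, not App Engine backends code. It uses a fixed interval for simplicity, whereas goread computes a per-feed interval. `schedule`, `due` and `dispatchCount` are made-up names.

```go
package main

import (
	"container/heap"
	"fmt"
	"time"
)

// entry is one feed in the schedule.
type entry struct {
	url  string
	next time.Time
}

// schedule is a min-heap ordered by next update time.
type schedule []entry

func (s schedule) Len() int           { return len(s) }
func (s schedule) Less(i, j int) bool { return s[i].next.Before(s[j].next) }
func (s schedule) Swap(i, j int)      { s[i], s[j] = s[j], s[i] }
func (s *schedule) Push(x any)        { *s = append(*s, x.(entry)) }
func (s *schedule) Pop() any {
	old := *s
	e := old[len(old)-1]
	*s = old[:len(old)-1]
	return e
}

// due pops every feed whose time has come and pushes it back with its next
// time. Each feed is in the heap exactly once, so it cannot be dispatched
// again before that time. interval must be positive or the loop never ends.
func (s *schedule) due(now time.Time, interval time.Duration) []string {
	var urls []string
	for s.Len() > 0 && !(*s)[0].next.After(now) {
		e := heap.Pop(s).(entry)
		urls = append(urls, e.url)
		e.next = now.Add(interval)
		heap.Push(s, e)
	}
	return urls
}

// dispatchCount runs passes one minute apart and counts dispatched updates.
func dispatchCount(passes int) int {
	start := time.Date(2013, 8, 22, 21, 30, 0, 0, time.UTC)
	s := &schedule{
		{url: "http://kerbaldevteam.tumblr.com/rss", next: start},
		{url: "http://example.com/feed", next: start.Add(30 * time.Second)},
	}
	heap.Init(s)
	total := 0
	for i := 0; i < passes; i++ {
		total += len(s.due(start.Add(time.Duration(i)*time.Minute), time.Hour))
	}
	return total
}

func main() {
	fmt.Println("updates dispatched:", dispatchCount(3))
}
```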
I'm not familiar with AppEngine, but wouldn't it be possible to simply check whether a feed is already in the update-feed queue before adding it again?
Nope, queues aren't queryable that way. I believe the main problem is that multiple update processes are running at once. There is code to protect against this, but apparently it's not working.