
Feed updates are scheduled multiple times

Open cschramm opened this issue 11 years ago • 5 comments

It seems like feed updates are scheduled multiple times, since UpdateFeeds selects them with n <= Time.now() and then adds a task to the update-feed queue, although the same feed might already be scheduled.

This leads to feeds being updated multiple times, not just once. I'm often seeing the same feed in my logs three or four times with gaps of one minute.

cschramm avatar Aug 22 '13 08:08 cschramm

Feeds updating more than once is a known problem. There is a check to prevent a second update from happening until the next update time. However, there is a race condition there: a second update could start before the first has finished and marked the feed as updated. I haven't personally seen this, but it is possible. I need to rearchitect how the feed updater works to get rid of this problem.

maddyblue avatar Aug 22 '13 18:08 maddyblue

Are you sure this race condition is the only possible problem? My logs contain, for example:

2013-08-22 21:32:56.670 /tasks/update-feed 200 924ms
D 2013-08-22 21:32:55.748 update feed http://kerbaldevteam.tumblr.com/rss
W 2013-08-22 21:32:55.755 subscribe feed: http://kerbaldevteam.tumblr.com/rss
W 2013-08-22 21:32:55.908 no rss feed date: http://kerbaldevteam.tumblr.com/
D 2013-08-22 21:32:56.129 hasUpdate: false, isFeedUpdated: false, storyDate: 2013-08-21 19:17:09.862962 +0000 UTC, stories: 20
D 2013-08-22 21:32:56.163 0 update stories
D 2013-08-22 21:32:56.163 putting 1 entities
I 2013-08-22 21:32:56.163 next update scheduled for 1h11m25s from now

2013-08-22 21:34:56.784 /tasks/update-feed 200 431ms
D 2013-08-22 21:34:56.356 update feed http://kerbaldevteam.tumblr.com/rss
W 2013-08-22 21:34:56.476 no rss feed date: http://kerbaldevteam.tumblr.com/
D 2013-08-22 21:34:56.727 hasUpdate: false, isFeedUpdated: false, storyDate: 2013-08-21 19:17:09.862962 +0000 UTC, stories: 20
D 2013-08-22 21:34:56.740 0 update stories
D 2013-08-22 21:34:56.740 putting 1 entities
I 2013-08-22 21:34:56.740 next update scheduled for 1h13m21s from now

If I read that correctly, both updates actually ran, each took less than a second, and they were two minutes apart.

cschramm avatar Aug 22 '13 20:08 cschramm

Your analysis is correct: there's some other problem. I'm not sure what it is. The whole updater needs to be rethought. My current idea involves using backends to manage the list. This may work, but it would be bad for self-hosters, since that's an additional instance and their instance hours would exceed the free quota. So something else, I guess. Will think.

maddyblue avatar Aug 22 '13 20:08 maddyblue

I'm not familiar with AppEngine, but wouldn't it be possible to simply check whether a feed is already in the update-feed queue before adding it again?

cschramm avatar Aug 22 '13 20:08 cschramm

Nope, queues aren't queryable that way. I believe the main problem is that multiple update processes are happening. There is code to protect against this, but apparently it's not working.

maddyblue avatar Aug 22 '13 20:08 maddyblue