Unique ID for feed

Open francosolerio opened this issue 8 years ago • 4 comments

Have you considered adding a unique ID field to the feed? The spec comments on the problems that occur when items have no ids, but errors also happen when the feed url is used as the unique id. Feed urls often change, either via permanent redirects or the itunes:new-feed-url tag, and the same url can often be written with or without a trailing slash. This can create problems, for example when syncing clients on multiple devices.

Take for example John Gruber's The Talk Show. If you search the iTunes API for The Talk Show, it returns the following feed url: https://daringfireball.net//feeds/serve?feed=thetalkshow Then you download and parse the feed, and there both atom:link and new_feed_url report a different feed url: https://daringfireball.net//feeds/serve?feed=thetalkshow

I don't know why the iTunes API doesn't parse new_feed_url and update its record, but that's the way it is.

francosolerio, May 17 '17 21:05

Thanks @francosolerio. Is there a case where JSON Feed's feed_url won't solve this? I think in most cases it really should be equivalent to a unique ID for the feed.
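As a rough illustration of treating `feed_url` as the feed's identity, a client could key a subscription on the URL the feed declares for itself rather than on whatever URL it happened to fetch. This is only a sketch: the feed document, the fetched URL, and the `subscription_key` helper are made up for the example.

```python
import json

# Made-up JSON Feed document; only feed_url matters for this sketch.
document = """
{
  "version": "https://jsonfeed.org/version/1",
  "title": "Example Feed",
  "feed_url": "https://example.com/feed.json",
  "items": []
}
"""


def subscription_key(fetched_url: str, feed: dict) -> str:
    """Prefer the feed's self-declared feed_url; fall back to the fetched URL."""
    return feed.get("feed_url") or fetched_url


feed = json.loads(document)
# The client fetched a variant URL, but the feed names its own canonical one.
key = subscription_key("https://example.com/feed.json?ref=itunes", feed)
print(key)
```

This helps only as long as the declared `feed_url` itself stays the same over time.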

manton, May 18 '17 15:05

Hello @manton, thank you for your answer. Here is my experience in syncing a podcast client on multiple devices to CloudKit.

  • Device A subscribes to feed with URL X.

  • Device B subscribes to the same feed with URL X.

  • User activates cloud syncing on both devices.

  • We must rely on some unique ID from the original feed, otherwise we will have a duplicate subscription, so we use feed URL.

  • Feed is migrated to a new URL Y.

  • Device A updates feed, updates its local record to new feedURL Y and pushes changed feedURL Y to corresponding record on the cloud.

  • Feed goes offline for some reason.

  • Device B starts updating from network.

  • Feed is unreachable.

  • Cloud is reachable.

  • Device B syncs to cloud, sees a record with feedURL Y for which it has no corresponding local record and creates one.

  • Now device B has 2 local records for the same feed, one with URL X, the other with URL Y.

  • We have a duplicate record.

  • Feed comes back online.

  • Device B updates, and changes record for URL X to URL Y.

  • Now both duplicate records have URL Y, but they are still duplicates.

Sync is hard and I'm no expert; this is my first project featuring sync (I read all of Brent's Vesper Sync Diaries :) ). I can put some kind of de-duplication algorithm in place, but I thought that if all devices could rely on a FEED_UID that never changes, syncing would be much easier.
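The scenario above can be reduced to a small sketch that compares merging on the feed URL with merging on a stable ID. Everything here is hypothetical: `feed_uid` is not part of the JSON Feed spec, the URLs are placeholders, and the merge rules are far simpler than a real CloudKit sync.

```python
from dataclasses import dataclass


@dataclass
class Subscription:
    feed_url: str
    feed_uid: str  # hypothetical stable ID; not part of the JSON Feed spec


def merge_by_url(local, cloud):
    """Merge cloud records into local ones, matching on feed_url."""
    merged = {sub.feed_url: sub for sub in local}
    for sub in cloud:
        # An unknown URL looks like a new subscription, so it is added.
        merged.setdefault(sub.feed_url, sub)
    return list(merged.values())


def merge_by_uid(local, cloud):
    """Merge cloud records into local ones, matching on feed_uid (cloud wins)."""
    merged = {sub.feed_uid: sub for sub in local}
    for sub in cloud:
        merged[sub.feed_uid] = sub
    return list(merged.values())


# Device B still has the old URL X; the cloud already holds the new URL Y.
device_b = [Subscription("https://example.com/feed-x.json", "feed-1")]
cloud = [Subscription("https://example.com/feed-y.json", "feed-1")]

by_url = merge_by_url(device_b, cloud)
by_uid = merge_by_uid(device_b, cloud)
print(len(by_url))  # 2: the same feed is now subscribed twice
print(len(by_uid))  # 1: a single record, updated to URL Y
```

Matching on the URL cannot tell a moved feed from a new one, while matching on an ID that survives the move can.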

francosolerio, May 18 '17 20:05

Here's what I'm trying to understand: if a feed moves (new server or even blog platform), does the feed ID have any chance of also staying the same? In practice I'm not sure it's going to be more reliable than simply feed_url. Thanks for the example case that you've run into, though!

manton, May 21 '17 14:05

@manton you are right: if we consider the big blogging / podcasting platforms, migrating to a new platform would probably generate a new ID. There is not much we can do for these cases, apart from possibly recommending in the specification that the old ID be preserved when migrating to a new platform.

But there are a lot of self-published feeds, and people often restructure their websites, moving the feed file to a new url. Or they publish the same feed url in different places (own website, iTunes...) with or without the final slash.

An ID should never change. A feed url can have more than one representation pointing to the same file (http/https, with/without final slash, redirects...). IMHO this makes the two incompatible.
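To make the point concrete, here is a minimal sketch of the kind of normalization a client would need before it could compare feed urls at all. The rules in `normalize_feed_url` are illustrative assumptions, not anything the spec defines, and the URLs are placeholders.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_feed_url(url: str) -> str:
    """Collapse a few textual variants of a URL. Illustrative rules only."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    path = parts.path.rstrip("/") or "/"
    # Assumption: http and https serve the same feed; not true for every site.
    return urlunsplit(("https", host, path, parts.query, ""))


variants = [
    "http://example.com/feed/",
    "https://example.com/feed",
    "https://EXAMPLE.com/feed/",
]
normalized = {normalize_feed_url(u) for u in variants}
print(normalized)  # a single entry: https://example.com/feed
```

Normalization like this only covers textual variants. A redirect to a genuinely different url still produces a different key, which is the case a stable ID would cover.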

francosolerio, May 22 '17 13:05