Use the changed mbids feed
http://ftp.musicbrainz.org/pub/musicbrainz/data/changed-mbids/
Hi Alex - before you dive straight into using this raw feed, I'd like to clarify your needs. I'm really interested in not exposing these files directly, but instead backing them with a service. This will give MusicBrainz more freedom in the future and a better separation of concerns. I take it you're interested in all MBIDs that have changed each hour, in order to run some custom processing? If so, this sounds similar to what @stefansperber at Last.fm is interested in doing - so I'd be really interested in building an API that works well for both of you.
@ocharles: Thanks for reaching out, Oliver. I'm in no rush to implement this, please take as much time as you need to finalise the API.
That being said, what would work for me is a single resource with an optional parameter for the sequence number. A request without a sequence number would return the data for the latest update. The data could be exactly the same as the contents of the current JSON files: the sequence number, the date and the list of artist MBIDs.
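As a rough illustration of the resource described above, here is a minimal sketch of the lookup logic, not an actual MusicBrainz API. The function name `get_changes`, the field names (`sequence`, `date`, `artists`) and the sample MBIDs are all assumptions made up for the example; the real JSON files may use different keys.

```python
from typing import Dict, Optional

# Hypothetical stand-in for the stored packets, keyed by sequence number.
# Field names and MBID values are illustrative assumptions, not real data.
PACKETS: Dict[int, dict] = {
    100: {
        "sequence": 100,
        "date": "2012-01-01T10:00:00Z",
        "artists": ["00000000-0000-0000-0000-000000000001"],
    },
    101: {
        "sequence": 101,
        "date": "2012-01-01T11:00:00Z",
        "artists": [
            "00000000-0000-0000-0000-000000000002",
            "00000000-0000-0000-0000-000000000003",
        ],
    },
}


def get_changes(packets: Dict[int, dict], sequence: Optional[int] = None) -> dict:
    """Return the packet for `sequence`, or the latest packet if it is omitted."""
    if not packets:
        raise LookupError("no packets available")
    if sequence is None:
        # A missing sequence number means "give me the last update".
        sequence = max(packets)
    if sequence not in packets:
        raise LookupError(f"unknown sequence number: {sequence}")
    return packets[sequence]
```

Calling `get_changes(PACKETS)` returns the packet with the highest sequence number, while `get_changes(PACKETS, 100)` returns that specific update.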
@ocharles Sounds great!
Ok, so it sounds like you would ideally like one request to catch up on all changes since some time/sequence identifier. I believe this matches Last.fm's wants. You mentioned only artist MBIDs - does this mean you aren't interested in other entity types?
More importantly though, how open are you to writing a client library for the web interface? MBIDs are quite big, so we can only really offer a 'go-back' service of around 4 hours if we don't filter the MBIDs, but this might be enough for you if you're polling every hour. If we use other techniques such as a Bloom filter, we could compress this down quite significantly. The downside is that this definitely requires people to interact with the service through a library (which would basically implement querying the returned filter).
Let me know what you think.
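To make the Bloom filter idea concrete, here is a minimal sketch of what "querying the returned filter" could look like on the client side. It is a generic textbook Bloom filter, not the design MusicBrainz settled on; the class name, the bit-array size, the number of hashes and the use of SHA-256 with double hashing are all assumptions chosen for illustration.

```python
import hashlib
from typing import List


class BloomFilter:
    """A small Bloom filter: no false negatives, but false positives are possible."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3) -> None:
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item: str) -> List[int]:
        # Derive several bit positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        return [(h1 + i * h2) % self.size_bits for i in range(self.num_hashes)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True means "possibly changed"; False means "definitely not changed".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


# The server would add every changed MBID; the MBID below is made up.
changed = BloomFilter()
changed.add("00000000-0000-0000-0000-000000000001")
```

The trade-off is the one mentioned above: the filter is far smaller than the full MBID list, but a client can only ask "might this MBID have changed?" and has to tolerate occasional false positives.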
@ocharles: I'm indeed only interested in artist MBIDs. It doesn't even have to be per hour; daily updates would work for me. I could also live with hourly updates for up to 4 hours.
Ok, I've talked with @stefansperber about this and think that for muspy and Last.fm the hourly packets are going to be the best source of data. However, we won't be giving you links to the files directly; instead you'll interact with a separate service which will take a timestamp and give you back the latest packet that includes the changes for that time (so you'll have to query with an increasing timestamp).
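The "query with an increasing timestamp" pattern could be sketched roughly as follows. This is only an assumption about how a client might drive such a service: `fetch_packet` is a hypothetical callable standing in for the real request, and the packet fields `end` and `mbids` are invented for the example. Timestamps are plain integers (hours) to keep the sketch simple.

```python
from typing import Callable, Optional, Set, Tuple


def catch_up(
    fetch_packet: Callable[[int], Optional[dict]],
    since: int,
    now: int,
) -> Tuple[Set[str], int]:
    """Collect changed MBIDs from `since` up to `now`, one packet at a time."""
    changed: Set[str] = set()
    cursor = since
    while cursor < now:
        packet = fetch_packet(cursor)
        if packet is None or packet["end"] <= cursor:
            # Nothing newer is available yet; stop and resume from `cursor` later.
            break
        changed.update(packet["mbids"])
        # Advance to the end of the interval this packet covers.
        cursor = packet["end"]
    return changed, cursor


# Fake service with two hourly packets, used only to exercise the loop.
_FAKE_PACKETS = [
    {"start": 0, "end": 1, "mbids": ["mbid-a"]},
    {"start": 1, "end": 2, "mbids": ["mbid-a", "mbid-b"]},
]


def fake_fetch(timestamp: int) -> Optional[dict]:
    for packet in _FAKE_PACKETS:
        if packet["start"] <= timestamp < packet["end"]:
            return packet
    return None
```

A client would store the returned cursor and pass it as `since` on the next poll, so each packet is only processed once.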
There will also be a service that takes a list of MBIDs and filters this down from a certain timestamp to now(), but this is targeted at media players/small libraries so probably doesn't apply here. Keep an eye on the blog for when we launch this - which hopefully won't be much longer!
@ocharles: Perfect, thanks!