discogs-xml2db
Updating a PostgreSQL database
Is there a mechanism in place for updating a PostgreSQL database from the latest XML dumps? I couldn't find anything other than what the README mentioned about MongoDB.
Sadly, no. Discogs doesn't publish deltas between dumps, so updates can't be computed easily.
However, there are two types of changes between data dumps:
- Existing releases/artists etc modified
- New releases/artists added
Because release IDs, artist IDs, etc. are sequential, if you were only interested in the second type (new records), you could just insert releases whose ID is greater than the highest ID currently in the database, and likewise for artists etc.
If you were interested in the first type (modified records) as well, and you had the previous dump files, you could compare each record in one dump with the same record in the other and only update (or insert/delete) where they differ.
So I think it is possible to do something.
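The ID-based idea for picking up new records could be sketched roughly like this. This is only an illustration, not something the project provides: `iter_new_records` is a hypothetical helper, and it assumes each record element carries its ID in an `id` attribute (if the ID lives in a child element instead, the lookup would need adjusting). The highest known ID would come from something like `SELECT max(id)` on the relevant table.

```python
import io
import xml.etree.ElementTree as ET


def iter_new_records(source, tag, max_known_id):
    """Yield (record_id, element) for records with an ID above max_known_id.

    `source` is a filename or binary file object; `tag` is the record
    element name (assumed here to be e.g. "release").
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != tag or elem.get("id") is None:
            continue
        record_id = int(elem.get("id"))
        if record_id > max_known_id:
            # The caller processes the element (e.g. inserts it).
            yield record_id, elem
        # Drop the element's contents so memory stays bounded on large dumps.
        elem.clear()


if __name__ == "__main__":
    sample = io.BytesIO(
        b"<releases>"
        b'<release id="1"><title>A</title></release>'
        b'<release id="5"><title>B</title></release>'
        b"</releases>"
    )
    for record_id, elem in iter_new_records(sample, "release", max_known_id=1):
        print(record_id, elem.findtext("title"))
```

Note that this only catches additions; anything edited in place keeps its old ID and would be missed.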
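To make the dump-to-dump comparison concrete, here is a minimal sketch. It assumes both dumps have already been reduced to a mapping from record ID to some comparable representation of the record (a serialized string, for example); `diff_dumps` is a hypothetical name, and holding a full dump in memory like this may not be practical for the larger files.

```python
def diff_dumps(old, new):
    """Classify record IDs as inserted, updated, or deleted.

    `old` and `new` map record ID -> comparable record representation.
    """
    inserted = sorted(new.keys() - old.keys())
    deleted = sorted(old.keys() - new.keys())
    updated = sorted(rid for rid in old.keys() & new.keys() if old[rid] != new[rid])
    return inserted, updated, deleted


if __name__ == "__main__":
    previous = {1: "<release>A</release>", 2: "<release>B</release>", 3: "<release>C</release>"}
    latest = {1: "<release>A</release>", 2: "<release>B2</release>", 4: "<release>D</release>"}
    print(diff_dumps(previous, latest))
```

The three lists would then map onto `INSERT`, `UPDATE`, and `DELETE` statements respectively.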
I like the ID idea, even if it introduces more complexity. Alternatively, I guess we could compute a checksum for every top-level XML record and re-process those that have changed.
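A rough sketch of the checksum idea, under a few assumptions: records carry their ID in an `id` attribute, the checksums from the previous import are stored somewhere (a plain dict here, in practice probably a table), and SHA-256 over the serialized element is an acceptable fingerprint. The function names are made up for illustration.

```python
import hashlib
import io
import xml.etree.ElementTree as ET


def record_checksums(source, tag):
    """Return {record_id: sha256 hex digest} for every `tag` element."""
    checksums = {}
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag and elem.get("id") is not None:
            # Ignore whitespace between records so it does not affect the hash.
            elem.tail = None
            digest = hashlib.sha256(ET.tostring(elem)).hexdigest()
            checksums[int(elem.get("id"))] = digest
            elem.clear()  # keep memory bounded on large dumps
    return checksums


def changed_ids(stored, current):
    """IDs that are new, or whose checksum differs from the stored one."""
    return sorted(rid for rid, digest in current.items() if stored.get(rid) != digest)


if __name__ == "__main__":
    old = io.BytesIO(b'<releases><release id="1"><title>A</title></release></releases>')
    new = io.BytesIO(b'<releases><release id="1"><title>A2</title></release></releases>')
    print(changed_ids(record_checksums(old, "release"), record_checksums(new, "release")))
```

Only the records returned by `changed_ids` would need re-processing. The hash is sensitive to any difference in serialization inside a record, so purely cosmetic changes in the dump would also show up as modified.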
> However, there are two types of changes between data dumps:
> - Existing releases/artists etc modified
> - New releases/artists added
There are also releases/artists that get removed or merged.
When records get removed or merged, are they actually removed from the dump, or does their status just get marked somehow?