Improve updates delivery workflow for consumers of MTGJSON data
As a third-party app developer, I want to know when some data has been updated, so that I can download it and present the updated data to the user.
The problems I currently see:
- the data seems to be regenerated every day, even (I suppose) when nothing has changed
- the data contains a `meta` field holding the `date` of generation, as well as a `version` field whose SemVer build metadata also carries that date ("5.2.2+20251007"). This is the version of the MTGJSON tool, not a version that tells you whether the exposed data has really changed. However, the always-changing `meta` field alone is enough to change the SHA256 of the file.
- the (much smaller) SHA256 file can't be used by a third-party app to detect a change in the JSON file, because of the `meta` field in the JSON.
- the only advertised way to get small updates is a paid service, MTGGraphQL. Even with it, since the data itself is not versioned (i.e. the same data would not have a stable version), this wouldn't help with the "when to update" problem.
- HTTP caching wouldn't really help either, as long as metadata and data are mixed in the same file.
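To make the hashing problem concrete, here is a minimal sketch. It assumes each file has a top-level `meta` object next to the payload; `data_sha256` is a hypothetical helper, not part of any MTGJSON tooling. Two files that differ only in `meta` get different file hashes but the same payload hash.

```python
import hashlib
import json

def file_sha256(raw: bytes) -> str:
    """SHA256 of the raw file bytes, i.e. what a published .sha256 file covers."""
    return hashlib.sha256(raw).hexdigest()

def data_sha256(raw: bytes) -> str:
    """SHA256 of the payload only, with the top-level "meta" object dropped.

    Re-serializing with sorted keys and fixed separators makes the hash
    independent of key order and whitespace.
    """
    doc = json.loads(raw)
    doc.pop("meta", None)
    canonical = json.dumps(doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Hypothetical files generated on two days with identical card data.
monday = b'{"meta": {"date": "2025-10-06", "version": "5.2.2+20251006"}, "data": {"name": "Example"}}'
tuesday = b'{"meta": {"date": "2025-10-07", "version": "5.2.2+20251007"}, "data": {"name": "Example"}}'

print(file_sha256(monday) == file_sha256(tuesday))  # False
print(data_sha256(monday) == data_sha256(tuesday))  # True
```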
How could this be fixed?
I'm probably unaware of all the constraints you face, but my initial thought would be to remove the metadata fields from the data files and put them in external manifest files, alongside the SHA256 hashes. This way:
- the third-party app can download small manifest files with all the required metadata.
- it can compare each SHA256 to what it has locally, and download only the files whose SHA256 has changed.
- it can present updated data to the user faster (after bans, for example).
- it opens the door to generating data more often than once a day (reducing latency for spoilers and bans).
- it decreases the pressure on MTGJSON servers (currently caching is almost useless, so whole files need to be downloaded frequently).
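The client side of the manifest idea could look roughly like the sketch below. The manifest layout (a `files` map from file name to `sha256` and `date`) is purely an assumption for illustration; MTGJSON does not publish such a file today, and the hash values are placeholders.

```python
import json

# Hypothetical manifest; the layout and the short placeholder hashes are assumptions.
MANIFEST = json.loads("""
{
  "version": "5.2.2",
  "files": {
    "AllPrintings.json": {"sha256": "aaa", "date": "2025-10-07"},
    "AllPrices.json":    {"sha256": "bbb", "date": "2025-10-07"},
    "Keywords.json":     {"sha256": "ccc", "date": "2025-09-30"}
  }
}
""")

def files_to_download(manifest: dict, local_hashes: dict[str, str]) -> list[str]:
    """Return files whose remote hash differs from the local one or that are missing locally."""
    return sorted(
        name
        for name, entry in manifest["files"].items()
        if local_hashes.get(name) != entry["sha256"]
    )

# The app has an up-to-date AllPrintings, stale prices, and no Keywords file yet.
local = {"AllPrintings.json": "aaa", "AllPrices.json": "old"}
print(files_to_download(MANIFEST, local))  # ['AllPrices.json', 'Keywords.json']
```

Only the small manifest is fetched on every check; the large files are fetched only when their hash moves.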
This isn't really helpful if the data keeps changing, which is why you handle prices separately. After diffing some of the data (AllSetFiles), I found that the main source of changes is the EDHREC data (edhrecRank changes for lots of cards, while edhrecSaltiness looks stable). This could be factored out too, the same way prices are.
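Factoring out the EDHREC rank could be as simple as moving the volatile field into a separate table keyed by card `uuid`. A hypothetical sketch, assuming each card is a dict with a `uuid` and that only `edhrecRank` is treated as volatile (`edhrecSaltiness` stays, since it looks stable):

```python
VOLATILE_FIELDS = ("edhrecRank",)  # assumption: which fields get split out

def split_volatile(cards):
    """Split cards into stable records and a uuid -> volatile-fields table."""
    stable, volatile = [], {}
    for card in cards:
        card = dict(card)  # shallow copy so the caller's data is not mutated
        extracted = {f: card.pop(f) for f in VOLATILE_FIELDS if f in card}
        if extracted:
            volatile[card["uuid"]] = extracted
        stable.append(card)
    return stable, volatile

cards = [
    {"uuid": "u1", "name": "Card A", "edhrecRank": 120, "edhrecSaltiness": 0.4},
    {"uuid": "u2", "name": "Card B"},
]
stable, ranks = split_volatile(cards)
print(stable)  # [{'uuid': 'u1', 'name': 'Card A', 'edhrecSaltiness': 0.4}, {'uuid': 'u2', 'name': 'Card B'}]
print(ranks)   # {'u1': {'edhrecRank': 120}}
```

With this split, a daily rank change only touches the small rank table, so the hash of the stable file stays the same.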
I'd be happy to hear your views on this problem. Maybe I overlooked a simple solution?
Hey there! You've identified a number of potential weaknesses with our current offerings. There are definitely improvements to be made, but we have to come up with a strong plan before going forward with them.
While I have closed #634, the idea is still on the table as something we could consider offering (effectively a daily diff file), but that comes with its own baggage: if you miss a day, you miss the entire change set and are no longer up to date.
edhrecRank is definitely the most volatile field we have, and it could make sense to siphon it off into its own file to avoid the large diffs. However, removing a field from service would be a breaking change and would require a major version bump. The best I could consider doing is zeroing it out and, in a subsequent release, letting people know to switch to a different file (which isn't off the table). I've tried to limit the number of breaking or pseudo-breaking changes for a long while now to keep things stable.
If you'd like to assist in our adventures, or have additional ideas to offer, we're always listening to feedback and try to incorporate that into our offerings.
Hi, thanks for your quick reply! Let's break it down.
Since you mention it, the daily diff has some pros and cons.
Pros:
- fits the usage pattern of the users that update data every single day
- small download size
Cons:
- doesn't really help the more casual consumers, those that run a third party app once in a while, yet expect the freshest data at startup
- requires the server to keep a history of all deltas, not just the latest one, or you're back to square one if you check for updates every other day
- harder to implement both on your side and on third party side
- doesn't really improve the case of very volatile data. For example, with diffs for days d, d+1 and d+2, if you update on day d, skip d+1 and update again on d+2, you still need to download the d+1 diff, even though parts of it are already outdated (old prices or old edhrecRank changes, for example)
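The last point can be illustrated with a toy model of daily diffs. The diff format (a map from card `uuid` to changed fields) is hypothetical and deletions are not modeled; the point is that a client catching up from day d must fetch and apply every missed diff in order, including values that a later diff immediately overwrites.

```python
def apply_diffs(snapshot, diffs):
    """Apply daily diffs in order; each diff maps a card uuid to its changed fields."""
    result = {uuid: dict(fields) for uuid, fields in snapshot.items()}
    for diff in diffs:
        for uuid, changes in diff.items():
            result.setdefault(uuid, {}).update(changes)
    return result

def wasted_changes(diffs):
    """Count field changes that were downloaded but overwritten by a later diff."""
    wasted = 0
    for i, diff in enumerate(diffs):
        later = diffs[i + 1:]
        for uuid, changes in diff.items():
            for field in changes:
                if any(field in d.get(uuid, {}) for d in later):
                    wasted += 1
    return wasted

snapshot_d = {"u1": {"name": "Card A", "edhrecRank": 100}}
diff_d1 = {"u1": {"edhrecRank": 105}}  # already stale by day d+2
diff_d2 = {"u1": {"edhrecRank": 98}}

print(apply_diffs(snapshot_d, [diff_d1, diff_d2]))  # {'u1': {'name': 'Card A', 'edhrecRank': 98}}
print(wasted_changes([diff_d1, diff_d2]))  # 1
```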
So while it might look like a good idea for some use cases, this solution doesn't seem relevant to me in the short term. There is some low-hanging fruit, in the form of best practices, that could improve the situation sooner.
While I understand you want to keep things stable, you're also shooting yourself in the foot. The last official release is 2 years old, and while you seem to use SemVer, you're not leveraging it at all. The whole point of having a release versioning scheme is... well, to do releases. Instead of standing still for fear of breaking stuff, have a plan to guide users through the changes: change often, but one small thing at a time (especially the non-breaking stuff), and document it so third parties have a clear migration path. If you fear changing anything, no work can be done, so the release process needs to be fixed first to enable the changes.
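Leveraging SemVer also gives consumers a mechanical way to decide how cautious an update must be. A minimal sketch that handles only `MAJOR.MINOR.PATCH` with optional `+build` metadata (pre-release tags are deliberately unsupported). Per the SemVer 2.0.0 specification, build metadata is ignored when determining precedence, so `5.2.2+20251006` and `5.2.2+20251007` count as the same release.

```python
import re

# MAJOR.MINOR.PATCH with optional +build metadata; pre-release tags are not handled.
_SEMVER = re.compile(r"(\d+)\.(\d+)\.(\d+)(?:\+[0-9A-Za-z.-]+)?")

def parse(version: str) -> tuple[int, int, int]:
    m = _SEMVER.fullmatch(version)
    if m is None:
        raise ValueError(f"unsupported version string: {version!r}")
    return int(m.group(1)), int(m.group(2)), int(m.group(3))

def classify_update(installed: str, available: str) -> str:
    """Return 'none', 'patch', 'minor' or 'major' ('major' means breaking changes)."""
    old, new = parse(installed), parse(available)
    if new <= old:
        return "none"  # build metadata alone never makes a newer release
    if new[0] > old[0]:
        return "major"
    if new[1] > old[1]:
        return "minor"
    return "patch"

print(classify_update("5.2.2+20251006", "5.2.2+20251007"))  # none
print(classify_update("5.2.2", "6.0.0"))  # major
```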
As an example, edhrecRank could indeed be factored out into a separate table, with a migration path provided by filling both (the legacy field and the new table) during a transition period. Third parties would then have enough time to adapt to the new table, and once that transition period (a few months) is over, you deprecate the old field and fill it with zeroes. This will surface a bug for those who didn't update and were late to the party. Then remove the deprecated field in the next major release.
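On the consumer side, such a transition period is cheap to support. A hypothetical sketch, assuming the new table maps card `uuid` to rank and that a zeroed-out legacy value means "no data":

```python
from typing import Optional

def edhrec_rank(card: dict, rank_table: Optional[dict[str, int]] = None) -> Optional[int]:
    """Read the EDHREC rank during a hypothetical transition period.

    Prefers the new separate table (keyed by card uuid) when it is available,
    falls back to the legacy inline edhrecRank field, and treats a missing or
    zeroed-out legacy value as "no data".
    """
    if rank_table is not None and card["uuid"] in rank_table:
        return rank_table[card["uuid"]]
    legacy = card.get("edhrecRank")
    return legacy if legacy else None

print(edhrec_rank({"uuid": "u1", "edhrecRank": 120}))                # 120
print(edhrec_rank({"uuid": "u1", "edhrecRank": 0}, {"u1": 118}))     # 118
print(edhrec_rank({"uuid": "u1", "edhrecRank": 0}))                  # None
```

An app written this way keeps working through all three phases: legacy only, both filled, and legacy zeroed.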
I joined the discord to sort things out since there's a lot to unpack about where you want to go and how you want to achieve these goals.