appstream
appstream copied to clipboard
Specify a metadata format for external release descriptions
Hi! This PR contains the specification for a "new" metadata format that contains release descriptions. Having them split out of the main metainfo file allows them to be updated independently and separately, and even allows us to fetch updated release information automatically at catalog build time.
This PR follows the discussion at #354
Short summary of the changes:
- To use external release metadata, the main metainfo component must contain a
releases
tag with the propertyexternal="yes"
(note: maybetype="external"
is better / more consistent?). This is done so we don't need tostat()
around too much and to have the metadata author communicate their intent clearly (good for validation!). - External release metadata files follow the existing specification for AppStream releases, the
releases
tag is the root node of their XML. - External release metadata files must be installed as
/usr/share/metainfo/releases/%{cid}.releases.xml
(cid
being the component ID they belong to). - External releases may be downloaded from a HTTPS web URL set in the metainfo file's
releases
tag. - The external release data must be installed locally, even if it is incomplete. Only having the web URL is not permitted.
- Any
artifact
present in a release file downloaded from a remote source must not be trusted without further verification for now, until release metadata signing is implemented.
As you can see, I cut out the signing & verification issue for now, which, given that we only allow HTTPS links for release metadata, is probably also only a real problem in case of domain hijacking. That way we can make progress on this feature and add signing later (I have ideas as for how that may work too, but taking one step at a time feels like the better approach).
Feedback is very welcome before this gets implemented! (Especially from @pwithnall and @Pointedstick :-))
external="yes" (note: maybe type="external" is better / more consistent?).
I would suggest using type, so more types (e.g. automaticity import releases from other format) could be added in the future.
External release metadata files must be installed as /usr/share/metainfo/releases/%{cid}.releases.xml (cid being the component ID they belong to).
It is part of the AppStream specification, but as it is made to be used for other purposes, where Program,s may want to get a changelog, too, I would suggest a extra directory like share/releaseinfo.
Any artifact present in a release file downloaded from a remote source must not be trusted without further verification for now, until release metadata signing is implemented.
Is the artifact
tag actually used by anyone?
As you can see, I cut out the signing & verification issue for now, which, given that we only allow HTTPS links for release metadata, is probably also only a real problem in case of domain hijacking.
I personally think this is a little bit overengineering. The changelog of a program for software centers is not security critical. Nothing bad happens if a Domain would be hijacked and someone would provide another changelog, this would not do something else than trolling Users.
A big problem that I see is that the changelog is not mirrored. Screenshots are mirrored, so we can be sure that they are always available and the Screenshots are always the same file until next compose. But as I read, the purpose of the external release URL is that the changelog can be changed after release without a new version. So this would not be mirrored and and the software centers would access the URL each time someone is viewing the App. This brings a few problems:
- What happens if the URL disappear/changed?
- What about privacy? The Server owner could log every person who views the App with the IP. Especially under Linux, there are a lot people who really care about privacy, even if it's just the IP.
And what about the Flathub website. They need to download the file each time someone visits the App page. you can cache i for let's say a day, but the backbend needs to download and parse a file from some random URL. I'm not sure if the devs are a fan of this.
Another problem would be the backwards compatibility. First, everyone (including Flatpak) needs to switch to appstreamcli from appstream-glib. I'm not sure how much distros still use appstream-glib. This would also break the experience for Users which use older Distros. If AppStream adds a new feature it is usually a addition like the new internet
tag and will be just ignored by appstream-glib and older versions of libappstream. But this would be a breaking change for some existing and every time used feature.
It is part of the AppStream specification, but as it is made to be used for other purposes, where Program,s may want to get a changelog, too, I would suggest a extra directory like share/releaseinfo.
It's still very much part of the metainfo specification though, which projects should already have. I don't think scattering files across too many places makes sense, tbh...
Is the
artifact
tag actually used by anyone?
Yes, it's used a lot actually, there's a bunch of internal uses I know about but publicly fwupd and KDE make heavy use of it. Whether it's used or not, as long as it is a supported part of the specification we need to support it anyway.
I personally think this is a little bit overengineering. The changelog of a program for software centers is not security critical.
It may not just be a changelog though, it may also contain links to the actual program data.
A big problem that I see is that the changelog is not mirrored.
It is. This change only makes modifications to the metainfo spec, not to the catalog spec. That means for aggregated catalog data, a tool reading that data will never see an external changelog, it is always downloaded by the catalog metadata generator and embedded into the catalog data. That also ensures quite a great deal of backwards compatibility.
What happens if the URL disappear/changed?
A warning is emitted and the generator tool will likely fall back to the locally provided data (I could probably add some recommendations for good behavior to the spec, but I don't want to mandate what implementations do for that case - you could also argue that the component should be rejected entirely if the URL vanished).
What about privacy? The Server owner could log every person who views the App with the IP.
Not the case, software centers should never hit the original source unless they read the metainfo file directly. In every other case, a tool like appstream-generator will have take care of the task (of course, there's always Arch Linux which doesn't mirror anything, but for them the privacy issue already exists for screenshots and the problem could be trivially mitigated by only reading the locally provided data instead of the remote one. But that's a fringe case, for most people the catalog generator will have been the only thing that downloaded the metadata.)
Another problem would be the backwards compatibility. First, everyone (including Flatpak) needs to switch to appstreamcli from appstream-glib. I'm not sure how much distros still use appstream-glib.
AFAIK only Fedora and OpenSUSE. But yeah, it will take a while before projects can rely on this feature, same for when we introduced em
and code
highlights in descriptions and appstream-glib took forever to support it and appstream needed to percolate through all distros first. But that is precisely why I want to add this now, before AppStream 1.0 is released.
This would also break the experience for Users which use older Distros.
That would not be the case, the catalog format remains fully compatible. In some odd cases, distros may need to update the generator server-side, but most do that already anyway :-)
I like the idea and from KDE point of view I don't see any issue in implementing and we already provide rss feeds with the release information from the appstream data. See for example:
curl https://apps.kde.org/neochat/index.xml | xmllint --format -
That we will probably need to do is to copy this release feeds to a cdn so that we don't end up with ddos-ing of our poor servers once discover and co will fetch the metadata ;)
I am currently working on an implementation of this feature as described. Once merged, I will release AppStream 0.16 early next month, which will be the last series before the 1.0 release, and likely will receive support for quite a while. It will also include code to make the transition to AppStream 1.0 easier. The goal is to get 0.16.x into the next Debian release (and therefore Ubuntu and all the derivatives too), which means 0.16.x needs to be released before the soft freeze early next month.
So, any feedback on this feature is highly appreciated! Fortunately, with the more limited scope set for it, it actually is a lot easier to implement and much less disruptive than I originally assumed :-)
If there are no more objections or comments on this, I will merge the change this week (in a few days).
Thank you very much @pwithnall for the review! It's very appreciated :-) I addressed all comments, so I think this is safe to merge now. Once the code is in master, we'll see if anything breaks (hopefully not, the feature is written in a way that is fully backwards compatible and should be safe to use once the new libappstream has received some more widespread adoption. But of course it's also certainly one of the more invasive changes done on AppStream...)
Thanks to everyone for the feedback!