External release description format for AppStream

Open ximion opened this issue 2 years ago • 11 comments

This is a proposal for splitting the release information out of the main MetaInfo files for applications, to allow them to be updated independently. This, if implemented, would be a major new feature and break a few assumptions, so it is AppStream 1.0 material even if implemented earlier.

Goals

The primary goal of optionally splitting out release metadata for projects that wish to do so is addressing issues like https://github.com/ximion/appstream/issues/240 where projects want to modify the release metadata for a bit of time after a release was made. It would also reduce the amount of content in a MetaInfo file significantly and possibly make the file a bit easier to handle and update.

We would also gain a very cheap way to check for software updates in a machine-readable way (that doesn't involve processing HTML pages).

Proposal

A new AppStream XML file format will be created, containing the <releases/> block from the existing MetaInfo specification. These files must be installed by the software into /usr/share/metainfo/releases/%{id}.releases.xml where %{id} is the component-ID of the application that the release information belongs to. Contents of this file may look like this:

<releases>
  <release type="stable" version="0.14.5" date="2021-08-28">
    <description>
      <p>This release adds features</p>
    </description>
  </release>
  <release type="stable" version="0.14.4" date="2021-06-22">
  </release>
  ...
</releases>

(refer to the releases tag specification for all details on what is possible)
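As a rough illustration of the "cheap update check" goal, a consumer could read such a file with nothing but a standard XML parser. The sketch below uses Python's `xml.etree.ElementTree`. The helper name `latest_release` is made up for this example, and it only looks at the `date` attribute (entries that use `timestamp` instead are ignored), so treat it as a minimal sketch rather than how libappstream does it.

```python
import xml.etree.ElementTree as ET

# Sample data modelled on the example above (the "..." placeholder is omitted).
SAMPLE = """<releases>
  <release type="stable" version="0.14.5" date="2021-08-28">
    <description>
      <p>This release adds features</p>
    </description>
  </release>
  <release type="stable" version="0.14.4" date="2021-06-22"/>
</releases>"""


def latest_release(xml_text):
    """Return (version, date) of the newest release by ISO date, or None."""
    root = ET.fromstring(xml_text)
    releases = [
        (rel.get("date"), rel.get("version"))
        for rel in root.iter("release")
        if rel.get("date") and rel.get("version")
    ]
    if not releases:
        return None
    # ISO 8601 dates (YYYY-MM-DD) sort correctly as plain strings.
    date, version = max(releases)
    return version, date


print(latest_release(SAMPLE))  # ('0.14.5', '2021-08-28')
```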

If this release metadata file is split out from the main MetaInfo file, the MetaInfo file can contain an empty releases block, containing a URL that points to a web location where the continuously-updated release metadata lives:

<component type="generic">
  <id>org.example.app</id>
  ...
  <releases url="https://example.org/repo/org.example.app.releases.xml" />
</component>

If a metadata processing application encounters a MetaInfo file, it will do the following things in order to get release information:

  • If the MetaInfo file contains a fully filled-out <releases/> block, it will be used unconditionally and preferred above all else
  • If no <releases/> block is present, look for a /usr/share/metainfo/releases/%{id}.releases.xml file and use that to get release metadata
  • If the releases tag has a url property, fetch the release information from the web URL and embed the new data in the generated output. If we cannot fetch data from the URL, emit a warning and use the local data
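The lookup order above could be sketched roughly as follows. This is not an existing AppStream API: `resolve_releases` and the `fetch` callable are hypothetical names, and the sketch makes one interpretive assumption, namely that the local release file is also consulted as the fallback when a `url` is present but cannot be fetched.

```python
import warnings
import xml.etree.ElementTree as ET
from pathlib import Path


def resolve_releases(metainfo_xml, releases_dir, fetch):
    """Return the <releases> element to use for a component, or None.

    `fetch` is a hypothetical callable that takes a URL and returns XML text;
    it is expected to raise OSError on network failure.
    """
    component = ET.fromstring(metainfo_xml)
    comp_id = component.findtext("id")
    block = component.find("releases")

    # 1. A filled-out block in the MetaInfo file always wins.
    if block is not None and block.find("release") is not None:
        return block

    # 2. Otherwise look for the locally installed release file.
    local = None
    local_path = Path(releases_dir) / f"{comp_id}.releases.xml"
    if local_path.is_file():
        local = ET.parse(local_path).getroot()

    # 3. If a URL is given, prefer fresh data from there and fall back
    #    to the local data (with a warning) when fetching fails.
    url = block.get("url") if block is not None else None
    if url:
        try:
            return ET.fromstring(fetch(url))
        except (OSError, ET.ParseError) as exc:
            warnings.warn(f"could not fetch {url}: {exc}; using local data")
    return local
```

Passing `fetch` in as a parameter keeps the network access out of the lookup logic, which also makes it easy for a caller to disable downloads entirely.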

This means that we will always have somewhat-usable release metadata (even if the URL goes down, we will always have the version number and release date, and possibly untranslated release descriptions). If a release is processed by the collection metadata generator immediately after it was made, before the people managing the release data at the remote URL had time to translate everything and add data, we will of course run into the current situation of having incomplete release descriptions. This time, however, with a URL present that can be queried for new releases, an AppStream metadata generator can poll that URL periodically and incorporate any changes made (distributors probably only want to add metadata up to the version they offer in their archive).

Open Questions

Besides the general "Is this a good idea?" there are a few questions of detail already:

  • Should the MetaInfo file explicitly mention that a release metadata file was installed externally? That would save us one stat() in a directory to check for existence of the file, and would follow the "explicit is better than implicit" rule AppStream has, but it would also be a lot clunkier.
  • Do we need to include a signature, so we can verify the authenticity of the by-URL referenced release file? A release file can contain downloadable artifacts, which do contain checksums, but there is no way for us to determine whether the data served there is actually legit or whether someone got hold of the domain and just swapped out the metadata file. Of course, this may be an artificial scenario, and won't affect Linux distributions (they strip out artifacts), but it is something to consider.

Obviously, appstreamcli would gain support for reading and validating the new file format, and this would be an optional feature, so everything we currently have would remain working and does not need to change.

CC: @hughsie @pwithnall @Pointedstick

ximion avatar Sep 09 '21 00:09 ximion

Thanks for digging into this. Here are some of my thoughts, hopefully in a bit of a coherent order.

I think @wjt may also be interested in this topic, at least to follow along with.

If this release metadata file is split out from the main MetaInfo file, the MetaInfo file can contain an empty releases block, containing an URL that points to a web location where the continuously-updated release metadata lives:

I think that’s going to cause problems for gnome-software: we rely on being able to query the first releases/release/@timestamp or releases/release/@date to get the latest release date for a component. The release date is used all over gnome-software to sort app lists by latest release and prioritise latest releases.

Would it be possible to require that at least the date or timestamp and version attributes for each release are present in the metainfo file? i.e. Change this proposal from “all release information can be split out into a release file” to “translatable release information can be split out into a release file”?


Going further than that, gnome-software currently displays the release notes from the latest release on the details page for an app. If you click a button, a dialog opens which shows the historic release notes too. If apps split out their release information, this means a HTTP request is required every time the details page is opened, in order to get the release notes for the latest release.

That’s not terrible, but perhaps we could do better by allowing the human readable release notes, and any translations which are already available, to be shipped in the metainfo file, and have additional translations in the release file. These would be merged in to the XML in the metainfo file.

If so, the sequence of steps to get release information would become:

  • If no <releases/> block is present, or no <release/> is present for the version being queried, do nothing. There is no release data for that release.
  • If a <release/> is present for the version being queried, and it contains a <description/> in the user’s current locale, use that.
  • If a <release/> is present for the version being queried, but it contains no <description/> in the user’s current locale, look for a /usr/share/metainfo/releases/%{id}.releases.xml file and use that.
  • If %{id}.releases.xml doesn’t exist or doesn’t have an entry or translation for the requested version and locale, fetch the release information from the web URL and store it then re-query it.

Did you have any thoughts about caching? I just realised that if gnome-software fetches the release file, it won’t have permission to store it in /usr/share/metainfo/releases, as it runs as the user. So the release data would have to be cached in ~/.cache somewhere, and that location checked (in step 3 above) before /usr/share/metainfo/releases is. It would also be good to have some well-defined criteria for when to consider that cache stale and re-query the release file from the web. Otherwise any consumer of the data will have to make a HEAD request every time they use it, which is almost as bad as having to re-download it every time.
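To make the search order concrete, here is a small sketch of "check the user cache first, then the system directory". The cache sub-directory name `appstream-releases` and the function `release_file_candidates` are invented for illustration; nothing in this thread fixes the actual cache location beyond "~/.cache somewhere".

```python
import os
from pathlib import Path

SYSTEM_DIR = Path("/usr/share/metainfo/releases")


def release_file_candidates(comp_id, env=os.environ):
    """Return release-file paths to try, user cache first, then system dir."""
    # Honour XDG_CACHE_HOME if set, otherwise fall back to ~/.cache.
    cache_home = env.get("XDG_CACHE_HOME") or str(Path.home() / ".cache")
    name = f"{comp_id}.releases.xml"
    return [
        # "appstream-releases" is a hypothetical cache sub-directory.
        Path(cache_home) / "appstream-releases" / name,
        SYSTEM_DIR / name,
    ]
```

A consumer would use the first candidate that exists and passes whatever staleness check is agreed on.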

* Do we need to include a signature, so we can verify the authenticity of the by-URL referenced release file? …

I think this is quite a big can of worms, and will not be entirely straightforward to get right from a security point of view while also keeping it easy to use for app maintainers.

You can use a simple (non-cryptographic) hash of the release file and include the hash value in the metainfo file — but then the metainfo file needs updating whenever the release file changes, which undoes the whole point of this proposal.

Alternatively, you need to ship a public key in the metainfo file, and have the release file (whatever its latest content is) signed with the private half of that key (which would not be shipped in the metainfo file, and would be kept private to the app’s maintainers). Doing this precludes the release file being modified by anyone else (such as a distro, which might be a legitimate use case). It also precludes the release file being concatenated together with other release files by a distro, which might also otherwise be a legitimate thing to do.


Finally, it might be worth looking at any work that people have done on this topic for flatpak. I don’t know what the current state of that is (might be fully implemented, might just be a few initial discussions), but it would be good to make sure this doesn’t become mutually orthogonal with whatever’s happening there.

pwithnall avatar Sep 09 '21 09:09 pwithnall

[...]

If this release metadata file is split out from the main MetaInfo file, the MetaInfo file can contain an empty releases block, containing an URL that points to a web location where the continuously-updated release metadata lives:

I think that’s going to cause problems for gnome-software: we rely on being able to query the first releases/release/@timestamp or releases/release/@date to get the latest release date for a component. The release date is used all over gnome-software to sort app lists by latest release and prioritise latest releases.

Would it be possible to require that at least the date or timestamp and version attributes for each release are present in the metainfo file? i.e. Change this proposal from “all release information can be split out into a release file” to “translatable release information can be split out into a release file”?

It is important to note here that I did not suggest excluding the release metadata from collection XML files (so, stuff in app-info), but only from MetaInfo files. This means that the software center will actually never be forced to query that URL, as either the data is available in full as collection XML or as a metainfo file, in which case a release metadata file in /usr/share/metainfo/releases should exist that GNOME Software can read. Ideally it wouldn't even need to worry about that though, as libappstream could probably abstract all of these details away.

You did give me an idea though: With the collection XML, people currently download all the metadata for everything they could possibly install, which on Debian is quite a lot. Downloading release data for apps that will never be installed feels a bit wasteful, so I think we could augment this proposal to actually split out the release blocks from collection XML as well and only leave "naked" release entries in there, so date and version are present, but not the "full" release data. The full release data could be stored on the distribution's or software source website and then be downloaded on-demand if the user actually wants to view changes.

Going further than that, gnome-software currently displays the release notes from the latest release on the details page for an app. If you click a button, a dialog opens which shows the historic release notes too. If apps split out their release information, this means a HTTP request is required every time the details page is opened, in order to get the release notes for the latest release.

That wouldn't be needed with the design outlined above, as the release data would still be available in full in the collection data, or locally in the metainfo dir. It would however be needed if we were to split the data from collection XML as well and the software center wants to display more than just the latest version on the details page.

That’s not terrible, but perhaps we could do better by allowing the human readable release notes, and any translations which are already available, to be shipped in the metainfo file, and have additional translations in the release file. These would be merged in to the XML in the metainfo file.

If so, the sequence of steps to get release information would become:

* If no `<releases/>` block is present, or no `<release/>` is present for the version being queried, do nothing. There is no release data for that release.

* If a `<release/>` is present for the version being queried, and it contains a `<description/>` in the user’s current locale, use that.

This would prevent translation updates from taking effect. What do we do for partial translations?

* If a `<release/>` is present for the version being queried, but it contains no `<description/>` in the user’s current locale, look for a `/usr/share/metainfo/releases/%{id}.releases.xml` file and use that.

Couldn't we just do that immediately, or do you think that would slow things down a lot? (I can't imagine that being the case, but I have been surprised in the past)

* If `%{id}.releases.xml` doesn’t exist or doesn’t have an entry or translation for the requested version and locale, fetch the release information from the web URL and store it then re-query it.

Ideally the software source would provide complete data, so the only reason for the software center to ever hit the remote URL is for locally installed metainfo data, or in case the software source decided to not embed or cache the data.

Did you have any thoughts about caching? I just realised that if gnome-software fetches the release file, it won’t have permission to store it in /usr/share/metainfo/releases, as it runs as the user. So the release data would have to be cached in ~/.cache somewhere, and that location checked (in step 3 above) before /usr/share/metainfo/releases is. It would also be good to have some well-defined criteria for when to consider that cache stale and re-query the release file from the web. Otherwise any consumer of the data will have to make a HEAD request every time they use it, which is almost as bad as having to re-download it every time.

If the software has release data, it must be in the MetaInfo file or it must have a file in /usr/share/metainfo/releases, so some release info would always be present. I would like for libappstream to abstract away all the on-demand downloading and caching, so we could implement a rule like "always check for updates if we don't have a translation in the user's current language, otherwise try to re-download the file if it's older than a day" for caching.
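The caching rule quoted above is simple enough to write down directly. The sketch below is only an illustration of that rule; `should_refetch` and its parameters are hypothetical names, and the one-day limit is taken from the "older than a day" wording rather than from any implemented behaviour.

```python
from datetime import datetime, timedelta

MAX_AGE = timedelta(days=1)  # "older than a day", per the rule above


def should_refetch(has_translation, fetched_at, now=None):
    """Decide whether the cached release file should be downloaded again.

    has_translation: True if the cached data has the user's language.
    fetched_at: datetime of the last successful download, or None.
    """
    # Nothing cached yet, or no translation for the user: always check.
    if fetched_at is None or not has_translation:
        return True
    now = now or datetime.now()
    return now - fetched_at > MAX_AGE
```

Note that under this rule a user whose language is never translated triggers a check on every query, so a real implementation would probably want a minimum interval for that case too.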

* Do we need to include a signature, so we can verify the authenticity of the by-URL referenced release file? …

I think this is quite a big can of worms, and will not be entirely straightforward to get right from a security point of view while also keeping it easy to use for app maintainers.

You can use a simple (non-cryptographic) hash of the release file and include the hash value in the metainfo file — but then the metainfo file needs updating whenever the release file changes, which undoes the whole point of this proposal.

Yes, that is not an option.

Alternatively, you need to ship a public key in the metainfo file, and have the release file (whatever its latest content is) signed with the private half of that key (which would not be shipped in the metainfo file, and would be kept private to the app’s maintainers). Doing this precludes the release file being modified by anyone else (such as a distro, which might be a legitimate use case). It also precludes the release file being concatenated together with other release files by a distro, which might also otherwise be a legitimate thing to do.

Not necessarily. The distro could check upstream's signature and then re-sign the file with its own key to make trusted modifications to it. I think that would be entirely fine. Embedding signatures will be a bit ugly though, I wonder if something like EdDSA could be used here, so we avoid the hell that is interfacing with GnuPG.

Finally, it might be worth looking at any work that people have done on this topic for flatpak. I don’t know what the current state of that is (might be fully implemented, might just be a few initial discussions), but it would be good to make sure this doesn’t become mutually orthogonal with whatever’s happening there.

Flatpak has a nice chain of trust going, but it also works a bit like distro package managers, so has a central data source. For AppStream, we would need per-file data verification. It really doesn't hurt to ask for feedback though, and this is an extremely hard problem to solve. We kind of already have it for screenshots, where we just assume that HTTPS works and nobody swaps out screenshots for something bad - so far, there have been no issues, which is why I wondered whether just using HTTPS is enough security for the release data as well. But tbh, in this day and age and with releases containing artifact references, we probably need something better and can't ignore this completely. In 10 years of AppStream I also learned that this is going to be used by many people in sometimes unexpected ways once it is widely available, so knowingly adding a security trap may bite us in the long run. Looking at what fwupd does for authenticity checks may actually also be helpful for reference here! (I'm pretty sure the metadata is just signed by LVFS as-a-whole, but who knows, maybe there's something to learn here)

ximion avatar Sep 09 '21 14:09 ximion

So, with regards to fwupd, that uses JCAT https://github.com/hughsie/libjcat for all its signing needs, to support multiple signatures on multiple files.

ximion avatar Sep 09 '21 15:09 ximion

cc @tsdgeos, who was also involved with the original proposal (https://github.com/ximion/appstream/issues/240).

Pointedstick avatar Sep 10 '21 19:09 Pointedstick

Honestly I don't know enough about the appstream machinery to be able to comment on whether this architecture is a good idea or not. I just know that the possibility of having external release files would be good for how we do the releases in KDE; there's not much I think I can contribute besides the "wish request".

Sorry :/

tsdgeos avatar Sep 10 '21 21:09 tsdgeos

Yeah, I also have no opinions about the implementation as long as the feature works for us. :)

Pointedstick avatar Sep 12 '21 17:09 Pointedstick

It is important to note here that I did not suggest excluding the release metadata from collection XML files (so, stuff in app-info), but only from MetaInfo files. This means that the software center will actually never be forced to query that URL, as either the data is available in full as collection XML or as a metainfo file, in which case a release metadata file in /usr/share/metainfo/releases should exist that GNOME Software can read. Ideally it wouldn't even need to worry about that though, as libappstream could probably abstract all of these details away.

Apologies, I missed that. That sounds fine to me then.

You did give me an idea though: With the collection XML, people currently download all the metadata for everything they could possibly install, which on Debian is quite a lot. Downloading release data for apps that will never be installed feels a bit wasteful, so I think we could augment this proposal to actually split out the release blocks from collection XML as well and only leave "naked" release entries in there, so date and version are present, but not the "full" release data. The full release data could be stored on the distribution's or software source website and then be downloaded on-demand if the user actually wants to view changes.

Please keep the release entry for the most recent release of each component in the collection XML, as per my previous comment. Otherwise gnome-software will end up downloading all the version history anyway just to get the release notes from the most recent release.
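A generator-side sketch of that compromise: keep the most recent release entry in full and reduce older ones to "naked" entries that carry only their attributes (version, date and so on). The function name `strip_releases` is made up for this example, and it assumes releases are listed newest first, as in the samples in this thread.

```python
import xml.etree.ElementTree as ET


def strip_releases(releases_xml, keep_full=1):
    """Return releases XML where only the first `keep_full` entries keep
    their child elements; the rest are reduced to attribute-only stubs."""
    root = ET.fromstring(releases_xml)
    for index, rel in enumerate(root.findall("release")):
        if index < keep_full:
            continue  # most recent release(s): leave untouched
        for child in list(rel):
            rel.remove(child)  # drop <description/>, <artifacts/>, ...
        rel.text = None  # so the stub serializes as a self-closing tag
    return ET.tostring(root, encoding="unicode")
```

Setting `keep_full=0` would give the fully stubbed variant discussed earlier, where every description has to be fetched on demand.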

That’s not terrible, but perhaps we could do better by allowing the human readable release notes, and any translations which are already available, to be shipped in the metainfo file, and have additional translations in the release file. These would be merged in to the XML in the metainfo file. If so, the sequence of steps to get release information would become:

* If no `<releases/>` block is present, or no `<release/>` is present for the version being queried, do nothing. There is no release data for that release.

* If a `<release/>` is present for the version being queried, and it contains a `<description/>` in the user’s current locale, use that.

This would prevent translation updates from taking effect. What do we do for partial translations?

How commonly is a particular translation of some release notes updated after it’s initially translated? The release notes themselves don’t get updated after release time (do they?), and I would have thought that typically the translators would translate the notes basically atomically, i.e. they’re either 100% translated to a given locale, or 0%.

* If a `<release/>` is present for the version being queried, but it contains no `<description/>` in the user’s current locale, look for a `/usr/share/metainfo/releases/%{id}.releases.xml` file and use that.

Couldn't we just do that immediately, or do you think that would slow things down a lot? (I can't imagine that being the case, but I have been surprised in the past)

Should be alright to do immediately if (and only if) the <description> is actually going to be used by the client. Otherwise this would cause N stat() calls for N apps when gnome-software is listing apps. That starts to take a non-trivial amount of time.

Did you have any thoughts about caching? I just realised that if gnome-software fetches the release file, it won’t have permission to store it in /usr/share/metainfo/releases, as it runs as the user. So the release data would have to be cached in ~/.cache somewhere, and that location checked (in step 3 above) before /usr/share/metainfo/releases is. It would also be good to have some well-defined criteria for when to consider that cache stale and re-query the release file from the web. Otherwise any consumer of the data will have to make a HEAD request every time they use it, which is almost as bad as having to re-download it every time.

If the software has release data, it must be in the MetaInfo file or it must have a file in /usr/share/metainfo/releases, so some release info would always be present. I would like for libappstream to abstract away all the on-demand downloading and caching, so we could implement a rule like "always check for updates if we don't have a translation in the user's current language, otherwise try to re-download the file if it's older than a day" for caching.

Right, but how do you abstract the on-demand downloading away such that it has permission to write to a directory in /usr/share?

pwithnall avatar Sep 13 '21 11:09 pwithnall

@pwithnall

Please keep the release entry for the most recent release of each component in the collection XML, as per my previous comment. Otherwise gnome-software will end up downloading all the version history anyway just to get the release notes from the most recent release.

If appstreamcli has good tooling for this, we could make it so that the metainfo files always contain "stub data", and the complete information can be downloaded on-demand. That would actually be pretty neat, but needs good tooling around it, as keeping things in sync manually would be hell.

How commonly is a particular translation of some release notes updated after it’s initially translated? The release notes themselves don’t get updated after release time (do they?),

Oh yes! At least Krita and GIMP apparently do that. And unfortunately translations are also not atomic, translations happen in blocks. We had the dumb situation a while back when AppStream was showing just "Features" translated into $language for many components, simply because that string was the only translated paragraph and AppStream thought "oh, there's a translation into the user's language, let's use that!". Now AppStream has some fairly complicated logic to only display localization if enough of a description text is actually localized (I think it's something like more than half of all paragraphs).
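The "enough of the description is localized" idea can be illustrated with a simple threshold check. This is not AppStream's actual logic, which the comment above only recalls approximately; the data layout (one dict per paragraph, mapping locale to text) and the name `use_localized` are assumptions made for the sketch.

```python
def use_localized(paragraphs, locale):
    """Return True if more than half of the paragraphs have a translation.

    paragraphs: list of dicts mapping a locale code to the paragraph text,
    with "C" holding the untranslated source text.
    """
    if not paragraphs:
        return False
    translated = sum(1 for para in paragraphs if locale in para)
    # Strictly more than half, using integer arithmetic to avoid rounding.
    return translated * 2 > len(paragraphs)
```

With this rule, a description where only the "Features" heading is translated falls back to the untranslated text as a whole instead of showing a single localized fragment.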

Should be alright to do immediately if (and only if) the <description> is actually going to be used by the client. Otherwise this would cause N stat() calls for N apps when gnome-software is listing apps. That starts to take a non-trivial amount of time.

Fair enough - if we allow that stub data in MetaInfo files, there wouldn't be a need to stat() everything, I think that would fit all usecases well. So it would be:

  • Stub <release/> notes present in MetaInfo files
  • Full release notes (as much as known) installed into /usr/share/metainfo/releases
  • Up-to-date recent release information downloadable from URL specified in MetaInfo file

Right, but how do you abstract the on-demand downloading away such that it has permission to write to a directory in /usr/share?

Oh, it would never write to that location - we would just make AsComponent download the missing data if get_releases() or any related function is called. There would of course be a flag to prevent any downloading, if users don't want that for privacy reasons. And it's not unlikely that distributions or repositories like Flathub would cache the release files on their servers, for no privacy leak and to use their CDNs for faster data delivery.

The thing that actually causes me the most headaches is the signing part. A signature is needed to verify that the release information that was downloaded actually belongs to the application when the distribution downloads it from an arbitrary source. When the final data is shipped via distribution sources, HTTPS should do the job and there would just be one URL to fetch data from, so about that I am not worried. But a random metainfo file shipped in a tarball needs a signature, so nobody along the way manipulated the release URL inside to point at a malicious release information file. The dumb thing is, that if the MetaInfo file contains a public key and the remote data is signed, someone who wanted to mess with it could swap out both the key and the URL which would make the verification pretty useless for that. The only thing that the signature would protect against then is if someone took over the URL under which the release file was stored and replaced the file there. And I don't know how likely that issue is.

ximion avatar Dec 07 '21 21:12 ximion

Since this is on the list of essential things for AppStream 1.0, I am thinking about implementing this feature by:

  • Not adding extra metainfo signing
  • Enforcing HTTPS for external release URLs
  • Enforcing stub release information in case an external release URL is present
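A validator for the last two points might look roughly like the sketch below. It is not appstreamcli's validation code; `check_external_releases` is a hypothetical helper, and the definition of a "stub" (a version plus either a date or a timestamp) follows the request made earlier in this thread rather than any finished specification.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def check_external_releases(metainfo_xml):
    """Return a list of problems with an external-releases declaration."""
    issues = []
    block = ET.fromstring(metainfo_xml).find("releases")
    if block is None or block.get("url") is None:
        return issues  # no external release data declared, nothing to check

    # External release URLs must use HTTPS.
    if urlparse(block.get("url")).scheme != "https":
        issues.append("external release URL must use HTTPS")

    # Stub release entries are required alongside the URL.
    stubs = block.findall("release")
    if not stubs:
        issues.append("stub <release/> entries are required with a url")
    for rel in stubs:
        has_when = rel.get("date") or rel.get("timestamp")
        if not rel.get("version") or not has_when:
            issues.append("stub release needs version and date or timestamp")
    return issues
```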

This will not protect us against someone taking over a domain and replacing the release info file, but if that is a realistic issue we could always add signing later.

ximion avatar Jun 13 '22 19:06 ximion

My 2¢: maybe I come at this feature from a different direction than the others, but the thing that I'm really interested in here is the local file case.

The cockpit project doesn't make any particular special commit when we release. We just tag off, and we're done. For that reason, I want to be able to store the release history downstream (ie: in Flathub) and a metainfo file with an empty <releases/> section upstream. The release history file would be fed into flatpak during the build process and end up in the final app.

This could also be done by merging the history into the metainfo file, but it's difficult, and the one tool that might be useful there (appstreamcli news-to-metainfo) isn't available in the GNOME SDK. I was sort of hoping to avoid having to create my own tooling here.

allisonkarlitskaya avatar Jun 15 '22 12:06 allisonkarlitskaya

(appstreamcli news-to-metainfo) isn't available in the GNOME SDK

appstreamcli landed in the Freedesktop SDK 22.08, which will be released in August. All other SDKs are based on the Freedesktop SDK, so appstreamcli will soon be part of them.

JakobDev avatar Jun 15 '22 12:06 JakobDev