New field for external ID
Hi,
Bioinformatics programs can have several IDs from different registries. Scientific publications use these IDs as persistent identifiers for software. For example, the bedtools software has these IDs: bedtools (from bio.tools), OMICS_01159 (from OMICtools), SCR_006646 (from SciCrunch). We are currently adding them to Debian packages in the debian/upstream/metadata file [1,2]. I wonder whether we could do something more general that other Linux distributions can reuse, by moving them from d/upstream/metadata to the AppStream file.
For bioinformatics pipeline development or deployment, it would be useful to know which software is already installed on the system. Moreover, it would be great to be able to install bioinformatics programs from software centers using these IDs instead of their names, which sometimes clash with unrelated free software (e.g. snap, plink, eagle, ...).
So, can we specify these different IDs in the AppStream metadata file of this software? Maybe with something like this:
<provides>
<registry type="bio.tools">bedtools</registry>
<registry type="OMICtools">OMICS_01159</registry>
<registry type="SciCrunch">SCR_006646</registry>
</provides>
Or would something more general (not specific to bioinformatics), such as a way to define external IDs in the AppStream metadata file, be better?
Best regards, Dylan
[1] https://wiki.debian.org/UpstreamMetadata [2] https://salsa.debian.org/med-team/bedtools/blob/master/debian/upstream/metadata
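As a rough illustration of how a consumer might read the tags proposed above, here is a minimal Python sketch using only the standard library. It assumes the `<registry type="…">` element sits inside `<provides/>` of a `<component>` root; neither the element name nor its placement is part of the AppStream specification, and the component ID `org.example.bedtools` is a hypothetical placeholder.

```python
import xml.etree.ElementTree as ET

# Hypothetical metainfo snippet using the <registry/> element proposed above.
# This element is NOT part of the current AppStream specification.
METAINFO = """
<component type="console-application">
  <id>org.example.bedtools</id>
  <provides>
    <registry type="bio.tools">bedtools</registry>
    <registry type="OMICtools">OMICS_01159</registry>
    <registry type="SciCrunch">SCR_006646</registry>
  </provides>
</component>
"""


def registry_ids(component: ET.Element) -> dict[str, str]:
    """Map each registry name (the type attribute) to the ID used in that registry."""
    ids: dict[str, str] = {}
    # Only direct <registry/> children of <provides/> are considered.
    for entry in component.findall("./provides/registry"):
        registry = entry.get("type")
        value = (entry.text or "").strip()
        if registry and value:
            ids[registry] = value
    return ids


if __name__ == "__main__":
    root = ET.fromstring(METAINFO.strip())
    print(registry_ids(root))
```

A pipeline could compare this mapping against the IDs quoted in a publication to decide whether a tool is already described by an installed component.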
As a general statement: I would rather not add any "official" method to provide another ID for a software component, because doing so would undermine the AppStream component-ID as the single unique identifier across distributions and package managers. People should query for the component-ID instead of other IDs.
However, using the provides method to make components provide "registry names" does make sense to me and might be valuable to add. So, if we add something like this, it would definitely go in a <provides/> tag, as you proposed, and not in another toplevel tag. After all, you can already use the provides tag today to list the Python modules a component provides, which falls into a similar category.
I am not sure yet whether registry is the best name for this new provided element (at the moment I can't think of anything better, and it certainly is generic enough). I don't like the name registry because the software is not providing a registry; it is providing the tool that a particular ID in a registry describes. This could be a nitpick-y thing though.
By the way, you can store arbitrary data in AppStream metainfo files using a <custom/> tag, see https://github.com/ximion/appstream/blob/1a1701377f0b5d406b81aef170cf38cac3967105/tests/test-xmldata.c#L719-L726
That is useful for experimenting with new things.
So, tl;dr, I think we could have something like this as <provides/> element type, but we'll need to work out the details and how this feature should be used (ideally it should be generic enough and also desirable for other projects outside of Bioinformatics).
By the way, I remember there was a request for having citable references in AppStream a while back. I think having this would be neat, although we would not add the amount of tags that Debian's metadata has to AppStream. Instead, it would likely be a simple <bibliography/> tag with freetext entries and maybe a DOI link. But that's something for another issue report ;-)
Hi,
I understand your point of view about registry, and indeed it's not easy to find a better name. I could suggest Catalog-entry, RRID or alt-id; are any of these better? I will ask Steffen, who already thought about this for the debian/upstream/metadata file, whether he has any good suggestions.
Thanks for pointing me to the <custom/> tag; I just used it to test this on one of my packages.
I was planning to open a bug for citable references ;-) but I wanted to go step by step and not spam your bug tracker :-). Since you just suggested it, I will open a bug for this.
Best, Dylan
@Dybian That won't work ^^ The custom tag has a fixed form, so this should look like:
<custom>
<value key="registry::bio.tools">GWAMA</value>
<value key="registry::OMICtools">OMICS_00235</value>
<value key="registry::SciCrunch">SCR_006624</value>
</custom>
The custom tag is basically a key-value store. For some reason appstreamcli accepts your file as valid, while it really should not (I'll fix that today).
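Because `<custom/>` is just a key-value store, a tool that wants to experiment with registry IDs has to agree on a key convention itself. The minimal Python sketch below reads every `<value key="…">` into a dict and then keeps only keys with the `registry::` prefix used in this thread; that prefix is a convention for experimenting, not something AppStream interprets, and the component ID `org.example.gwama` is a hypothetical placeholder.

```python
import xml.etree.ElementTree as ET

# Metainfo fragment using the key-value form of <custom/>.
# The "registry::" prefix is only an experimental naming convention.
METAINFO = """
<component type="console-application">
  <id>org.example.gwama</id>
  <custom>
    <value key="registry::bio.tools">GWAMA</value>
    <value key="registry::OMICtools">OMICS_00235</value>
    <value key="registry::SciCrunch">SCR_006624</value>
  </custom>
</component>
"""

PREFIX = "registry::"


def custom_values(component: ET.Element) -> dict[str, str]:
    """Read every <value key="..."> under <custom/> into a plain dict."""
    values: dict[str, str] = {}
    for value in component.findall("./custom/value"):
        key = value.get("key")
        if key is None:
            # A <value/> without a key attribute cannot be stored, so skip it.
            continue
        values[key] = (value.text or "").strip()
    return values


def registry_ids_from_custom(component: ET.Element) -> dict[str, str]:
    """Keep only the "registry::" entries and strip the prefix from their keys."""
    return {
        key[len(PREFIX):]: val
        for key, val in custom_values(component).items()
        if key.startswith(PREFIX)
    }


if __name__ == "__main__":
    root = ET.fromstring(METAINFO.strip())
    print(registry_ids_from_custom(root))
```

Note that an entry written as `<value type="…">` is silently dropped here, which mirrors why the key attribute matters.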
From the new suggestions, I think alt-id is the worst one :P
I would probably like to go with something like registry_name. @hughsie what's your opinion on adding these "alternative names from different other software registries" thing to AppStream? (And how should it be named?)
EDIT: The custom field is also not a child of the <provides/> tag but of the root component node ;-)
I also think your metainfo file should set type="console-application" on its component, to show that it describes an application the user can execute from $PATH.
For more details on the custom tag, see also https://www.freedesktop.org/software/appstream/docs/chap-Metadata.html#tag-custom
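To make the two structural points above concrete (`type="console-application"` on the component, and `<custom/>` as a direct child of the root `<component/>` rather than of `<provides/>`), here is a small Python sketch that builds such a skeleton. The function name `build_metainfo` and the component ID are hypothetical, and a real metainfo file needs further tags (such as name and summary) that are omitted here.

```python
import xml.etree.ElementTree as ET


def build_metainfo(component_id: str, registry_ids: dict[str, str]) -> ET.Element:
    """Build a minimal (incomplete) metainfo tree for a command-line tool.

    <custom/> is appended directly to the root <component/>, not to <provides/>.
    """
    component = ET.Element("component", {"type": "console-application"})
    ET.SubElement(component, "id").text = component_id
    custom = ET.SubElement(component, "custom")
    for registry, value in registry_ids.items():
        # Same experimental "registry::" key convention as in the thread.
        ET.SubElement(custom, "value", {"key": f"registry::{registry}"}).text = value
    return component


if __name__ == "__main__":
    root = build_metainfo("org.example.gwama", {"SciCrunch": "SCR_006624"})
    ET.indent(root)  # pretty-printing helper, available since Python 3.9
    print(ET.tostring(root, encoding="unicode"))
```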
I think it's fine to include, but I don't think custom is the right way to do this kind of thing. It's kinda like an SPDX license, it's a standardized reference of a known type. You could even have something simple like <registry_id type="SciCrunch">SCR_006624</registry_id> in the <component> node. I don't think it's a "provides" as it's not actually providing anything, it's just additional metadata about the application.
@hughsie I do not want any ID other than the component-ID on the top level for a component (yeah, it is at least in part a psychological thing) and we already created a precedent for having a component provide a specific ID in a <provides/> tag by having it provide alternative AppStream component-IDs via an <id/> tag as well as things like DBus service names and Python package names (the latter being slightly different from this case though in that they provide a file representing the ID in the filesystem).
Therefore, I think having this under provides is the right choice (it also means this can quite easily be queried using existing tools when implemented).
From a purely idealistic point of view, I think all the other registries should list the AppStream component ID to reference applications, instead of creating yet another identifier for software. However, having it in AppStream has two benefits:
- People might search for apps in a software center using the alternative name given in some publication
- It allows for easier cross-referencing of software for tools that don't yet use the component-ID
So adding it would be a pragmatic thing to do, and <provides/> is a natural fit.
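The cross-referencing benefit can be sketched as a reverse index from (registry, external ID) to AppStream component-ID, so that an ID quoted in a publication resolves to the component-ID that software centers already use. This is a minimal Python sketch under assumptions: the element name was still undecided, so `registry_id` is only a default parameter, and both component IDs in the sample catalog are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog; the element name under <provides/> is a parameter below
# because it had not been settled.
CATALOG = """
<components>
  <component type="console-application">
    <id>org.example.bedtools</id>
    <provides>
      <registry_id type="SciCrunch">SCR_006646</registry_id>
      <registry_id type="bio.tools">bedtools</registry_id>
    </provides>
  </component>
  <component type="console-application">
    <id>org.example.gwama</id>
    <provides>
      <registry_id type="SciCrunch">SCR_006624</registry_id>
    </provides>
  </component>
</components>
"""


def build_index(catalog: ET.Element, tag: str = "registry_id") -> dict[tuple[str, str], str]:
    """Map (registry, external ID) to the component-ID that provides it."""
    index: dict[tuple[str, str], str] = {}
    for component in catalog.findall("component"):
        cid = component.findtext("id")
        if not cid:
            continue
        for entry in component.findall(f"./provides/{tag}"):
            registry = entry.get("type")
            value = (entry.text or "").strip()
            if registry and value:
                index[(registry, value)] = cid.strip()
    return index


def lookup(index: dict[tuple[str, str], str], external_id: str) -> list[str]:
    """Return the component-IDs matching an external ID, whatever its registry."""
    return sorted({cid for (_, value), cid in index.items() if value == external_id})


if __name__ == "__main__":
    idx = build_index(ET.fromstring(CATALOG.strip()))
    print(lookup(idx, "SCR_006646"))
```

Resolving to the component-ID, rather than exposing the external ID as a second primary key, keeps the component-ID as the single identifier downstream.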
I avoided calling this registry_id so far because technically the component-ID is a registry as well, and it is unclear what the definition of a "registry" actually is (why not repository_id etc.?) After seeing you use that name, it feels less awkward though and might be the objectively right name for this tag.
Btw, @hughsie my comments on "custom" were more to address the "we want to add
Is there still interest in this? I am thinking about combining this with https://github.com/ximion/appstream/issues/190 in a new references tag, so you would have something like:
<references>
<registry_id type="SciCrunch">SCR_006624</registry_id>
<doi>10.1000/182</doi>
<citation_url>https://example.org/CITATION.cff</citation_url>
</references>
What do you think?
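For comparison, a consumer of the proposed `<references/>` block could collect each kind of reference separately. This Python sketch only mirrors the proposal above (`registry_id`, `doi`, `citation_url`), none of which are in the AppStream specification; the names `References` and `parse_references` are hypothetical, and using lists for DOIs and citation URLs is this sketch's own assumption that a component might cite more than one publication.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# The proposed <references/> block from this thread (not part of AppStream).
REFERENCES = """
<references>
  <registry_id type="SciCrunch">SCR_006624</registry_id>
  <doi>10.1000/182</doi>
  <citation_url>https://example.org/CITATION.cff</citation_url>
</references>
"""


@dataclass
class References:
    registry_ids: dict[str, str] = field(default_factory=dict)
    dois: list[str] = field(default_factory=list)
    citation_urls: list[str] = field(default_factory=list)


def parse_references(node: ET.Element) -> References:
    """Collect each kind of reference; empty or unknown child elements are ignored."""
    refs = References()
    for child in node:
        text = (child.text or "").strip()
        if not text:
            continue
        if child.tag == "registry_id" and child.get("type"):
            refs.registry_ids[child.get("type")] = text
        elif child.tag == "doi":
            refs.dois.append(text)
        elif child.tag == "citation_url":
            refs.citation_urls.append(text)
    return refs


if __name__ == "__main__":
    print(parse_references(ET.fromstring(REFERENCES.strip())))
```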