Public Galaxy Server and Tool Metadata
Had a conversation with @matuskalas at 2 consecutive Galaxy CoFests about improving the search functionality of the Galaxy Platform Directory. In between, I talked with my bosses about increasing the amount of information related to Galaxy in Bio.Tools and GalaxyCat.
It became obvious while talking with Matúš at this year’s CoFest that these goals complement each other nicely.
This issue could be created in many places:
- Bio.Tools, any number of repos (picked Content)
- GalaxyCat
- GalaxyProject, any number of repos (including galaxy, galaxy-hub,...)
Eventually there may be pull requests in many of those places (including ToolDog).
Goals
Increase presence of public Galaxy servers and their tools in Bio.Tools and GalaxyCat. Increase Awareness of Bio.Tools and GalaxyCat in the Galaxy Community. Simultaneously, make the Galaxy Platform Directory contain more useful and searchable information about those platforms.
How?
That’s what this issue is here to discuss. One item seems uncontroversial to me:
- We should use ontology terms to do this. EDAM and a Taxa ontology seem most useful, but others also have obvious applications. For example, RepeatExplorer is all about repetitive elements, and that suggests the Sequence Ontology.
And a starting smattering of open questions:
- Which ontologies? Any ontology that is available in a lookup service, or only a core set of ontologies?
- How do we add new ontologies, either to our limited list, or that aren’t in our selected lookup service? For example the Climate Workbench server may use ontologies that aren’t in biology-centric lookup services.
- Where and how do we store and access server ontology information? Storing it on the server itself seems like a good idea, but adding it to metadata in the hub might be a good fallback.
- A fair amount of work (I think) has gone into supporting EDAM annotation of individual tools. How can we encourage tool wrappers to actually use this functionality?
- How do we make the Galaxy and larger bioinformatics communities aware of these resources once they are updated?
Ping @matuskalas @hmenager @dannon @hexylena @bgruening @khillion @julozi
> Increase presence of public Galaxy servers and their tools in Bio.Tools and GalaxyCat. Increase Awareness of Bio.Tools and GalaxyCat in the Galaxy Community.
I think the only way to increase awareness is to embed it in Galaxy somehow, or no one will find this unrelated website without a lot of work. Maybe at the bottom of search results like this:
and just link to GalaxyCat (not search by default!) (And also update their data...)
> How can we encourage tool wrappers to actually use this functionality?
I think Galaxy needs to do more there. E.g. showing the ontologies somewhere in the UI, allowing searching on inputs/outputs of those ontologies, etc.
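As a rough illustration of what searching on annotated inputs/outputs could look like, here is a minimal Python sketch. The `Tool` record, its field names, and the example tools are hypothetical and are not Galaxy's data model; the EDAM format IDs are included for illustration and should be checked against EDAM before being relied on.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Tool:
    """Hypothetical tool record; not Galaxy's actual data model."""
    tool_id: str
    input_formats: set = field(default_factory=set)   # EDAM format IDs
    output_formats: set = field(default_factory=set)  # EDAM format IDs


def build_index(tools):
    """Map each (direction, EDAM ID) pair to the set of tool IDs using it."""
    index = defaultdict(set)
    for tool in tools:
        for term in tool.input_formats:
            index[("input", term)].add(tool.tool_id)
        for term in tool.output_formats:
            index[("output", term)].add(tool.tool_id)
    return index


def find_tools(index, consumes=None, produces=None):
    """Return sorted tool IDs matching the given input and/or output term."""
    matches = None
    for key in (("input", consumes), ("output", produces)):
        if key[1] is None:
            continue  # this direction was not part of the query
        hits = index.get(key, set())
        matches = hits if matches is None else matches & hits
    return sorted(matches or [])


# Illustrative data only (assumed IDs: format_1930 = FASTQ, format_2572 = BAM).
TOOLS = [
    Tool("example_mapper", {"format_1930"}, {"format_2572"}),
    Tool("example_qc", {"format_1930"}, {"format_1930"}),
]
INDEX = build_index(TOOLS)
```

A query such as `find_tools(INDEX, consumes="format_1930", produces="format_2572")` would then narrow the list to tools that take one format and emit the other, which is only possible if wrappers actually carry the annotations.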
This is very interesting. Publishing the metadata on the servers themselves seems natural, but then my question is how the Galaxy Platform Directory finds out which servers to check in the first place without continuing to store its own list of servers.
Some thoughts on this discussed in the GCC2021 CoFest:
- Add bio.tools IDs and ontology concept IDs into the Galaxy server "metadata"
- Have these accessible via something like <galaxyServerUrl>/api/about, together with Galaxy version etc.
- Create an issue on galaxyproject about this
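To make the idea concrete, here is a minimal sketch of what such a response might carry and how a directory could filter servers by concept. Everything here is an assumption for discussion: the key names (`galaxy_version`, `biotools_ids`, `ontology_terms`) are invented, the bio.tools ID is a placeholder, and `/api/about` is only the route floated above, not a confirmed Galaxy endpoint.

```python
import json


def build_about_payload(galaxy_version, biotools_ids, ontology_terms):
    """Assemble a hypothetical /api/about response body.

    All keys are assumptions for illustration; no such schema is defined yet.
    """
    return {
        "galaxy_version": galaxy_version,
        "biotools_ids": sorted(set(biotools_ids)),
        # Group concept IDs by ontology so consumers can pick what they know.
        "ontology_terms": {
            name: sorted(set(ids)) for name, ids in ontology_terms.items()
        },
    }


def servers_with_term(payloads_by_url, ontology, concept_id):
    """Return server URLs whose payload lists the given ontology concept."""
    return sorted(
        url
        for url, payload in payloads_by_url.items()
        if concept_id in payload.get("ontology_terms", {}).get(ontology, [])
    )


example = build_about_payload(
    galaxy_version="21.05",                    # example value only
    biotools_ids=["example-server"],           # placeholder bio.tools ID
    ontology_terms={"EDAM": ["topic_0080"]},   # assumed: Sequence analysis
)
print(json.dumps(example, indent=2))
```

Grouping terms by ontology name is one way to leave room for non-biology ontologies without changing the shape of the response.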
How this data will be integrated and viewed elsewhere:
- I suppose the server listing at the galaxyproject.org community website would want to show these, and so would GalaxyCat and bio.tools
- Galactic Radiotelescope could perhaps help gather the server data
- GalaxyCat as a read-only view(?)
- 2 options to propagate the data:
  - Galaxy (Radiotelescope) -> Tools Ecosystem -> GalaxyCat
  - Galaxy (Radiotelescope) -> GalaxyCat -> Tools Ecosystem
- In the future, we might consider whether we also want to sync updates of the data from the Tools Ecosystem back into Galaxy
GRT isn't the right route (that's opt-in, and very few will be part of that). There's a better one: the public server list (+scraper)! https://github.com/martenson/public-galaxy-servers/
We run that script as a cron job, which pulls the /api/configuration route from roughly 100 servers on a regular basis.
We had a page here that showed stats collected from all public Galaxy servers; I'll fix it on Monday. https://stats.galaxyproject.eu/d/000000020/public-galaxy-servers?orgId=1&from=now-7d&to=now
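For anyone unfamiliar with that kind of scraper, the polling can be sketched roughly as below. This is not the script from the repository linked above, just a minimal illustration; the helper names are made up, and only the /api/configuration route comes from this thread.

```python
import json
from urllib.request import urlopen


def configuration_url(base_url):
    """Build the /api/configuration URL for a Galaxy server base URL."""
    return base_url.rstrip("/") + "/api/configuration"


def fetch_json(url, timeout=10):
    """Fetch a URL and decode its JSON body (network access required)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def collect_configurations(server_urls, fetch=fetch_json):
    """Poll each server; record the config dict, or None if the call failed."""
    results = {}
    for base_url in server_urls:
        try:
            results[base_url] = fetch(configuration_url(base_url))
        except (OSError, ValueError):
            # Unreachable server or a non-JSON reply: note it and move on.
            results[base_url] = None
    return results
```

The `fetch` parameter is there so the loop can be exercised without network access; a cron job would call `collect_configurations` with the public server list and store the results. If ontology and bio.tools metadata were exposed by the servers, the same loop could pick them up.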