Integrating workflow execution projects into planemo?
Over the last few months I've spent some time working on a command line tool for executing Galaxy workflows, located here: https://github.com/simonbray/gxwf/. The idea is basically to provide convenient, user-friendly access to Galaxy workflows / datasets / invocations via a command line interface. Here is an overview of the main commands available:
- manage: add, remove and switch between different sets of API credentials
- list workflows available to the user, either their own or those published by others
- invoke a workflow (optionally with an interactive prompt rather than the usual YAML file)
- subcommands to list all datasets, workflows and invocations (optionally filtered by a search term)
- alias, which generates Docker-style aliases for Galaxy objects using names_generator; these can be used instead of IDs when running jobs / workflows.
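To make the alias idea concrete, here is a minimal Python sketch. It is not gxwf's implementation: the word lists are hypothetical stand-ins for the much larger ones a names generator would supply, and deriving each alias by hashing the ID (so it is stable without stored state) is an assumption of this sketch. Collisions are resolved with a numeric suffix.

```python
import hashlib

# Hypothetical, tiny word lists for illustration only; a real names
# generator uses far larger lists of adjectives and surnames.
ADJECTIVES = ["brave", "calm", "eager", "fuzzy", "jolly", "quirky"]
SURNAMES = ["curie", "darwin", "franklin", "hopper", "lovelace", "turing"]


def alias_for(galaxy_id: str) -> str:
    """Derive a Docker-style alias (adjective_surname) from a Galaxy ID.

    Hashing the ID makes the result deterministic: the same ID always
    yields the same base alias.
    """
    digest = hashlib.sha256(galaxy_id.encode("utf-8")).digest()
    adjective = ADJECTIVES[digest[0] % len(ADJECTIVES)]
    surname = SURNAMES[digest[1] % len(SURNAMES)]
    return f"{adjective}_{surname}"


def build_alias_table(galaxy_ids):
    """Map alias -> ID so an alias can be typed wherever an ID is expected."""
    table = {}
    for galaxy_id in galaxy_ids:
        base = alias_for(galaxy_id)
        candidate, suffix = base, 2
        # With only 36 combinations, collisions are likely; append a numeric
        # suffix until the alias is free (or already points at this ID).
        while candidate in table and table[candidate] != galaxy_id:
            candidate = f"{base}{suffix}"
            suffix += 1
        table[candidate] = galaxy_id
    return table


def resolve(name_or_id, table):
    """Accept either an alias or a raw ID and return the ID."""
    return table.get(name_or_id, name_or_id)
```

A command such as invoke could then pass its argument through `resolve` before calling the Galaxy API, so users never have to copy long hexadecimal IDs.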
@bgruening pointed out that there are multiple similar projects - for example the planemo run command and the galaxy-workflow-executor created at the EBI - and that we should avoid duplicating the same functionality. One way to do this would be to integrate the work listed above into planemo, probably mostly into the run subcommand.
On the other hand, I have the suspicion that 90% of planemo users see it purely as an SDK for creating Galaxy tools and aren't aware of the huge number of other features which exist. To be honest, I am not convinced that adding even more features will be helpful. I can see the benefit of a separate, compact, high-level command line tool with only a few subcommands.
Anyway, we would be interested to hear any other opinions from planemo devs.
> I have the suspicion that 90% of planemo users see it purely as a SDK for creating Galaxy tools and aren't aware of the huge number of other features which exist.
It is already being used and advertised for training material and CWL - so I think the idea that it isn't just for Galaxy tools is taking off. I also think it will become increasingly clear that it is very useful for workflows - we've made a bunch of upgrades to the run command, the test command, and linting. I think there will continue to be big pushes around planemo run and the underlying engine concept.
> aren't aware of the huge number of other features which exist
Given my following comments, this is going to be an aside - but I don't think this is a huge problem. People don't download programs at random and use --help to figure out what they can do; they read tutorials. People using Planemo to generate training material didn't learn that they could do that from Planemo itself - they followed tutorials on the training material website.
Whether people pick gxwf, galaxy-workflow-executor, plain requests, BioBlend, the Planemo command, or Planemo as a library is mostly going to come down to how each is advertised and the quality of the tutorials.
> I can see the benefit of a separate compact, high-level command line tool with only a few subcommands.
I've tried to make the actual CLI portion - the click commands at the top of the program - as thin as possible. I want the engine concept to be reusable in other contexts. I get being overwhelmed by the Planemo UI - it should be very possible to build something else on the library (or even a Web UI).
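A minimal sketch of the "thin CLI over a reusable library" layering described here: the library function holds all the logic, and the command-line wrapper only parses arguments and prints. Planemo itself uses click; this sketch uses the standard library's argparse purely to stay self-contained, and the `Workflow` type, `filter_workflows` and the `wf-list` program name are hypothetical.

```python
import argparse
from dataclasses import dataclass
from typing import List, Optional


# --- Library layer: plain data and functions, no CLI concerns ------------
@dataclass
class Workflow:
    id: str
    name: str
    published: bool


def filter_workflows(workflows: List[Workflow], search: Optional[str] = None,
                     published_only: bool = False) -> List[Workflow]:
    """Pure logic that a CLI, a Web UI or another tool could all call."""
    result = []
    for wf in workflows:
        if published_only and not wf.published:
            continue
        if search and search.lower() not in wf.name.lower():
            continue
        result.append(wf)
    return result


# --- CLI layer: only parses arguments and formats output -----------------
def main(argv: Optional[List[str]] = None,
         workflows: Optional[List[Workflow]] = None) -> List[str]:
    parser = argparse.ArgumentParser(prog="wf-list")
    parser.add_argument("--search", help="case-insensitive name filter")
    parser.add_argument("--published", action="store_true",
                        help="only show published workflows")
    args = parser.parse_args(argv)
    matches = filter_workflows(workflows or [], args.search, args.published)
    lines = [f"{wf.id}\t{wf.name}" for wf in matches]
    for line in lines:
        print(line)
    return lines
```

Because `filter_workflows` knows nothing about argument parsing, a Web UI or a different command line front end could call it directly, which is the kind of reuse argued for above.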
When I tried to sell Bérénice on Planemo, it was as the backend for something like a gx-training-material command - I don't think it is particularly important that people are using the planemo executable for this. In fact, there may well be aspects of how you would like to do workflow state tracking that clash with Planemo's ethos, and that is fine if you're using it as a library.
So the UI and executable aren't super important to me - I think the thing I would sell is that you should use the Planemo engine, runnable, and "job" concepts. This way job descriptions are compatible with Planemo workflow testing. The functionality works against running Galaxy servers, local Planemo-managed Galaxy servers, Planemo-managed Docker Galaxy instances, etc. It will also mean that the months I spent on "how do we specify composite inputs to workflows", "how do we specify nested collections and tags", "how do we download outputs", "how does this work with both Galaxy native and format 2 workflows", etc. are usable. It will also mean Marius's cool new reports for debugging workflows are available (#1048). It will also mean that we can collaborate on future developments like making Galaxy-generated custom PDFs available for invocations and integrating Marius's cool workflow->script generators into the environment. It will also mean that the work I've done aligning things with CWL, and planned future CWL integrations, are available essentially for free.
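As a rough illustration of what a shared job description looks like, the sketch below builds one in Python. The field names (`class: File`, `path`, `class: Collection`, `collection_type`, `elements`, `identifier`) follow the CWL-style layout that Planemo's workflow job and test files appear to use, but treat them as an assumption and check the Planemo documentation before relying on them; the input labels `reference` and `reads` and all paths are hypothetical. Such files are normally written as YAML; JSON is used here only to keep the example stdlib-only.

```python
import json
from typing import Dict, List, Tuple


def file_input(path: str) -> Dict:
    """A single dataset input, referenced by a local path."""
    return {"class": "File", "path": path}


def list_collection(elements: List[Tuple[str, str]]) -> Dict:
    """A flat list collection; each element gets an identifier and a path."""
    return {
        "class": "Collection",
        "collection_type": "list",
        "elements": [
            {"class": "File", "identifier": identifier, "path": path}
            for identifier, path in elements
        ],
    }


# Keys are workflow input labels (hypothetical here); values describe the data.
job = {
    "reference": file_input("data/reference.fasta"),
    "reads": list_collection([
        ("sample1", "data/sample1.fastq"),
        ("sample2", "data/sample2.fastq"),
    ]),
}

print(json.dumps(job, indent=2))
```

The point of sharing one format is that the same mapping of input labels to data can, in principle, drive both a workflow run and a workflow test.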
I get that Planemo presents a very big surface to target and build on - but the solution is to create a view of it that works for you and then build your abstractions on top of that (contributing things back as they make sense in a lower level context). Having multiple big competing things with overlapping and conflicting features is problematic.
I should add that I recently made some progress on restructuring Planemo to make this more obvious - https://github.com/galaxyproject/planemo/pull/1059. I'd love to see library documentation outlining how to do this, plus tests of the exposed surface - I just haven't made progress as quickly or as broadly as I had hoped.
Thanks, John, for the extensive response. I will spend some time reworking gxwf so that the workflow calls go through Planemo instead of just BioBlend.
On whether these projects should actually be integrated into planemo itself: do you have any preference either way?
I think you've implemented this with planemo run, so we can close this 🎉