sdrf-pipelines
[DISCUSSION] A more concise CLI that is well behaved
When looking at the sdrf-pipelines package, there are many different terms that have a range of meanings and uses.
The string sdrf-pipelines is only used for the installation and the import of the package. I think that is fine.
Now for the CLI. It introduces the command parse_sdrf. I think this is not inherently a bad name if what the tool does is parse one or more SDRF files. But the tool actually does more: it validates SDRF files when called with parse_sdrf validate-sdrf ... and converts them to input files for other tools when called with parse_sdrf convert-openms .... Validation and conversion both require parsing, so the "parse" in parse_sdrf seems redundant and even somewhat misleading: the tool advertises parsing but goes much further than parsing SDRF files. That's why I think the name of the command-line tool is not ideal.
Also note that the "conversion" is not actually a pure conversion of the information in the SDRF. In the case of the MaxQuant output, it is also an enrichment. parse_sdrf as a command name doesn't do that justice.
Because of all the above I propose to adopt a new CLI naming and behaviour as follows:
- a command called sdrf, which can be used to validate and write SDRF files.
- sdrf validate, which only validates SDRF files. There might be something like a --strict flag that makes it accept only byte-perfect SDRF files and complain about trailing whitespace and other errors ignored by the permissive parser.
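To make the strict/permissive distinction concrete, here is a minimal sketch in Python. The function names are hypothetical and nothing here is existing sdrf-pipelines code. It assumes plain tab-separated input and checks only one of the issues mentioned above (stray whitespace in cells); a real --strict mode would check more.

```python
def find_strict_violations(text: str) -> list[str]:
    """List issues that a hypothetical `sdrf validate --strict` would reject.

    Only leading/trailing whitespace in cells is checked here; a real strict
    mode would also cover the "other errors" a permissive parser ignores.
    """
    problems = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        # SDRF is tab-separated, so inspect each cell on its own.
        for col_no, cell in enumerate(line.split("\t"), start=1):
            if cell != cell.strip():
                problems.append(f"line {line_no}, column {col_no}: stray whitespace")
    return problems


def parse_permissive(text: str) -> list[list[str]]:
    """Permissive parsing: strip whitespace from each cell and skip blank lines."""
    return [
        [cell.strip() for cell in line.split("\t")]
        for line in text.splitlines()
        if line.strip()
    ]
```

The permissive parser silently normalizes the same input that strict mode reports, so both modes can share one reader and differ only in whether these findings are treated as errors.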
The CLI should be well-behaved. One specific property that comes to mind is support for pipes: it would be great if input and output could be piped.
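A common way to support pipes is the Unix convention that "-" means stdin (for input) or stdout (for output). The sketch below assumes that convention. The helper names are hypothetical and not part of sdrf-pipelines.

```python
import sys
from contextlib import contextmanager
from typing import Iterator, TextIO


@contextmanager
def open_input(path: str) -> Iterator[TextIO]:
    """Open a file for reading, or use stdin when path is '-'."""
    if path == "-":
        yield sys.stdin  # not closed here: the caller's shell owns it
    else:
        with open(path, encoding="utf-8", newline="") as handle:
            yield handle


@contextmanager
def open_output(path: str) -> Iterator[TextIO]:
    """Open a file for writing, or use stdout when path is '-'."""
    if path == "-":
        yield sys.stdout
    else:
        with open(path, "w", encoding="utf-8", newline="") as handle:
            yield handle


def copy_stream(src: str, dst: str) -> None:
    """Stream lines from src to dst; either side may be '-'."""
    with open_input(src) as fin, open_output(dst) as fout:
        for line in fin:
            fout.write(line)
```

Opening with newline="" passes line endings through untouched, which matters if a strict mode is supposed to be byte-exact. Diagnostics would go to stderr so they never mix with piped data.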
The name sdrf is NOT already taken as a
- debian/ubuntu/redhat package name
- Python package name
- MacPorts package
- bioconda package
- biocontainer
I know there is already sdrf-pipelines/parse_sdrf/convert-openms/convert-maxquant/... But I think the tool is still relatively new and would profit from a change in the long run. Implementing the proposals above would also not require breaking the current syntax immediately.
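One way to keep the current syntax working during a transition is a small shim that translates old parse_sdrf subcommands into the new form and emits a deprecation warning. This is a hypothetical Python sketch: only the old subcommand names come from this discussion. The new argument lists, including the format names passed to --to-format, are placeholders.

```python
import warnings

# Hypothetical mapping from existing parse_sdrf subcommands to the proposed
# sdrf syntax. The format names after --to-format are placeholders.
LEGACY_SUBCOMMANDS = {
    "validate-sdrf": ["validate"],
    "convert-openms": ["convert", "--to-format", "openms"],
    "convert-maxquant": ["convert", "--to-format", "maxquant"],
}


def translate_legacy_args(argv: list[str]) -> list[str]:
    """Rewrite `parse_sdrf <subcommand> ...` arguments into `sdrf ...` arguments.

    Arguments after the subcommand are passed through unchanged; in practice
    the old option names would need their own mapping.
    """
    if not argv or argv[0] not in LEGACY_SUBCOMMANDS:
        return list(argv)
    replacement = LEGACY_SUBCOMMANDS[argv[0]]
    new_cmd = " ".join(replacement)
    warnings.warn(
        f"'parse_sdrf {argv[0]}' is deprecated; use 'sdrf {new_cmd}' instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return replacement + list(argv[1:])
```

The old entry point could then call the new one with the translated arguments, so both spellings share a single implementation until parse_sdrf is retired.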
For the conversion, there could be an analogous sdrf command:
- sdrf convert --from-format [input-format] --in [file] --to-format [output-format] --out [file] [additional configurations]
This should also handle the --strict flag mentioned above and convert both from and to the SDRF file format. The format only has to be specified for the non-SDRF file.
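The proposed command surface can be sketched with Python's standard-library argparse. This is purely illustrative and makes no assumption about which CLI library sdrf-pipelines actually uses. Both format options default to "sdrf", so, as proposed, only the non-SDRF side needs to be named. Defaulting --in/--out to "-" is an added assumption that ties in with piping.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser for the proposed `sdrf` command."""
    parser = argparse.ArgumentParser(prog="sdrf", description="Validate and convert SDRF files.")
    sub = parser.add_subparsers(dest="command", required=True)

    validate = sub.add_parser("validate", help="validate SDRF files")
    validate.add_argument("files", nargs="*", default=["-"], help="SDRF files; '-' reads stdin")
    validate.add_argument("--strict", action="store_true", help="reject issues a permissive parser ignores")

    convert = sub.add_parser("convert", help="convert from or to SDRF")
    convert.add_argument("--from-format", default="sdrf")
    convert.add_argument("--to-format", default="sdrf")
    # `in` is a Python keyword, so store it under a different attribute name.
    convert.add_argument("--in", dest="input", default="-")
    convert.add_argument("--out", dest="output", default="-")
    convert.add_argument("--strict", action="store_true")
    return parser


def parse_args(argv: list[str]) -> argparse.Namespace:
    """Parse argv and enforce that at least one side of a conversion is SDRF."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.command == "convert" and "sdrf" not in (args.from_format, args.to_format):
        parser.error("one of --from-format/--to-format must be sdrf")  # exits with status 2
    return args
```

For example, parse_args(["convert", "--to-format", "openms", "--in", "experiment.sdrf.tsv"]) reads SDRF from that file and writes to stdout. The "additional configurations" from the proposal are left out here because their shape is not specified yet.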
Thoughts and discussions are welcome.