sdrf-pipelines

[DISCUSSION] A more concise CLI that is well behaved

Open fabianegli opened this issue 2 years ago • 0 comments

Looking at the sdrf-pipelines package, I see many different terms, each with a range of meanings and uses.

The string sdrf-pipelines is only used for the installation and the import of the package. I think that is fine.

Now for the CLI. It introduces the command parse_sdrf. This is not inherently a bad name if all the tool does is parse one or more SDRF files. But the tool actually does more: it validates SDRF files when called with parse_sdrf validate-sdrf ... and converts them to input files for other tools when called with parse_sdrf convert-openms ... Validation and conversion both require parsing, so the parse in parse_sdrf seems redundant and even somewhat misleading, because the tool advertises parsing but goes much further than that. That's why I think the name of the command line tool is not ideal.

Also note that the "conversion" is not actually a pure conversion of the information in the SDRF. In the case of the MaxQuant output it is also an enrichment. parse_sdrf as a command name doesn't do that justice.

Because of all the above I propose to adopt a new CLI naming and behaviour as follows:

  • a command called sdrf which can be used to validate and write SDRF files.
  • sdrf validate only validates SDRF files. There might be something like a --strict flag that makes it accept only byte-perfect SDRF files and complain about trailing whitespace and other errors ignored by the permissive parser.
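
To make the --strict idea concrete, here is a minimal sketch of one such check: flagging values with leading or trailing whitespace. The function name find_padded_cells is hypothetical and not part of sdrf-pipelines, and the sketch assumes a tab-separated SDRF file that is read line by line.

```python
def find_padded_cells(lines):
    """Return (line_number, column_number) for every cell with surrounding whitespace.

    Hypothetical helper for a strict mode; not part of sdrf-pipelines.
    """
    padded = []
    for line_number, raw in enumerate(lines, start=1):
        # Drop only the line terminator so whitespace inside cells is preserved.
        cells = raw.rstrip("\r\n").split("\t")
        for column_number, cell in enumerate(cells, start=1):
            if cell != cell.strip():
                padded.append((line_number, column_number))
    return padded


sample = [
    "source name\tcharacteristics[organism]\n",
    "sample 1 \thomo sapiens\n",  # trailing space after "sample 1"
    "sample 2\thomo sapiens\n",
]
for line_number, column_number in find_padded_cells(sample):
    print(f"line {line_number}, column {column_number}: surrounding whitespace")
```

A permissive mode would presumably strip such values silently, while a strict mode would report them and exit with a non-zero status.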

The CLI should be well-behaved. One specific property that comes to mind is support for pipes: it would be great if input and output could be piped.
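
One common way to support pipes is to treat "-" as a stand-in for standard input or output. The sketch below illustrates that convention using only the Python standard library. The helper names open_text and copy_rows are hypothetical, and this is not how sdrf-pipelines currently handles its files.

```python
import sys
from contextlib import contextmanager


@contextmanager
def open_text(path, mode="r"):
    """Yield stdin or stdout when path is "-", otherwise open the named file."""
    if path == "-":
        # Do not close the standard streams; the interpreter owns them.
        yield sys.stdin if "r" in mode else sys.stdout
    else:
        with open(path, mode, encoding="utf-8", newline="") as handle:
            yield handle


def copy_rows(source, sink):
    """Stream lines from source to sink and return how many were written."""
    count = 0
    for line in source:
        sink.write(line)
        count += 1
    return count


if __name__ == "__main__":
    # Acts as a pass-through filter: reads stdin, writes stdout, reports on stderr.
    with open_text("-", "r") as source, open_text("-", "w") as sink:
        written = copy_rows(source, sink)
    print(f"{written} lines processed", file=sys.stderr)
```

Keeping diagnostics on stderr leaves stdout clean for the next tool in the pipe.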

sdrf is NOT a

  • debian/ubuntu/redhat package name
  • Python package name
  • MacPorts package
  • bioconda package
  • biocontainer

I know the names sdrf-pipelines/sdrf_parse/convert-openms/convert-maxquant/... already exist. But I think the tool is still relatively new and would profit from a change in the long run. Implementing the proposals above would also not require breaking the current syntax immediately.
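
As a rough illustration of how the old syntax could keep working for a while, the existing subcommand names could be translated to the new ones, with a deprecation notice on stderr. Everything in this sketch is an assumption: the mapping, the function name translate_legacy_args, the lowercase format names, and the sdrf convert form (taken from the proposal further down). It passes remaining options through unchanged and does not handle any renaming of options.

```python
import sys

# Hypothetical mapping from current subcommands to the proposed syntax.
LEGACY_TO_NEW = {
    "validate-sdrf": ["validate"],
    "convert-openms": ["convert", "--to-format", "openms"],
    "convert-maxquant": ["convert", "--to-format", "maxquant"],
}


def translate_legacy_args(argv):
    """Rewrite a parse_sdrf style argument list into the proposed sdrf style.

    Unknown or already-new subcommands are returned unchanged.
    """
    if not argv or argv[0] not in LEGACY_TO_NEW:
        return list(argv)
    replacement = LEGACY_TO_NEW[argv[0]]
    print(
        f"parse_sdrf {argv[0]} is deprecated, use: sdrf {' '.join(replacement)}",
        file=sys.stderr,
    )
    return replacement + list(argv[1:])
```

The old entry point would call this function first and then hand the result to the new command, so both spellings share one implementation.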

For the conversion, there could be an analogous sdrf command:

  • sdrf convert --from-format [input-format] --in [file] --to-format [output format] --out [file] [additional configurations] This should also handle the --strict flag mentioned above and convert from and to the SDRF file format. The format only has to be specified for the non-SDRF file.
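
The proposed interface could be sketched with argparse from the Python standard library, as below. This only shows the shape of the command line and is not an existing sdrf-pipelines API. The defaults are assumptions: "sdrf" as the default format (so that only the non-SDRF side has to be named) and "-" for stdin/stdout.

```python
import argparse


def build_parser():
    """Build a parser for the proposed sdrf command (hypothetical sketch)."""
    parser = argparse.ArgumentParser(prog="sdrf")
    commands = parser.add_subparsers(dest="command", required=True)

    validate = commands.add_parser("validate", help="validate an SDRF file")
    # "in" is a Python keyword, so the value is stored under in_path instead.
    validate.add_argument("--in", dest="in_path", default="-")
    validate.add_argument("--strict", action="store_true")

    convert = commands.add_parser("convert", help="convert from or to SDRF")
    convert.add_argument("--from-format", default="sdrf")
    convert.add_argument("--to-format", default="sdrf")
    convert.add_argument("--in", dest="in_path", default="-")
    convert.add_argument("--out", dest="out_path", default="-")
    convert.add_argument("--strict", action="store_true")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

With these defaults, a call like sdrf convert --to-format openms --in experiment.sdrf.tsv would read an SDRF file and write the converted result to stdout; the format name openms and the file name are placeholders.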

Thoughts and discussions are welcome.

fabianegli, Apr 05 '22 06:04