Output file or directory in rsmpredict
Currently, rsmpredict supports an undocumented option of specifying an output directory instead of a file if output_file does not have a .csv or .xlsx extension. However, there are several inconsistencies:
- This option is not documented so the docstring is inaccurate.
- The output file format is controlled by the file_format setting in the rsmpredict configuration file, and the extension of the specified file, if any, is ignored entirely.
- The directory behavior is untested in addition to being undocumented.
- The .tsv file format is not represented in the check that determines whether the output is a file or a directory.
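To make the last point concrete, here is a minimal sketch of an extension-based check like the one described above. It is not the actual rsmpredict code: the function name looks_like_file and the FILE_EXTENSIONS set are hypothetical, chosen only to show how a .tsv path ends up being treated as a directory.

```python
from pathlib import Path

# Hypothetical stand-in for the check described in this issue: only
# .csv and .xlsx are recognized as file extensions.
FILE_EXTENSIONS = {".csv", ".xlsx"}


def looks_like_file(output, extensions=FILE_EXTENSIONS):
    """Return True if the output argument would be treated as a file."""
    return Path(output).suffix.lower() in extensions


print(looks_like_file("predictions.csv"))  # True
print(looks_like_file("predictions.tsv"))  # False: treated as a directory
# Adding .tsv to the recognized extensions closes the gap.
print(looks_like_file("predictions.tsv", FILE_EXTENSIONS | {".tsv"}))  # True
```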
I think a reasonable solution would be to:
- Get rid of the directory output option entirely.
- Make it so that the output argument is called output_prefix, with the file format specified in the configuration file overriding the file format on the command line and an appropriate warning generated.
Actually, now that I think a bit more about it, I think we should strive for consistency. So, here's an alternative I'd prefer:
- Make rsmpredict also use an output directory.
- Make --features into a boolean flag so that the pre-processed features are always saved in the given output directory with a fixed name, just like the predictions.
I think this is much simpler than what I had suggested above.
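A rough sketch of what that alternative could look like on the command line, using argparse. The positional argument names, the fixed file names (predictions, preprocessed_features) and the planned_outputs helper are all assumptions made for illustration, not the current or final rsmpredict interface.

```python
import argparse
from pathlib import Path


def build_parser():
    # Hypothetical interface: an output directory instead of an output file.
    parser = argparse.ArgumentParser(prog="rsmpredict")
    parser.add_argument("config_file")
    parser.add_argument("output_dir")
    # --features becomes a boolean flag rather than taking a file name.
    parser.add_argument(
        "--features",
        action="store_true",
        help="also save the pre-processed features in output_dir",
    )
    return parser


def planned_outputs(args, file_format="csv"):
    """List the files that would be written, all with fixed names."""
    out_dir = Path(args.output_dir)
    paths = [out_dir / f"predictions.{file_format}"]
    if args.features:
        paths.append(out_dir / f"preprocessed_features.{file_format}")
    return paths


args = build_parser().parse_args(["config.json", "output", "--features"])
for path in planned_outputs(args):
    print(path)
```

Here the file format would come from the file_format setting in the configuration file, so the command line no longer carries an extension that could disagree with it.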
I see the point about consistency, although I can see myself being very annoyed as a user: if I am running multiple experiments, I will end up with lots of directories each containing a single file with the same name. I personally prefer to have one directory with many files.
How about we take consistency even further and add a new field, prediction_id, that will be used as a prefix for the predictions file and other output files? (We could also make it optional and have it default to experiment_id.) Then, if we also add the -f option already available in the other tools, I'll be able to continue doing what I want and we'll have a consistent approach across the tools.
Hmm, yes I can see how that can be quite annoying. I like your suggestion! 👍
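For reference, the prediction_id idea could be sketched roughly as follows. This assumes the configuration is available as a plain dict; the helper names and the exact file name pattern (prefix, underscore, suffix) are hypothetical and only illustrate the fallback to experiment_id.

```python
from pathlib import Path


def resolve_prediction_id(config):
    """Use prediction_id if given, otherwise fall back to experiment_id."""
    return config.get("prediction_id") or config["experiment_id"]


def output_path(output_dir, config, suffix="predictions"):
    """Build a prefixed output path; the naming pattern is an assumption."""
    file_format = config.get("file_format", "csv")
    prefix = resolve_prediction_id(config)
    return Path(output_dir) / f"{prefix}_{suffix}.{file_format}"


# Two runs can share one output directory without overwriting each other.
run_a = {"experiment_id": "exp1", "prediction_id": "dev_set", "file_format": "tsv"}
run_b = {"experiment_id": "exp1"}
print(output_path("output", run_a).name)  # dev_set_predictions.tsv
print(output_path("output", run_b).name)  # exp1_predictions.csv
```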