
Output file or directory in rsmpredict

Open desilinguist opened this issue 5 years ago • 3 comments

Currently, rsmpredict supports an undocumented option: if output_file does not have a .csv or .xlsx extension, it is treated as an output directory instead of a file. However, there are several inconsistencies:

  1. This option is not documented, so the docstring is inaccurate.
  2. The output file format is controlled by the file_format setting in the rsmpredict configuration file, and the extension of the specified file, if any, is ignored entirely.
  3. The directory behavior is untested as well as undocumented.
  4. The .tsv file format is not included in the check that determines whether the output is a file or a directory.
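
To make point 4 concrete, here is a minimal sketch of the kind of check described above. It is a hypothetical reconstruction from this description, not the actual rsmpredict code; `FILE_EXTENSIONS` and `classify_output` are made-up names.

```python
from pathlib import Path

# Hypothetical reconstruction of the check described above;
# this is NOT the actual rsmpredict implementation.
FILE_EXTENSIONS = {".csv", ".xlsx"}  # note that ".tsv" is missing


def classify_output(output):
    """Return 'file' if the path has a recognized extension, else 'directory'."""
    suffix = Path(output).suffix.lower()
    return "file" if suffix in FILE_EXTENSIONS else "directory"


print(classify_output("predictions.csv"))
print(classify_output("predictions.tsv"))
```

Under these assumptions, `predictions.tsv` would be classified as a directory, which is presumably not what a user asking for TSV output expects.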

I think a reasonable solution would be to:

  1. Get rid of the directory output option entirely.
  2. Rename the output argument to output_prefix. The file format specified in the configuration file would override any extension given on the command line, with an appropriate warning generated.
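
A rough sketch of how the second point could work, assuming the format from the configuration file always wins. `resolve_output_path` and `KNOWN_EXTENSIONS` are hypothetical names used only for illustration; nothing here exists in rsmtool.

```python
import warnings
from pathlib import Path

# Extensions that would be recognized as a file format (assumption).
KNOWN_EXTENSIONS = {".csv", ".tsv", ".xlsx"}


def resolve_output_path(output_prefix, file_format):
    """Build the output path from a prefix and the configured file format.

    If the prefix carries a recognized extension, strip it; warn when it
    disagrees with ``file_format`` from the configuration file.
    """
    path = Path(output_prefix)
    given = path.suffix.lower()
    if given in KNOWN_EXTENSIONS:
        if given != f".{file_format}":
            warnings.warn(
                f"Ignoring extension '{given}'; the configuration file "
                f"specifies file_format='{file_format}'."
            )
        path = path.with_suffix("")
    return Path(f"{path}.{file_format}")
```

With this sketch, `resolve_output_path("preds.csv", "xlsx")` would emit a warning and return `preds.xlsx`, while a bare prefix such as `preds` would simply get the configured extension appended.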

desilinguist, Mar 09 '20 17:03

Actually, now that I think a bit more about it, I think we should strive for consistency. So, here's an alternative I'd prefer:

  • Make rsmpredict also use an output directory.
  • Make --features into a boolean flag so that the pre-processed features are always saved in the given output directory with a fixed name, just like the predictions.

I think this is much simpler than what I had suggested above.

desilinguist, Mar 09 '20 22:03

I see the point about consistency, although I can see myself being very annoyed as a user: if I am running multiple experiments, I will end up with lots of directories, each containing a single file with the same name. I personally prefer to have one directory with many files. How about we take consistency even further and add a new field, prediction_id, that would be used as a prefix for the predictions file and other output files? (We could also make it optional and have it default to experiment_id.) Then, if we also add the -f option already available in the other tools, I'll be able to continue doing what I want and we'll have a consistent approach across the tools.
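
Something along these lines, as a very rough sketch. The `prediction_id` field is the proposed one and does not exist yet; the `predictions_path` helper and the `_predictions` file name suffix are placeholders I made up for illustration.

```python
from pathlib import Path


def predictions_path(output_dir, config, file_format="csv"):
    """Return the predictions file path inside a shared output directory.

    ``prediction_id`` is the proposed optional field; when it is absent,
    fall back to ``experiment_id``.
    """
    prefix = config.get("prediction_id") or config["experiment_id"]
    # The "_predictions" suffix is a placeholder, not an agreed-upon name.
    return Path(output_dir) / f"{prefix}_predictions.{file_format}"


# Two runs can then share one directory without overwriting each other.
print(predictions_path("output", {"experiment_id": "exp1"}))
print(predictions_path("output", {"experiment_id": "exp1", "prediction_id": "run2"}))
```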

aloukina, Mar 10 '20 00:03

Hmm, yes I can see how that can be quite annoying. I like your suggestion! 👍

desilinguist, Mar 10 '20 00:03