Passing a file as an output path

Open aristizabal95 opened this issue 4 years ago • 3 comments

Right now it is not possible to pass a file as output parameter to a task if that file doesn't exist already.

In these cases, the only workaround I've found is to specify the directory where the file should be created as the output parameter, like so:

mlcube.yaml

tasks:
  infer:
    parameters:
      inputs: {parameters_file: parameters.yaml}
      outputs: {out_dir: ./}

Then, inside parameters.yaml, provide the name of the file to be created:

parameters.yaml

out_file: inferences.yaml

This can be problematic, as the parameters.yaml file is cube-specific, and there's no easy way to know which key is being used to point to the output file of interest.

MedPerf needs to resolve the location of some output files in order to execute operations like dataset registration, and this requires assuming that the output file for a given cube is named under a predetermined key inside parameters.yaml.

Additionally, from a user perspective, being able to point to files directly in the input parameters but not in the outputs is confusing.

A good solution would be to detect when the user is pointing to a non-existent file as an output and, if that's the case, create the file prior to running the cube.
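To make the proposal concrete, here is a minimal sketch of the "create the file before running" idea. The helper name `ensure_output_file` is hypothetical and not part of MLCube; it also assumes a trailing slash marks a directory.

```python
from pathlib import Path


def ensure_output_file(path_str: str) -> Path:
    """Make sure an output path exists before the cube runs.

    Hypothetical helper for illustration only; this is not MLCube's API.
    """
    path = Path(path_str)
    # Assumption: a trailing slash (or an existing directory) means
    # the output is a directory rather than a file.
    if path_str.endswith("/") or path.is_dir():
        path.mkdir(parents=True, exist_ok=True)
        return path
    # Otherwise treat it as a file: create the parent directories and
    # an empty placeholder. touch() leaves an existing file untouched.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.touch(exist_ok=True)
    return path
```

With something like this, `ensure_output_file("workspace/inferences.yaml")` would leave an empty file in place that could then be mounted as the output.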

aristizabal95 avatar Sep 14 '21 00:09 aristizabal95

This looks like some kind of bug. I will look into this issue today.

sergey-serebryakov avatar Sep 15 '21 15:09 sergey-serebryakov

@aristizabal95

"Right now it is not possible to pass a file as output parameter to a task if that file doesn't exist already."

This sounds weird. MLCube needs to know the type (file or directory) of each input and output parameter. The type specification is currently optional, and if it is not in the MLCube config file, MLCube tries to guess the expected type. For input parameters that is possible, since inputs must already exist. For output parameters, I think one rule we use is to check whether the parameter value ends with a forward slash (/): if it does, the parameter type is considered to be a directory; otherwise a configuration error is raised.
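The guessing rule described above can be sketched roughly as follows. This is only an illustration of the described behaviour, not MLCube's actual code; the function name `guess_parameter_type` and the use of `ValueError` for the configuration error are assumptions.

```python
from pathlib import Path


def guess_parameter_type(value: str, is_input: bool) -> str:
    """Return "file" or "directory" for a parameter with no explicit type.

    Hypothetical illustration; MLCube's real implementation may differ.
    """
    if is_input:
        # Inputs must already exist, so the file system can be inspected.
        path = Path(value)
        if path.is_dir():
            return "directory"
        if path.is_file():
            return "file"
        raise ValueError(f"Input path does not exist: {value}")
    # Outputs may not exist yet, so only the trailing slash is a hint.
    if value.endswith("/"):
        return "directory"
    raise ValueError(
        f"Cannot infer the type of output '{value}'; specify 'type' explicitly."
    )
```

Under this rule, an output such as `./` resolves to a directory, while `inferences.yaml` with no explicit type is rejected, which matches the behaviour reported in the issue.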

What happens if you try to do something like:

tasks:
  infer:
    parameters:
      outputs: {output_file: {type: file, default: inferences.yaml}}

Please, take a look at this example.

sergey-serebryakov avatar Sep 15 '21 15:09 sergey-serebryakov

Per @sergey-serebryakov, this is a docs issue.

relja128 avatar Oct 22 '21 16:10 relja128