add ability for toil-specific options to be prefixed with --toil-

Open diekhans opened this issue 5 years ago • 9 comments

It can be confusing which option in a program belongs to Toil and which to the application itself. If there was the option to add --toil- to the beginning of the toil options, this would be much less confusing.

For instance, ifiddes's CAT has both --workDir, which belongs to Toil, and --work-dir, which CAT uses for intermediate files.

┆Issue is synchronized with this Jira Story ┆Issue Number: TOIL-621

diekhans avatar Aug 28 '20 18:08 diekhans

Can you describe the use-case in more detail? I'm not sure that this is the best route, but I'd still like a path forward to making this clearer.

DailyDreaming avatar Aug 28 '20 19:08 DailyDreaming

The use case is a program that embeds Toil.

It is not clear which options actually apply to Toil, and hence which ones require reading the Toil docs.

The --workDir vs. --work-dir clash might not be that common.

--toilWorkDir would be obvious.
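
To make the request concrete, here is a minimal sketch of what prefixed options could look like in a program that embeds a workflow engine. It uses only argparse; the helper `add_prefixed_option`, the `--toil-workDir` spelling, and the `app_work_dir` destination are hypothetical names for illustration, not anything Toil provides today.

```python
import argparse


def add_prefixed_option(group, name, prefix="toil", **kwargs):
    """Register --<prefix>-<name> but store the value under the unprefixed name.

    Hypothetical helper: keeping dest unprefixed means code that reads
    args.<name> does not have to change when the flag is renamed.
    """
    group.add_argument(f"--{prefix}-{name}", dest=name, **kwargs)


parser = argparse.ArgumentParser(description="Application that embeds a workflow engine.")

# The application's own option (mirrors CAT's --work-dir).
parser.add_argument("--work-dir", dest="app_work_dir",
                    help="Directory for the application's intermediate files.")

# Engine options live in their own group, so --help also lists them separately.
engine_group = parser.add_argument_group("toil options")
add_prefixed_option(engine_group, "workDir",
                    help="Directory for the engine's temporary files.")

args = parser.parse_args(["--work-dir", "/tmp/app", "--toil-workDir", "/tmp/toil"])
print(args.app_work_dir, args.workDir)
```

With the prefix, the two similarly named options can no longer be mistaken for each other on the command line, and the argument group gives the help output the same separation.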

diekhans avatar Aug 28 '20 19:08 diekhans

Generally I'm used to a separator between option sets. For example, if you run an argparse program that calls another program, you'd use `--`, as with, say, the cwltest runner.

So the command would be something like `cwltest --logLevel=INFO --runner=toil -- aws:us-west-2:somejobstore --logLevel=DEBUG`

Is CAT trying to do something like this?

    import argparse

    parser = argparse.ArgumentParser(description='Runs with toil.')
    parser.add_argument('primary_file', help='A file.')
    parser.add_argument('secondary_file', help='A secondary file.')
    parser.add_argument('--whatever', type=str, required=False, default=None)

    # extra_args_for_toil is a list containing all of the unknown arguments not
    # specified by the parser in this main. All of these will be passed down later
    # to toil directly as toil args.
    CAT_args, extra_args_for_toil = parser.parse_known_args()

Maybe CAT could just use one --TOIL-RUN-ARGS='' to make it clearer?

DailyDreaming avatar Aug 28 '20 20:08 DailyDreaming

CAT is just an example of where the confusion arises. I have had this with other software (although no one uses my software except me, which fixes the confusion problem).

This is within a single program, not a program calling a program. The generic use case is a program that calls Job.Runner.addToilOptions(parser).

The cwltest example doesn't really apply here, and I find it a confusing command line.

This isn't even a medium-priority request; it is something to think about if the command line and config are ever revisited.

diekhans avatar Aug 28 '20 21:08 diekhans

Hmmm... we can discuss it in a future meeting. Could CAT do something like the following?

    import argparse

    parser = argparse.ArgumentParser(description='Run CAT.')
    parser.add_argument('--toilWorkDir', help='A CAT option.')
    args = parser.parse_args()

    # toil_config stands in for whatever Toil options object CAT builds.
    toil_config.workDir = args.toilWorkDir

CAT seems to incorporate a lot of options from multiple sources and it's confusing to me too.

Anyway, it's enough to bring up at the next meeting and discuss. Good luck CAT wrangling. ;D

DailyDreaming avatar Aug 28 '20 21:08 DailyDreaming

Yeah, CAT has got luigi in there too.

I wouldn't want to spend time on this just for this use case. However, the way the Toil parameters are done seems kind of ad hoc, with some things being available only as environment variables.

A good approach is to have every config setting available in three places: command line, environment, and a config file.

Although the environment is mostly for desperation; it often leads to "it works for me".

So the general priority hierarchy would be:

1st: command line
2nd: environment variables
3rd: config file
4th: defaults
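
As a rough illustration of that hierarchy, the lookup order can be sketched with the standard library's `collections.ChainMap`, which returns a key's value from the first mapping that defines it. The function name `resolve_config` and the sample setting names and values are assumptions made up for the example; this is not how Toil resolves its settings.

```python
from collections import ChainMap


def resolve_config(cli, env, config_file, defaults):
    """Layer the four sources so that earlier ones win: CLI, env, file, defaults.

    Hypothetical helper. Each argument is a plain dict keyed by setting name.
    CLI entries that are None (option not given) are dropped so they do not
    shadow the lower-priority sources.
    """
    cli_given = {key: value for key, value in cli.items() if value is not None}
    return ChainMap(cli_given, env, config_file, defaults)


settings = resolve_config(
    cli={"workDir": None, "logLevel": "DEBUG"},          # only logLevel was passed
    env={"workDir": "/env/work"},                        # e.g. parsed from os.environ
    config_file={"workDir": "/file/work", "retryCount": 3},
    defaults={"workDir": "/tmp", "logLevel": "INFO", "retryCount": 1, "maxCores": 4},
)

for name in ("workDir", "logLevel", "retryCount", "maxCores"):
    print(name, "=", settings[name])
```

Here `workDir` comes from the environment because the command line did not set it, `logLevel` from the command line, `retryCount` from the config file, and `maxCores` from the defaults.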

diekhans avatar Aug 28 '20 21:08 diekhans

Oh, I agree with the general priority structure and believe that would be a good issue to work on.

DailyDreaming avatar Aug 28 '20 23:08 DailyDreaming

This would be easier to do now that we have laid the plumbing for config files.

adamnovak avatar Apr 04 '24 17:04 adamnovak