nextflow icon indicating copy to clipboard operation
nextflow copied to clipboard

Command-line parameter type casting can have unintended consequences

Open cjw85 opened this issue 3 years ago • 6 comments

Bug report

(Please follow this template replacing the text between parentheses with the requested information)

Expected behavior and actual behavior

When running with AWS batch some container image tags appear troublesome when specified as a command-line argument to nextflow. I would expect any valid image tag to be useable with nextflow. The troublesome tags I have seen are of the form: 1e123456.

In building a reproducible example for this issue I pinned down the error fairly tightly: it seems command-line parameters to nextflow which are convertable to float values are converted. The conversion str -> float -> str is not guranteed to result in the starting string. e.g.: --value 7e013608 becomes "Infinity" when later used, and --value 1.1234567891011121314151617181920 is trucated to "1.1234567891011122".

I don't know if this casting is intended but as nextflow provides no parser akin to argparse and similar, I would not expect it to be happening. I've believe the strings "true" and "false" are similarly converted to boolean type.

Steps to reproduce the problem

This repository has a full example of the AWS batch scenario: https://github.com/cjw85/nf-batch-tag

More simply a main.nf containing:

nextflow.enable.dsl = 2
println(params.badvalue)

can be used

Program output

Outputs for the AWS batch scenario are given in the repository above. Using the simple example above:

$ nextflow run main.nf --badvalue 1.1234e3
N E X T F L O W  ~  version 20.10.0
Launching `main.nf` [nauseous_coulomb] - revision: 9065aff9be
1123.4

or

$ nextflow run main.nf --badvalue 7e123456
N E X T F L O W  ~  version 20.10.0
Launching `main.nf` [trusting_chandrasekhar] - revision: 9065aff9be
Infinity

show how the input strings are coerced to floats.

Environment

  • Nextflow version: 20.10.0 build 5430
  • Java version: openjdk version "1.8.0_282"
  • Operating system: Ubuntu 18.04
  • Bash version: GNU bash, version 4.3.48(1)-release (x86_64-pc-linux-gnu)

Additional context

(Add any other context about the problem here)

cjw85 avatar May 03 '21 20:05 cjw85

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Nov 18 '21 20:11 stale[bot]

I believe this is still an issue, and as pointed out above due to incorrect unwanted coercion in nextflow's argument parsing.

cjw85 avatar Nov 18 '21 20:11 cjw85

Unable to replicate

NXF_VER=21.10.0 nextflow run test.nf  --badvalue 1.1234e3
N E X T F L O W  ~  version 21.10.0
Launching `sandbox/numerr.nf` [festering_brown] - revision: 831c5e2222
1.1234e3

pditommaso avatar Dec 22 '21 20:12 pditommaso

I confirm the data is left as a string with 21.10.6:

$ nextflow run main.nf --badvalue 1.1234e5
N E X T F L O W  ~  version 21.10.6
Launching `main.nf` [friendly_babbage] - revision: 8794ee7119
1.1234e5
class java.lang.String

Is there a reference somewhere detailing what coercions are attempted? I Can variously get out Integer, Long, and Double numeric types. Scientific notation (i.e 1.123e4) now gives string; which maybe some people would now consider a bug! 😅

cjw85 avatar Dec 22 '21 21:12 cjw85

It's done by this, essentially tries to convert any string to a bool, int, long or double otherwise it keeps as a string

https://github.com/nextflow-io/nextflow/blob/8c0566fc3a35c8d3a4e01a508a0667e471bab297/modules/nextflow/src/main/groovy/nextflow/cli/CmdRun.groovy#L506-L518

pditommaso avatar Dec 23 '21 12:12 pditommaso

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Jun 11 '22 03:06 stale[bot]

@bentsherman Is there any update on this issue? An option to disable the unhelpful automatic casting would be useful.

cjw85 avatar Dec 16 '22 21:12 cjw85

I think the plan here is to create a new CLI tool (largely backward compatible) to correct this and other defects

pditommaso avatar Jan 10 '23 04:01 pditommaso

I think this problem should be handled by the configuration. The CLI shouldn't try to typecast the params at all, and instead the config params scope should have some kind of syntax to specify the param type.

Here's one idea off the top of my head:

params {
    @Type(Integer)
    n_iters = 1000
}

The point is, creating a new CLI won't really solve the problem, because the CLI has no idea what type a param should be. Until then, the best we can do is make the params parsing configurable through an env variable or config option. But we don't need a new CLI for that, we can make it configurable for both CLIs.

bentsherman avatar Feb 07 '23 22:02 bentsherman

This would be very nice. But I think it's not possible to annotate Map elements (params is a Map object)

pditommaso avatar Feb 09 '23 10:02 pditommaso

Then change it so its not 😉

cjw85 avatar Feb 09 '23 10:02 cjw85

In all seriousness, when using nf-core style parameter schemas there no source of truth for parameters. We have a lot of CI to ensure schemas and params are in sync. Nextflow seriously needs a single way to handle declaration, specification, typing, documenting, ..., of workflow parameters. I think this goes hand-in-hand with the CLI work.

cjw85 avatar Feb 09 '23 10:02 cjw85

Indeed, annotations can only be used on declarations like classes, fields, methods, and arguments. We would have to turn the params scope into a class declaration with typed fields, don't know if that would work.

On the other hand, FastAPI has a Query class that can be used to define API query parameters with extra details like default value, validation, title/description for docs, etc. I think that model would work well for pipeline params.

FastAPI example:

n_iters: int = Query(default=1000, ge=0, description="Number of iterations")

Nextflow example (proposed):

params {
  // this param won't be validated
  n_iters_raw = 1000

  // this param will be validated
  n_iters = Param(type=Integer, default=1000, ge=0, description="Number of iterations")
}

I think this way we could incorporate all of the functionality that nf-core is trying to do with params.

bentsherman avatar Feb 09 '23 12:02 bentsherman

Hi, I have a parameter field of type --ID 1234 but today I subsampled my data and created sub-IDs: --ID 1234.25 --ID 1234.50 well, this last try was not working for some reason... now I understand, still took me two hours xD

we should be able to define parameter types.

(thanks for the tool!)

fransua avatar Jun 30 '23 14:06 fransua

@bentsherman is there any update on this. This is becoming a blocker for us.

cjw85 avatar Jul 16 '23 20:07 cjw85

What about introducing an option to disable parameters type conversion at all and just use the string as it has been entered?

pditommaso avatar Jul 17 '23 07:07 pditommaso

Yes, a simple option to disable all casting of parameter values and leave developers to do it for themselves would work.

@bentsherman Is there any update on this issue? An option to disable the unhelpful automatic casting would be useful.

😜

cjw85 avatar Jul 17 '23 07:07 cjw85

Fair enough, think we can work add this easily

pditommaso avatar Jul 17 '23 08:07 pditommaso

Thanks! I think as an interim solution we can then write a class to mutate params based on the content of an nf-core-style jsonschema document with type information.

cjw85 avatar Jul 17 '23 08:07 cjw85

Solved via 9a1c584d

pditommaso avatar Jul 18 '23 10:07 pditommaso