nextflow
nextflow copied to clipboard
Command-line parameter type casting can have unintended consequences
Bug report
(Please follow this template replacing the text between parentheses with the requested information)
Expected behavior and actual behavior
When running with AWS batch some container image tags appear troublesome when specified as a command-line argument to nextflow
. I would expect any valid image tag to be useable with nextflow. The troublesome tags I have seen are of the form: 1e123456.
In building a reproducible example for this issue I pinned down the error fairly tightly: it seems command-line parameters to nextflow
which are convertable to float values are converted. The conversion str -> float -> str is not guranteed to result in the starting string. e.g.: --value 7e013608
becomes "Infinity"
when later used, and --value 1.1234567891011121314151617181920
is trucated to "1.1234567891011122"
.
I don't know if this casting is intended but as nextflow
provides no parser akin to argparse and similar, I would not expect it to be happening. I've believe the strings "true"
and "false"
are similarly converted to boolean type.
Steps to reproduce the problem
This repository has a full example of the AWS batch scenario: https://github.com/cjw85/nf-batch-tag
More simply a main.nf
containing:
nextflow.enable.dsl = 2
println(params.badvalue)
can be used
Program output
Outputs for the AWS batch scenario are given in the repository above. Using the simple example above:
$ nextflow run main.nf --badvalue 1.1234e3
N E X T F L O W ~ version 20.10.0
Launching `main.nf` [nauseous_coulomb] - revision: 9065aff9be
1123.4
or
$ nextflow run main.nf --badvalue 7e123456
N E X T F L O W ~ version 20.10.0
Launching `main.nf` [trusting_chandrasekhar] - revision: 9065aff9be
Infinity
show how the input strings are coerced to floats.
Environment
- Nextflow version: 20.10.0 build 5430
- Java version: openjdk version "1.8.0_282"
- Operating system: Ubuntu 18.04
- Bash version: GNU bash, version 4.3.48(1)-release (x86_64-pc-linux-gnu)
Additional context
(Add any other context about the problem here)
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I believe this is still an issue, and as pointed out above due to incorrect unwanted coercion in nextflow's argument parsing.
Unable to replicate
NXF_VER=21.10.0 nextflow run test.nf --badvalue 1.1234e3
N E X T F L O W ~ version 21.10.0
Launching `sandbox/numerr.nf` [festering_brown] - revision: 831c5e2222
1.1234e3
I confirm the data is left as a string with 21.10.6:
$ nextflow run main.nf --badvalue 1.1234e5
N E X T F L O W ~ version 21.10.6
Launching `main.nf` [friendly_babbage] - revision: 8794ee7119
1.1234e5
class java.lang.String
Is there a reference somewhere detailing what coercions are attempted? I Can variously get out Integer, Long, and Double numeric types. Scientific notation (i.e 1.123e4) now gives string; which maybe some people would now consider a bug! 😅
It's done by this, essentially tries to convert any string to a bool, int, long or double otherwise it keeps as a string
https://github.com/nextflow-io/nextflow/blob/8c0566fc3a35c8d3a4e01a508a0667e471bab297/modules/nextflow/src/main/groovy/nextflow/cli/CmdRun.groovy#L506-L518
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
@bentsherman Is there any update on this issue? An option to disable the unhelpful automatic casting would be useful.
I think the plan here is to create a new CLI tool (largely backward compatible) to correct this and other defects
I think this problem should be handled by the configuration. The CLI shouldn't try to typecast the params at all, and instead the config params
scope should have some kind of syntax to specify the param type.
Here's one idea off the top of my head:
params {
@Type(Integer)
n_iters = 1000
}
The point is, creating a new CLI won't really solve the problem, because the CLI has no idea what type a param should be. Until then, the best we can do is make the params parsing configurable through an env variable or config option. But we don't need a new CLI for that, we can make it configurable for both CLIs.
This would be very nice. But I think it's not possible to annotate Map
elements (params
is a Map object)
Then change it so its not 😉
In all seriousness, when using nf-core style parameter schemas there no source of truth for parameters. We have a lot of CI to ensure schemas and params
are in sync. Nextflow seriously needs a single way to handle declaration, specification, typing, documenting, ..., of workflow parameters. I think this goes hand-in-hand with the CLI work.
Indeed, annotations can only be used on declarations like classes, fields, methods, and arguments. We would have to turn the params scope into a class declaration with typed fields, don't know if that would work.
On the other hand, FastAPI has a Query
class that can be used to define API query parameters with extra details like default value, validation, title/description for docs, etc. I think that model would work well for pipeline params.
FastAPI example:
n_iters: int = Query(default=1000, ge=0, description="Number of iterations")
Nextflow example (proposed):
params {
// this param won't be validated
n_iters_raw = 1000
// this param will be validated
n_iters = Param(type=Integer, default=1000, ge=0, description="Number of iterations")
}
I think this way we could incorporate all of the functionality that nf-core is trying to do with params.
Hi,
I have a parameter field of type --ID 1234
but today I subsampled my data and created sub-IDs:
--ID 1234.25
--ID 1234.50
well, this last try was not working for some reason... now I understand, still took me two hours xD
we should be able to define parameter types.
(thanks for the tool!)
@bentsherman is there any update on this. This is becoming a blocker for us.
What about introducing an option to disable parameters type conversion at all and just use the string as it has been entered?
Yes, a simple option to disable all casting of parameter values and leave developers to do it for themselves would work.
@bentsherman Is there any update on this issue? An option to disable the unhelpful automatic casting would be useful.
😜
Fair enough, think we can work add this easily
Thanks! I think as an interim solution we can then write a class to mutate params
based on the content of an nf-core-style jsonschema document with type information.
Solved via 9a1c584d