galaxy
galaxy copied to clipboard
Lint for unused input parameters
This is first attempt to lint for input parameters that are not used:
- in
commandorconfigfile - in
output/filter - in
/outputs//change_format/when
Basic idea is to grep for "[^\w_]" + param_name + r"([^\w_]|$)", i.e. the placeholder delimited by something that can't belong to the placeholder.
Certainly this is very (maybe to) simplistic .. but it seems to work quite well :) .. But certainly it might be useful to check via some cheetah python lib if parameters are used...
I also experimented with stricter regular expressions (i.e. more specific delimiters), but there are to many cases to consider. In config files, for instance, the placeholders could be joined by any (non-placeholder) character anyway.
It might help
- if we would be more strict with cheetah formatting (e.g. PEP8-ish) and
- enforce
${param}syntax (instead of just$param) in more cases
Still there are cases like:
${ ",".join( [ str( platform.platform_entry ) ] ) }$getVar($sect + "excl_ival_type.excl_ival_type_sel")
TODO:
- [x] apparently dynamic inclusion of xml tokens is possible (I did not know so far) https://github.com/galaxyproject/tools-iuc/blob/f2dfef5e47d50e07268731e372505643048116a9/tools/gatk4/gatk4_Mutect2.xml#L284 .. the problem is that the content of the token does not make it into the
codevariable for linting- for the specific case a workaround would be to used static inclusion (
@token@)
- for the specific case a workaround would be to used static inclusion (
Additionally: added linting of configfiles
xref https://github.com/galaxyproject/tools-iuc/issues/4085
TODO:
- [x] parameters that have a validator: allow https://github.com/galaxyproject/tools-iuc/pull/4241 ?
How to test the changes?
(Select all options that apply)
- [x] I've included appropriate automated tests.
- [ ] This is a refactoring of components with existing test coverage.
- [ ] Instructions for manual testing are as follows:
And manually ran it on the IUC repo.
License
- [x] I agree to license these contributions under Galaxy's current license.
- [x] I agree to allow the Galaxy committers to license these and all my past contributions to the core galaxy codebase under the MIT license. If this condition is an issue, uncheck and just let us know why with an e-mail to [email protected].
So easy? Super cool, this will help a lot! I was thinking we need to go through cheetah and hook into the templates but if a simple grep works - awesome!
So easy? Super cool, this will help a lot! I was thinking we need to go through cheetah and hook into the templates but if a simple grep works - awesome!
Yes, but can be improved. I started of with a regexp (\$\S+' +name + '\s') which was workung well. But it can only be applied to command an config files. Also the final \s had problems, needs more chars, at least [\s'}].
Spend some time to improve this: Instead of a simple regex search for the parameter name the code searches now in the Compiled template.
The starting motivation was to avoid the problem of placeholders listed in comments (and removing comments seems not trivial), but also very short variable names (eg single letter) had false negatives.
The main advantage for searching in the compiled templates is that cheetah adds some code (VFN, VFFSL, rawExpr) that "ensures" to search only in uncommented code (of course if tool devs would use text matching the regular expressions in codes this would still lead to false negatives).
In theory there may be also possibilities in the cheetah lib (see)[https://sourceforge.net/p/cheetahtemplate/mailman/cheetahtemplate-discuss/thread/Pine.LNX.4.64.0803092139100.22574%40calrudd-server.calrudd.com/#msg18793519], but I don't dare to touch this (note that the devs also suggest to grep .. which is in my opinion quite complex).
I developed a series of regular expressions for discovering all syntax that I could think of. Unfortunatelly some cheetah expressions require quite "unspecific" (a bit less specific than I would find really good). So at the moment most of the specific ones are commented (if covered by the less specific ones).
I also checked the tool repos for which I had local clones and opened issues:
- https://github.com/galaxyproject/tools-iuc/issues/4085
- https://github.com/galaxyproject/tools-devteam/issues/601
- https://github.com/bgruening/galaxytools/issues/1185
- https://github.com/galaxyproteomics/tools-galaxyp/issues/647
- https://github.com/ARTbio/tools-artbio/issues/576
For completeness: found none in (W4M)[https://github.com/workflow4metabolomics/tools-metabolomics.git]
Will this work for tools that use galaxy.json or the configfile to dump parameters (<inputs name="inputs" filename="inputs.json" />) or data managers that are just used in external scripts?
Will this work for tools that use galaxy.json or the configfile to dump parameters (
) or data managers that are just used in external scripts?
Yes.
The code variable contains the concatenation of the content of the <command> and all <configfile> tags: https://github.com/galaxyproject/galaxy/blob/ee0e5c4456967b5a92f292256b0d5abe746c820e/lib/galaxy/tool_util/linters/_util.py#L13
This could be even stricter: actually we might/should check also if the names of all configfiles appears in the command (or other configfiles). Let me just add this to the new configfile linter :)
The inputs tag is covered by the following code: https://github.com/galaxyproject/galaxy/blob/ee0e5c4456967b5a92f292256b0d5abe746c820e/lib/galaxy/tool_util/linters/inputs.py#L394
We do the following checks:
- for data parameters the
data_styleattribute needs to be set (since otherwise data parameters are excluded) - it needs to be used in the command or any on the config files
Will this work for tools that use galaxy.json
How can input parameters be used via galaxy.json?
I would love to see this applied to tools-iuc and see if there are any false negatives and if it catches in any actual positives.
I also wonder if just compiling the Cheetah is actually going to be the biggest win here - does you get_code throw an exception on invalid cheetah and can we make that a linting we do also?
I would love to see this applied to tools-iuc and see if there are any false negatives and if it catches in any actual positives.
Already did this in January, see https://github.com/galaxyproject/galaxy/pull/12724#issuecomment-1020454434 many have been fixed already. Have not found a single FP so far (except for copy paste mistakes .. because I did the check semi automatic .. ). But this is only because tool developers do not use varExists and getVar .. which is the only option that I can think of to construct a FP case (in this case we should degrade it to warn/info).
FN are more difficult to answer.
I also wonder if just compiling the Cheetah is actually going to be the biggest win here - does you get_code throw an exception on invalid cheetah and can we make that a linting we do also?
That is an excellent idea. Maybe at least some errors can be caught, e.g. syntax errors. But some errors will only become apparent during execution. Lets see if I can add a test.
Also note that this is implemented in a separate linter which can be enabled/disabled completely via planemo
I would love to see this applied to tools-iuc
So I did it again:
- [ ] tools/allegro/allegro.xml
- [ ] ERROR: Param input [opt_sexspecific] not found in command or configfiles.
- [ ] tools/arriba/arriba_draw_fusions.xml
- [ ] ERROR: Param input [ref_file] not found in command or configfiles.
- [ ] ERROR: Param input [fixedScale] not found in command or configfiles.
- [ ] tools/arriba/arriba.xml
- [ ] ERROR: Param input [fixedScale] not found in command or configfiles.
- [ ] tools/bcftools/bcftools_query.xml
- [ ] ERROR: Param input [tsv] is only used in a change_format tag
- [ ] tools/bedtools/closestBed.xml
- [ ] ERROR: Param input [fu] not found in command or configfiles.
- [ ] ERROR: Param input [fd] not found in command or configfiles.
- [ ] ERROR: Param input [fu] not found in command or configfiles.
- [ ] ERROR: Param input [fd] not found in command or configfiles.
- [ ] ERROR: Param input [fu] not found in command or configfiles.
- [ ] ERROR: Param input [fd] not found in command or configfiles.
- [ ] tools/cat/cat_contigs.xml
- [ ] WARNING: Param input [seqtype] found in output actions, better use extended metadata.
- [ ] WARNING: Param input [seqtype] found in a label attribute, this is discouraged.
- [ ] WARNING: Param input [sum_titles] only found in output actions, better use extended metadata.
- [ ] WARNING: Param input [bin_col] only found in output actions, better use extended metadata.
- [ ] tools/cat/cat_bins.xml
- [ ] WARNING: Param input [seqtype] found in output actions, better use extended metadata.
- [ ] WARNING: Param input [seqtype] found in a label attribute, this is discouraged.
- [ ] WARNING: Param input [sum_titles] only found in output actions, better use extended metadata.
- [ ] WARNING: Param input [bin_col] only found in output actions, better use extended metadata.
- [ ] tools/disco/disco.xml
- [ ] ERROR: Param input [PrintScaffolds] not found in command or configfiles.
- [ ] tools/egsea/egsea.xml
- [ ] ERROR: Param input [non_commercial_use] not found in command or configfiles.
- [ ] tools/emboss_5/emboss_epestfind.xml
- [ ] ERROR: Param input [threshold] not found in command or configfiles.
- [ ] tools/emboss_5/emboss_textsearch.xml
- [ ] ERROR: Param input [casesensitive] not found in command or configfiles.
- [ ] tools/gatk4/gatk4_Mutect2.xml
- [ ] WARNING: Param input [exclude_intervals] only used in a template macro, use a macro token instead.
- [ ] WARNING: Param input [interval_exclusion_padding] only used in a template macro, use a macro token instead.
- [ ] WARNING: Param input [exclude_intervals] only used in a template macro, use a macro token instead.
- [ ] WARNING: Param input [interval_exclusion_padding] only used in a template macro, use a macro token instead.
- [ ] WARNING: Param input [intervals] only used in a template macro, use a macro token instead.
- [ ] WARNING: Param input [interval_padding] only used in a template macro, use a macro token instead.
- [ ] WARNING: Param input [intervals] only used in a template macro, use a macro token instead.
- [ ] WARNING: Param input [interval_padding] only used in a template macro, use a macro token instead.
- [ ] WARNING: Param input [seqdict_sequence] only used in a template macro, use a macro token instead.
- [ ] WARNING: Param input [seqdict_sequence] only used in a template macro, use a macro token instead.
- [ ] WARNING: Param input [interval_exclusion_padding] only used in a template macro, use a macro token instead.
- [ ] WARNING: Param input [interval_padding] only used in a template macro, use a macro token instead.
- [ ] tools/genehunter_modscore/genehunter_modscore.xml
- [ ] ERROR: Param input [extra_alg_mem] not found in command or configfiles.
- [ ] tools/ggplot2/ggplot2_heatmap.xml
- [ ] ERROR: Param input [sample_name_orientation] not found in command or configfiles.
- [ ] ERROR: Param input [sample_name_orientation] not found in command or configfiles.
- [ ] ERROR: Param input [sample_name_orientation] not found in command or configfiles.
- [ ] ERROR: Param input [sample_name_orientation] not found in command or configfiles.
- [ ] tools/ggplot2/ggplot_violin.xml
- [ ] ERROR: Param input [xaxismin] not found in command or configfiles.
- [ ] ERROR: Param input [xaxismax] not found in command or configfiles.
- [ ] tools/hicexplorer/hicConvertFormat.xml
- [ ] ERROR: Param input [storeAppliedCorrection] not found in command or configfiles.
- [ ] ERROR: Param input [enforceInteger] not found in command or configfiles.
- [ ] ERROR: Param input [storeAppliedCorrection] not found in command or configfiles.
- [ ] ERROR: Param input [enforceInteger] not found in command or configfiles.
- [ ] tools/homer/homer_findMotifsGenome.xml
- [ ] ERROR: Param input [olen] not found in command or configfiles.
- [ ] tools/iqtree/iqtree.xml
- [ ] ERROR: Param input [seqtype] not found in command or configfiles.
- [ ] tools/maker/maker.xml
- [ ] ERROR: Param input [license_agreement] not found in command or configfiles.
- [ ] tools/metaphlan/metaphlan.xml
- [ ] ERROR: Param input [stat] not found in command or configfiles.
- [ ] tools/miniprot/miniprot.xml
- [ ] ERROR: Param input [query_batch_size] not found in command or configfiles.
- [ ] tools/mothur/classify.tree.xml
- [ ] ERROR: Param input [cutoff] not found in command or configfiles.
- [ ] tools/mothur/cluster.xml
- [ ] ERROR: Param input [fasta] not found in command or configfiles.
- [ ] ERROR: Param input [fasta] not found in command or configfiles.
- [ ] ERROR: Param input [fasta] not found in command or configfiles.
- [ ] tools/mothur/chimera.ccode.xml
- [ ] ERROR: Param input [mask] not found in command or configfiles.
- [ ] tools/ncbi_datasets/datasets_gene.xml
- [ ] ERROR: Param input [decompress] not found in command or configfiles.
- [ ] tools/necat/necat.xml
- [ ] ERROR: Param input [min_length] not found in command or configfiles.
- [ ] ERROR: Param input [max_length] not found in command or configfiles.
- [ ] ERROR: Param input [min_identity] not found in command or configfiles.
- [ ] ERROR: Param input [min_aligned_length] not found in command or configfiles.
- [ ] ERROR: Param input [max_overhang] not found in command or configfiles.
- [ ] ERROR: Param input [min_coverage] not found in command or configfiles.
- [ ] ERROR: Param input [max_coverage] not found in command or configfiles.
- [ ] ERROR: Param input [max_diff_coverage] not found in command or configfiles.
- [ ] ERROR: Param input [coverage_discard] not found in command or configfiles.
- [ ] ERROR: Param input [bestn] not found in command or configfiles.
- [ ] ERROR: Param input [coverage] not found in command or configfiles.
- [ ] ERROR: Param input [identity_global_deviation1] not found in command or configfiles.
- [ ] ERROR: Param input [identity_global_deviation2] not found in command or configfiles.
- [ ] ERROR: Param input [overhang_global_deviation1] not found in command or configfiles.
- [ ] ERROR: Param input [overhang_global_deviation2] not found in command or configfiles.
- [ ] ERROR: Param input [identity_local_deviation1] not found in command or configfiles.
- [ ] ERROR: Param input [identity_local_deviation2] not found in command or configfiles.
- [ ] ERROR: Param input [overhang_local_deviation1] not found in command or configfiles.
- [ ] ERROR: Param input [overhang_local_deviation2] not found in command or configfiles.
- [ ] ERROR: Param input [identity_local_condition] not found in command or configfiles.
- [ ] ERROR: Param input [local_low_coverage] not found in command or configfiles.
- [ ] ERROR: Param input [min_length] not found in command or configfiles.
- [ ] ERROR: Param input [min_identity] not found in command or configfiles.
- [ ] ERROR: Param input [min_contig_length] not found in command or configfiles.
- [ ] ERROR: Param input [max_spur_length] not found in command or configfiles.
- [ ] ERROR: Param input [select_branch] not found in command or configfiles.
- [ ] ERROR: Param input [read_min_length] not found in command or configfiles.
- [ ] ERROR: Param input [ctg_min_length] not found in command or configfiles.
- [ ] ERROR: Param input [ctg2ctg_min_identity] not found in command or configfiles.
- [ ] ERROR: Param input [ctg2ctg_max_overhang] not found in command or configfiles.
- [ ] ERROR: Param input [ctg2ctg_min_aligned_length] not found in command or configfiles.
- [ ] ERROR: Param input [read2ctg_min_identity] not found in command or configfiles.
- [ ] ERROR: Param input [read2ctg_max_overhang] not found in command or configfiles.
- [ ] ERROR: Param input [read2ctg_min_aligned_length] not found in command or configfiles.
- [ ] ERROR: Param input [read2ctg_min_coverage] not found in command or configfiles.
- [ ] ERROR: Param input [min_contig_length] not found in command or configfiles.
- [ ] ERROR: Param input [window_size] not found in command or configfiles.
- [ ] ERROR: Param input [select_branch] not found in command or configfiles.
- [ ] tools/polypolish/polypolish.xml
- [ ] ERROR: Param input [min_depth] not found in command or configfiles.
- [ ] ERROR: Param input [fraction_invalid] not found in command or configfiles.
- [ ] ERROR: Param input [max_errors] not found in command or configfiles.
- [ ] ERROR: Param input [fraction_valid] not found in command or configfiles.
- [ ] tools/schicexplorer/scHicPlotClusterProfiles.xml
- [ ] ERROR: Param input [maximalDistance] not found in command or configfiles.
- [ ] tools/schicexplorer/scHicConsensusMatrices.xml
- [ ] ERROR: Param input [chromosomes] not found in command or configfiles.
For my own reference
planemo ci_find_repos --exclude packages --exclude deprecated --exclude_from .tt_skip --output repository_list.txt
while read line
do
>&2 echo $line; planemo lint $line --skip help,tests,inputs,output,general,command,citations,tool_xsd,xml_order,stdio
done < repository_list.txt > lint.tx
cat lint.txt | grep -v Applying | egrep "ERROR|WARNING" -B 1 | grep -v '\-\-'| sed 's@Linting tool /home/berntm/projects/tools-iuc/@- [ ] @; s/^\.\./ - [ ]/'