galaxy icon indicating copy to clipboard operation
galaxy copied to clipboard

Lint for unused input parameters

Open bernt-matthias opened this issue 4 years ago • 5 comments
trafficstars

This is first attempt to lint for input parameters that are not used:

  • in command or configfile
  • in output/filter
  • in /outputs//change_format/when

Basic idea is to grep for "[^\w_]" + param_name + r"([^\w_]|$)", i.e. the placeholder delimited by something that can't belong to the placeholder.

Certainly this is very (maybe to) simplistic .. but it seems to work quite well :) .. But certainly it might be useful to check via some cheetah python lib if parameters are used...

I also experimented with stricter regular expressions (i.e. more specific delimiters), but there are to many cases to consider. In config files, for instance, the placeholders could be joined by any (non-placeholder) character anyway.

It might help

  • if we would be more strict with cheetah formatting (e.g. PEP8-ish) and
  • enforce ${param} syntax (instead of just $param) in more cases

Still there are cases like:

  • ${ ",".join( [ str( platform.platform_entry ) ] ) }
  • $getVar($sect + "excl_ival_type.excl_ival_type_sel")

TODO:

  • [x] apparently dynamic inclusion of xml tokens is possible (I did not know so far) https://github.com/galaxyproject/tools-iuc/blob/f2dfef5e47d50e07268731e372505643048116a9/tools/gatk4/gatk4_Mutect2.xml#L284 .. the problem is that the content of the token does not make it into the code variable for linting
    • for the specific case a workaround would be to used static inclusion (@token@)

Additionally: added linting of configfiles

xref https://github.com/galaxyproject/tools-iuc/issues/4085

TODO:

  • [x] parameters that have a validator: allow https://github.com/galaxyproject/tools-iuc/pull/4241 ?

How to test the changes?

(Select all options that apply)

  • [x] I've included appropriate automated tests.
  • [ ] This is a refactoring of components with existing test coverage.
  • [ ] Instructions for manual testing are as follows:

And manually ran it on the IUC repo.

License

  • [x] I agree to license these contributions under Galaxy's current license.
  • [x] I agree to allow the Galaxy committers to license these and all my past contributions to the core galaxy codebase under the MIT license. If this condition is an issue, uncheck and just let us know why with an e-mail to [email protected].

bernt-matthias avatar Oct 15 '21 18:10 bernt-matthias

So easy? Super cool, this will help a lot! I was thinking we need to go through cheetah and hook into the templates but if a simple grep works - awesome!

bgruening avatar Oct 16 '21 07:10 bgruening

So easy? Super cool, this will help a lot! I was thinking we need to go through cheetah and hook into the templates but if a simple grep works - awesome!

Yes, but can be improved. I started of with a regexp (\$\S+' +name + '\s') which was workung well. But it can only be applied to command an config files. Also the final \s had problems, needs more chars, at least [\s'}].

bernt-matthias avatar Oct 17 '21 08:10 bernt-matthias

Spend some time to improve this: Instead of a simple regex search for the parameter name the code searches now in the Compiled template.

The starting motivation was to avoid the problem of placeholders listed in comments (and removing comments seems not trivial), but also very short variable names (eg single letter) had false negatives.

The main advantage for searching in the compiled templates is that cheetah adds some code (VFN, VFFSL, rawExpr) that "ensures" to search only in uncommented code (of course if tool devs would use text matching the regular expressions in codes this would still lead to false negatives). In theory there may be also possibilities in the cheetah lib (see)[https://sourceforge.net/p/cheetahtemplate/mailman/cheetahtemplate-discuss/thread/Pine.LNX.4.64.0803092139100.22574%40calrudd-server.calrudd.com/#msg18793519], but I don't dare to touch this (note that the devs also suggest to grep .. which is in my opinion quite complex).

I developed a series of regular expressions for discovering all syntax that I could think of. Unfortunatelly some cheetah expressions require quite "unspecific" (a bit less specific than I would find really good). So at the moment most of the specific ones are commented (if covered by the less specific ones).

I also checked the tool repos for which I had local clones and opened issues:

  • https://github.com/galaxyproject/tools-iuc/issues/4085
  • https://github.com/galaxyproject/tools-devteam/issues/601
  • https://github.com/bgruening/galaxytools/issues/1185
  • https://github.com/galaxyproteomics/tools-galaxyp/issues/647
  • https://github.com/ARTbio/tools-artbio/issues/576

For completeness: found none in (W4M)[https://github.com/workflow4metabolomics/tools-metabolomics.git]

bernt-matthias avatar Jan 24 '22 19:01 bernt-matthias

Will this work for tools that use galaxy.json or the configfile to dump parameters (<inputs name="inputs" filename="inputs.json" />) or data managers that are just used in external scripts?

mvdbeek avatar Jan 25 '22 09:01 mvdbeek

Will this work for tools that use galaxy.json or the configfile to dump parameters () or data managers that are just used in external scripts?

Yes.

The code variable contains the concatenation of the content of the <command> and all <configfile> tags: https://github.com/galaxyproject/galaxy/blob/ee0e5c4456967b5a92f292256b0d5abe746c820e/lib/galaxy/tool_util/linters/_util.py#L13

This could be even stricter: actually we might/should check also if the names of all configfiles appears in the command (or other configfiles). Let me just add this to the new configfile linter :)

The inputs tag is covered by the following code: https://github.com/galaxyproject/galaxy/blob/ee0e5c4456967b5a92f292256b0d5abe746c820e/lib/galaxy/tool_util/linters/inputs.py#L394

We do the following checks:

  • for data parameters the data_style attribute needs to be set (since otherwise data parameters are excluded)
  • it needs to be used in the command or any on the config files

bernt-matthias avatar Jan 25 '22 10:01 bernt-matthias

Will this work for tools that use galaxy.json

How can input parameters be used via galaxy.json?

bernt-matthias avatar Sep 14 '22 16:09 bernt-matthias

I would love to see this applied to tools-iuc and see if there are any false negatives and if it catches in any actual positives.

I also wonder if just compiling the Cheetah is actually going to be the biggest win here - does you get_code throw an exception on invalid cheetah and can we make that a linting we do also?

jmchilton avatar Dec 08 '22 17:12 jmchilton

I would love to see this applied to tools-iuc and see if there are any false negatives and if it catches in any actual positives.

Already did this in January, see https://github.com/galaxyproject/galaxy/pull/12724#issuecomment-1020454434 many have been fixed already. Have not found a single FP so far (except for copy paste mistakes .. because I did the check semi automatic .. ). But this is only because tool developers do not use varExists and getVar .. which is the only option that I can think of to construct a FP case (in this case we should degrade it to warn/info).

FN are more difficult to answer.

I also wonder if just compiling the Cheetah is actually going to be the biggest win here - does you get_code throw an exception on invalid cheetah and can we make that a linting we do also?

That is an excellent idea. Maybe at least some errors can be caught, e.g. syntax errors. But some errors will only become apparent during execution. Lets see if I can add a test.

bernt-matthias avatar Dec 09 '22 09:12 bernt-matthias

Also note that this is implemented in a separate linter which can be enabled/disabled completely via planemo

bernt-matthias avatar Dec 09 '22 11:12 bernt-matthias

I would love to see this applied to tools-iuc

So I did it again:

  • [ ] tools/allegro/allegro.xml
    • [ ] ERROR: Param input [opt_sexspecific] not found in command or configfiles.
  • [ ] tools/arriba/arriba_draw_fusions.xml
    • [ ] ERROR: Param input [ref_file] not found in command or configfiles.
    • [ ] ERROR: Param input [fixedScale] not found in command or configfiles.
  • [ ] tools/arriba/arriba.xml
    • [ ] ERROR: Param input [fixedScale] not found in command or configfiles.
  • [ ] tools/bcftools/bcftools_query.xml
    • [ ] ERROR: Param input [tsv] is only used in a change_format tag
  • [ ] tools/bedtools/closestBed.xml
    • [ ] ERROR: Param input [fu] not found in command or configfiles.
    • [ ] ERROR: Param input [fd] not found in command or configfiles.
    • [ ] ERROR: Param input [fu] not found in command or configfiles.
    • [ ] ERROR: Param input [fd] not found in command or configfiles.
    • [ ] ERROR: Param input [fu] not found in command or configfiles.
    • [ ] ERROR: Param input [fd] not found in command or configfiles.
  • [ ] tools/cat/cat_contigs.xml
    • [ ] WARNING: Param input [seqtype] found in output actions, better use extended metadata.
    • [ ] WARNING: Param input [seqtype] found in a label attribute, this is discouraged.
    • [ ] WARNING: Param input [sum_titles] only found in output actions, better use extended metadata.
    • [ ] WARNING: Param input [bin_col] only found in output actions, better use extended metadata.
  • [ ] tools/cat/cat_bins.xml
    • [ ] WARNING: Param input [seqtype] found in output actions, better use extended metadata.
    • [ ] WARNING: Param input [seqtype] found in a label attribute, this is discouraged.
    • [ ] WARNING: Param input [sum_titles] only found in output actions, better use extended metadata.
    • [ ] WARNING: Param input [bin_col] only found in output actions, better use extended metadata.
  • [ ] tools/disco/disco.xml
    • [ ] ERROR: Param input [PrintScaffolds] not found in command or configfiles.
  • [ ] tools/egsea/egsea.xml
    • [ ] ERROR: Param input [non_commercial_use] not found in command or configfiles.
  • [ ] tools/emboss_5/emboss_epestfind.xml
    • [ ] ERROR: Param input [threshold] not found in command or configfiles.
  • [ ] tools/emboss_5/emboss_textsearch.xml
    • [ ] ERROR: Param input [casesensitive] not found in command or configfiles.
  • [ ] tools/gatk4/gatk4_Mutect2.xml
    • [ ] WARNING: Param input [exclude_intervals] only used in a template macro, use a macro token instead.
    • [ ] WARNING: Param input [interval_exclusion_padding] only used in a template macro, use a macro token instead.
    • [ ] WARNING: Param input [exclude_intervals] only used in a template macro, use a macro token instead.
    • [ ] WARNING: Param input [interval_exclusion_padding] only used in a template macro, use a macro token instead.
    • [ ] WARNING: Param input [intervals] only used in a template macro, use a macro token instead.
    • [ ] WARNING: Param input [interval_padding] only used in a template macro, use a macro token instead.
    • [ ] WARNING: Param input [intervals] only used in a template macro, use a macro token instead.
    • [ ] WARNING: Param input [interval_padding] only used in a template macro, use a macro token instead.
    • [ ] WARNING: Param input [seqdict_sequence] only used in a template macro, use a macro token instead.
    • [ ] WARNING: Param input [seqdict_sequence] only used in a template macro, use a macro token instead.
    • [ ] WARNING: Param input [interval_exclusion_padding] only used in a template macro, use a macro token instead.
    • [ ] WARNING: Param input [interval_padding] only used in a template macro, use a macro token instead.
  • [ ] tools/genehunter_modscore/genehunter_modscore.xml
    • [ ] ERROR: Param input [extra_alg_mem] not found in command or configfiles.
  • [ ] tools/ggplot2/ggplot2_heatmap.xml
    • [ ] ERROR: Param input [sample_name_orientation] not found in command or configfiles.
    • [ ] ERROR: Param input [sample_name_orientation] not found in command or configfiles.
    • [ ] ERROR: Param input [sample_name_orientation] not found in command or configfiles.
    • [ ] ERROR: Param input [sample_name_orientation] not found in command or configfiles.
  • [ ] tools/ggplot2/ggplot_violin.xml
    • [ ] ERROR: Param input [xaxismin] not found in command or configfiles.
    • [ ] ERROR: Param input [xaxismax] not found in command or configfiles.
  • [ ] tools/hicexplorer/hicConvertFormat.xml
    • [ ] ERROR: Param input [storeAppliedCorrection] not found in command or configfiles.
    • [ ] ERROR: Param input [enforceInteger] not found in command or configfiles.
    • [ ] ERROR: Param input [storeAppliedCorrection] not found in command or configfiles.
    • [ ] ERROR: Param input [enforceInteger] not found in command or configfiles.
  • [ ] tools/homer/homer_findMotifsGenome.xml
    • [ ] ERROR: Param input [olen] not found in command or configfiles.
  • [ ] tools/iqtree/iqtree.xml
    • [ ] ERROR: Param input [seqtype] not found in command or configfiles.
  • [ ] tools/maker/maker.xml
    • [ ] ERROR: Param input [license_agreement] not found in command or configfiles.
  • [ ] tools/metaphlan/metaphlan.xml
    • [ ] ERROR: Param input [stat] not found in command or configfiles.
  • [ ] tools/miniprot/miniprot.xml
    • [ ] ERROR: Param input [query_batch_size] not found in command or configfiles.
  • [ ] tools/mothur/classify.tree.xml
    • [ ] ERROR: Param input [cutoff] not found in command or configfiles.
  • [ ] tools/mothur/cluster.xml
    • [ ] ERROR: Param input [fasta] not found in command or configfiles.
    • [ ] ERROR: Param input [fasta] not found in command or configfiles.
    • [ ] ERROR: Param input [fasta] not found in command or configfiles.
  • [ ] tools/mothur/chimera.ccode.xml
    • [ ] ERROR: Param input [mask] not found in command or configfiles.
  • [ ] tools/ncbi_datasets/datasets_gene.xml
    • [ ] ERROR: Param input [decompress] not found in command or configfiles.
  • [ ] tools/necat/necat.xml
    • [ ] ERROR: Param input [min_length] not found in command or configfiles.
    • [ ] ERROR: Param input [max_length] not found in command or configfiles.
    • [ ] ERROR: Param input [min_identity] not found in command or configfiles.
    • [ ] ERROR: Param input [min_aligned_length] not found in command or configfiles.
    • [ ] ERROR: Param input [max_overhang] not found in command or configfiles.
    • [ ] ERROR: Param input [min_coverage] not found in command or configfiles.
    • [ ] ERROR: Param input [max_coverage] not found in command or configfiles.
    • [ ] ERROR: Param input [max_diff_coverage] not found in command or configfiles.
    • [ ] ERROR: Param input [coverage_discard] not found in command or configfiles.
    • [ ] ERROR: Param input [bestn] not found in command or configfiles.
    • [ ] ERROR: Param input [coverage] not found in command or configfiles.
    • [ ] ERROR: Param input [identity_global_deviation1] not found in command or configfiles.
    • [ ] ERROR: Param input [identity_global_deviation2] not found in command or configfiles.
    • [ ] ERROR: Param input [overhang_global_deviation1] not found in command or configfiles.
    • [ ] ERROR: Param input [overhang_global_deviation2] not found in command or configfiles.
    • [ ] ERROR: Param input [identity_local_deviation1] not found in command or configfiles.
    • [ ] ERROR: Param input [identity_local_deviation2] not found in command or configfiles.
    • [ ] ERROR: Param input [overhang_local_deviation1] not found in command or configfiles.
    • [ ] ERROR: Param input [overhang_local_deviation2] not found in command or configfiles.
    • [ ] ERROR: Param input [identity_local_condition] not found in command or configfiles.
    • [ ] ERROR: Param input [local_low_coverage] not found in command or configfiles.
    • [ ] ERROR: Param input [min_length] not found in command or configfiles.
    • [ ] ERROR: Param input [min_identity] not found in command or configfiles.
    • [ ] ERROR: Param input [min_contig_length] not found in command or configfiles.
    • [ ] ERROR: Param input [max_spur_length] not found in command or configfiles.
    • [ ] ERROR: Param input [select_branch] not found in command or configfiles.
    • [ ] ERROR: Param input [read_min_length] not found in command or configfiles.
    • [ ] ERROR: Param input [ctg_min_length] not found in command or configfiles.
    • [ ] ERROR: Param input [ctg2ctg_min_identity] not found in command or configfiles.
    • [ ] ERROR: Param input [ctg2ctg_max_overhang] not found in command or configfiles.
    • [ ] ERROR: Param input [ctg2ctg_min_aligned_length] not found in command or configfiles.
    • [ ] ERROR: Param input [read2ctg_min_identity] not found in command or configfiles.
    • [ ] ERROR: Param input [read2ctg_max_overhang] not found in command or configfiles.
    • [ ] ERROR: Param input [read2ctg_min_aligned_length] not found in command or configfiles.
    • [ ] ERROR: Param input [read2ctg_min_coverage] not found in command or configfiles.
    • [ ] ERROR: Param input [min_contig_length] not found in command or configfiles.
    • [ ] ERROR: Param input [window_size] not found in command or configfiles.
    • [ ] ERROR: Param input [select_branch] not found in command or configfiles.
  • [ ] tools/polypolish/polypolish.xml
    • [ ] ERROR: Param input [min_depth] not found in command or configfiles.
    • [ ] ERROR: Param input [fraction_invalid] not found in command or configfiles.
    • [ ] ERROR: Param input [max_errors] not found in command or configfiles.
    • [ ] ERROR: Param input [fraction_valid] not found in command or configfiles.
  • [ ] tools/schicexplorer/scHicPlotClusterProfiles.xml
    • [ ] ERROR: Param input [maximalDistance] not found in command or configfiles.
  • [ ] tools/schicexplorer/scHicConsensusMatrices.xml
    • [ ] ERROR: Param input [chromosomes] not found in command or configfiles.

For my own reference

planemo ci_find_repos --exclude packages --exclude deprecated --exclude_from .tt_skip --output repository_list.txt
while read line
do                          
    >&2 echo $line; planemo lint  $line --skip help,tests,inputs,output,general,command,citations,tool_xsd,xml_order,stdio
done < repository_list.txt > lint.tx
cat lint.txt | grep -v Applying | egrep "ERROR|WARNING" -B 1 | grep -v '\-\-'| sed 's@Linting tool /home/berntm/projects/tools-iuc/@- [ ] @; s/^\.\./  - [ ]/'

bernt-matthias avatar Dec 09 '22 18:12 bernt-matthias