Fix DataToolParameters accepting input with arbitrary datatype in tests
In the first place this PR added a test case (test/functional/tools/bam_input_for_sam.xml) showing that in tests (e.g. with planemo test) data parameters of tools accept input of arbitrary datatypes. The test shows this for a sam input which gets bam and bed in the two tests (the bam case might be interpreted as a case of a missing converter as @mvdbeek pointed out below).
Note that in the UI this works as expected, i.e. only sam is accepted.
The main fix is in lines:
https://github.com/galaxyproject/galaxy/blob/570686ee62937aaf53ce5a12ea959260a23a875f/lib/galaxy/tools/parameters/basic.py#L1961-L1966
Previously the loop containing this code checked only if an implicit conversion is possible/necessary for one/more inputs. Now an exception is raised if the datasets do not match the datatype required by the parameter.
An additional necessary change is that dataset_matcher.hda_match now uses ensure_visible=False. The consequence is that also hidden datasets are considered. This was necessary since otherwise collection input does not get a match if the contained data sets are hidden (makes me wonder if implicit conversion ever worked for this case .. edit: seems so, but I don't understand why). Not sure about other possible side effects.
This required a quite few changes to tests where datatypes of inputs were not specified or the wrong datatype was shown. While fixing I realized that at different places ext, extension, and file_type are used .. would be nice if this would become consistent...
TODO
- [ ] the same in necessary for
DataCollectionToolParameter, somewhere here https://github.com/galaxyproject/galaxy/blob/433874d112d48a11339d3fbe2a5e65531b9e4cd6/lib/galaxy/tools/parameters/basic.py#L2168- [ ] add a test
- [ ] add fix
- [ ] adapt Exception text. Since a mismatch can alslo happen if an options filter kicks out datasets with wrong / missing dbkey
How to test the changes?
(Select all options that apply)
- [x] I've included appropriate automated tests.
- [ ] This is a refactoring of components with existing test coverage.
- [ ] Instructions for manual testing are as follows:
- [add testing steps and prerequisites here if you didn't write automated tests covering all your changes]
License
- [x] I agree to license these contributions under Galaxy's current license.
- [x] I agree to allow the Galaxy committers to license these and all my past contributions to the core galaxy codebase under the MIT license. If this condition is an issue, uncheck and just let us know why with an e-mail to [email protected].
sam should accept bam, and we already have tests for that in sam_to_unsorted_bam.xml. The UI should be accepting bam, can you let me know how to reproduce that ?
I tried with planemo serve.
sam should accept bam
But then a converter should run, or? You can see in this test that no converter runs.
I don't see how https://github.com/galaxyproject/galaxy/blob/dev/test/functional/tools/sam_to_unsorted_bam.xml covers this. All tests upload sam.
I don't see how https://github.com/galaxyproject/galaxy/blob/dev/test/functional/tools/sam_to_unsorted_bam.xml covers this. All tests upload sam.
You're right, we only have tests in the other direction and we don't have a bam to sam converter, oddly enough ... not sure how that is possible
Seems to me that there are two things to do:
- find out why the test runs and does not complain about datatype mismatch
- add a converter
I could need a hint for the first point. Converter seems easy.
- find out why the test runs and does not complain about datatype mismatch
Maybe try targeting a running instance with the test ... if that fails maybe we're setting up some weird development variables in the testing framework ?
I started a Galaxy instance with GALAXY_RUN_WITH_TEST_TOOLS=1 ./run.sh and executed the test with galaxy-tool-test -t bam4sam --galaxy-url http://localhost:8080 --key APIKEY
Has exactly the same result:
Executing test 'bam4sam/1.0-0'
Test 'bam4sam/1.0-0' failed
Traceback (most recent call last):
File "/home/berntm/projects/galaxy/.venv/lib/python3.8/site-packages/galaxy/tool_util/verify/script.py", line 281, in run_test
verify_tool(
File "/home/berntm/projects/galaxy/.venv/lib/python3.8/site-packages/galaxy/tool_util/verify/interactor.py", line 1050, in verify_tool
raise e
File "/home/berntm/projects/galaxy/.venv/lib/python3.8/site-packages/galaxy/tool_util/verify/interactor.py", line 1046, in verify_tool
job_stdio = _verify_outputs(testdef, test_history, jobs, tool_id, data_list, data_collection_list, galaxy_interactor, quiet=quiet)
File "/home/berntm/projects/galaxy/.venv/lib/python3.8/site-packages/galaxy/tool_util/verify/interactor.py", line 1222, in _verify_outputs
raise JobOutputsError(found_exceptions, job_stdio)
galaxy.tool_util.verify.interactor.JobOutputsError: Output outfile: different than expected
Output file did not contain expected text 'sam' (output 'unsorted.bam
')
Report written to 'standard output'
Passed tool tests (0): []
Failed tool tests (1): ['bam4sam/1.0-0']
Skipped tool tests (0): []
Errored tool tests (0): []
Total tool test time: 0:00:21.492503
Any suggestion where to look, e.g. where the check for the input data types happens?
Tried to find out where the input datatype is checked .. without success. I also tried to set a nonsense ftype="bed". This apparently leads to an infinite loop.
@mvdbeek do you think this (https://github.com/galaxyproject/galaxy/pull/12073/commits/8285bd610d6169b5c3396da0169afd0cf3597233) goes in a useful direction. Seems that it makes quite a lot of tests fail (a fix for one of them is here: https://github.com/galaxyproject/galaxy/pull/12073/commits/76290548ea63c16ce88db995fc4995a9effdceb6).
Nearly there (hopefully): only 1 selenium and 1 api test failing.
I also ran this on the IUC repo: https://github.com/galaxyproject/tools-iuc/issues/4015#issuecomment-937955316: 291 failures + 3 errors. Uff. I checked a few of them .. seems that they are caused by tests that miss the ftype attribute in input params and the sniffer fails. Seems legit .. but also quite some work to fix this.
The planemo html output seems quite useless in these cases (no output at all) and one needs to check the console output which contains something like this
======================================================================
ERROR: ( aegean_locuspocus ) > Test-1
----------------------------------------------------------------------
Traceback (most recent call last):
File "/tmp/tmpgh3suwnl/galaxy-dev/test/functional/test_toolbox.py", line 98, in test_tool
self.do_it(tool_version=tool_version, test_index=test_index)
File "/tmp/tmpgh3suwnl/galaxy-dev/test/functional/test_toolbox.py", line 35, in do_it
verify_tool(tool_id, self.galaxy_interactor, resource_parameters=resource_parameters, test_index=test_index, tool_version=tool_version, register_job_data=register_job_data)
File "/tmp/tmpgh3suwnl/galaxy-dev/lib/galaxy/tool_util/verify/interactor.py", line 1110, in verify_tool
raise e
File "/tmp/tmpgh3suwnl/galaxy-dev/lib/galaxy/tool_util/verify/interactor.py", line 1103, in verify_tool
tool_response = galaxy_interactor.run_tool(testdef, test_history, resource_parameters=resource_parameters)
File "/tmp/tmpgh3suwnl/galaxy-dev/lib/galaxy/tool_util/verify/interactor.py", line 468, in run_tool
submit_response_object = ensure_tool_run_response_okay(submit_response, "execute tool", inputs_tree)
File "/tmp/tmpgh3suwnl/galaxy-dev/lib/galaxy/tool_util/verify/interactor.py", line 811, in ensure_tool_run_response_okay
raise RunToolException(message, inputs, dynamic_param_error=dynamic_param_error)
galaxy.tool_util.verify.interactor.RunToolException: parameter 'genesgff3': Data set with invalid datatype supplied to input dataset parameter
...
Would be helpful to have this in the html output.
and the sniffer fails
we might also use the opportunity and try to improve the sniffers.
Implementation for collections is finished. Surprisingly no further api/selenium tests failed. Maybe test coverage for this is lower.
One problem is that with this change we can not test params anymore that have a format that is not uploadable (and has no sniffer), e.g. https://github.com/galaxyproject/tools-iuc/blob/a04f64b68f22ea44dbeb0423f969a7ced4736249/tools/genehunter_modscore/genehunter_modscore.xml#L243
One problem is that with this change we can not test params any more that have a format that is not uploadable (and has no sniffer)
That looks like an advantage to me, in that we find more issues in tools earlier?
One problem is that with this change we can not test params any more that have a format that is not uploadable (and has no sniffer)
That looks like an advantage to me, in that we find more issues in tools earlier?
Thanks for the feedback. The problem is that tools that have an input of a datatype that:
- has no sniffer and
- NOT has
display_in_upload="true"
can not be tested anymore sine the upload step will fail because an unuploadable datatype was chosen (because of the validation of the select in upload.xml) .. see also https://gitter.im/galaxyproject/dev?at=6163ebb4f2cedf67f97b0d1a
Very nice, I like it, just some minor edits needed.
:)
But just to clarify, this will break (planemo) tool testing if we do not get https://github.com/galaxyproject/galaxy/pull/12709 working... If I'm not completely wrong. If I remember correctly, then the problem are inputs with data types that are not uploadable (show_in_upload).
I guess another test against IUC would be good.
But just to clarify, this will break (planemo) tool testing if we do not get #12709 working
Yeah, I guess this would be a big change. I'm not 100% confident that this doesn't come with issues ... I'll think about it.
Hey @mvdbeek @jmchilton and @nsoranzo does someone has an idea what the source of the following error is that popped up after rebasing:
No match for discriminator 'type' and value 'hdca' (allowed values: 'hdas', 'library_folder', 'library')
I'm quite lost here.
I would at least like to keep this PR up to date (also because it seems to solve https://github.com/galaxyproject/galaxy/pull/13812 as well) because I think the PR is a good idea (just have a look at the large number of linked PRs) .. even if I think it's not a good idea to merge until we have a solution to the problem with uploading "unuploadable" data in (planemo) tool testing (I remember a comment by @jmchilton that he was surprised that upload.xml is used and not another mechanism .. which I forgot)?
No match for discriminator 'type' and value 'hdca' (allowed values: 'hdas', 'library_folder', 'library')
This sounds like a Pydantic validation error.
Looks like there is a missing HdcaDestination in the union defined here:
https://github.com/galaxyproject/galaxy/blob/8bcdff6f3acbef88ffc9cff5d8de5f18f4fd09f6/lib/galaxy/schema/fetch_data.py#L227
@davelopez thanks for the suggestion. I added the change, but it appears to me that this introduces even more problems.
This PR should be merged with https://github.com/galaxyproject/galaxy/pull/12709
My suggestion for the upload problem would be to deprecate the display_in_upload attribute of the datatypes (inspired by discussion with @nsoranzo during EGD22 .. If I got him right :). For the following reasons:
- the datatypes with a sniffer and
display_in_upload="false"can be uploaded (tested for thecmdatatype) - datatypes can be changed also to unuploadable datatypes, i.e. datatypes without a sniffer and
display_in_upload="false"can be uploaded if an uploadable datatype/auto is chosen in the upload dialog and the datatype is manually changed to the desired (unoploadable) datatype
So display_in_upload seems a bit useless.
But: maybe this worked until approx 18.x https://github.com/galaxyproject/galaxy/issues/6719 Potentially related https://github.com/galaxyproject/galaxy/pull/9135
My suggestion for the upload problem would be to deprecate the
display_in_uploadattribute of the datatypes (inspired by discussion with @nsoranzo during EGD22 .. If I got him right :). For the following reasons:
the datatypes with a sniffer and
display_in_upload="false"can be uploaded (tested for thecmdatatype)datatypes can be changed also to unuploadable datatypes, i.e. datatypes without a sniffer and
display_in_upload="false"can be uploaded if an uploadable datatype/auto is chosen in the upload dialog and the datatype is manually changed to the desired (unoploadable) datatypeSo
display_in_uploadseems a bit useless.
Summary of the discussion at the latest Backend WG meeting: it was agreed that display_in_upload is not very useful currently, but some were still concerned about datatypes that we shouldn't allow users to upload. The suggested way forward is:
- Set
display_in_upload="true"for all datatypes needed as input for tool testing that are deemed safe - For the datatypes that are not considered safe, this should be documented with a comment in
lib/galaxy/config/sample/datatypes_conf.xml.sample - Improve checks for unsafe datatypes done during upload and metadata editing of datasets, and extend test coverage
I hope I've not forgot/misunderstood important details from the meeting, feel free to correct!