galaxy Fix DataToolParameters accepting input with arbitrary datatype in tests

In the first place this PR added a test case (test/functional/tools/bam_input_for_sam.xml) showing that in tests (e.g. with planemo test) data parameters of tools accept input of arbitrary datatypes. The test shows this for a sam input which gets bam and bed in the two tests (the bam case might be interpreted as a case of a missing converter as @mvdbeek pointed out below).

Note that in the UI this works as expected, i.e. only sam is accepted.

The main fix is in lines:

https://github.com/galaxyproject/galaxy/blob/570686ee62937aaf53ce5a12ea959260a23a875f/lib/galaxy/tools/parameters/basic.py#L1961-L1966

Previously the loop containing this code checked only if an implicit conversion is possible/necessary for one/more inputs. Now an exception is raised if the datasets do not match the datatype required by the parameter. An additional necessary change is that dataset_matcher.hda_match now uses ensure_visible=False. The consequence is that also hidden datasets are considered. This was necessary since otherwise collection input does not get a match if the contained data sets are hidden (makes me wonder if implicit conversion ever worked for this case .. edit: seems so, but I don't understand why). Not sure about other possible side effects.

This required a quite few changes to tests where datatypes of inputs were not specified or the wrong datatype was shown. While fixing I realized that at different places ext, extension, and file_type are used .. would be nice if this would become consistent...

TODO

[ ] the same in necessary for DataCollectionToolParameter, somewhere here https://github.com/galaxyproject/galaxy/blob/433874d112d48a11339d3fbe2a5e65531b9e4cd6/lib/galaxy/tools/parameters/basic.py#L2168
- [ ] add a test
- [ ] add fix
[ ] adapt Exception text. Since a mismatch can alslo happen if an options filter kicks out datasets with wrong / missing dbkey

How to test the changes?

(Select all options that apply)

[x] I've included appropriate automated tests.
[ ] This is a refactoring of components with existing test coverage.
[ ] Instructions for manual testing are as follows:
1. [add testing steps and prerequisites here if you didn't write automated tests covering all your changes]

License

[x] I agree to license these contributions under Galaxy's current license.
[x] I agree to allow the Galaxy committers to license these and all my past contributions to the core galaxy codebase under the MIT license. If this condition is an issue, uncheck and just let us know why with an e-mail to [email protected].

May 29 '21 15:05 bernt-matthias

sam should accept bam, and we already have tests for that in sam_to_unsorted_bam.xml. The UI should be accepting bam, can you let me know how to reproduce that ?

May 31 '21 13:05 mvdbeek

I tried with planemo serve.

sam should accept bam

But then a converter should run, or? You can see in this test that no converter runs.

May 31 '21 13:05 bernt-matthias

I don't see how https://github.com/galaxyproject/galaxy/blob/dev/test/functional/tools/sam_to_unsorted_bam.xml covers this. All tests upload sam.

May 31 '21 13:05 bernt-matthias

I don't see how https://github.com/galaxyproject/galaxy/blob/dev/test/functional/tools/sam_to_unsorted_bam.xml covers this. All tests upload sam.

You're right, we only have tests in the other direction and we don't have a bam to sam converter, oddly enough ... not sure how that is possible

May 31 '21 14:05 mvdbeek

Seems to me that there are two things to do:

find out why the test runs and does not complain about datatype mismatch
add a converter

I could need a hint for the first point. Converter seems easy.

May 31 '21 17:05 bernt-matthias

find out why the test runs and does not complain about datatype mismatch

Maybe try targeting a running instance with the test ... if that fails maybe we're setting up some weird development variables in the testing framework ?

May 31 '21 18:05 mvdbeek

I started a Galaxy instance with GALAXY_RUN_WITH_TEST_TOOLS=1 ./run.sh and executed the test with galaxy-tool-test -t bam4sam --galaxy-url http://localhost:8080 --key APIKEY

Has exactly the same result:

Executing test 'bam4sam/1.0-0'
Test 'bam4sam/1.0-0' failed
Traceback (most recent call last):
  File "/home/berntm/projects/galaxy/.venv/lib/python3.8/site-packages/galaxy/tool_util/verify/script.py", line 281, in run_test
    verify_tool(
  File "/home/berntm/projects/galaxy/.venv/lib/python3.8/site-packages/galaxy/tool_util/verify/interactor.py", line 1050, in verify_tool
    raise e
  File "/home/berntm/projects/galaxy/.venv/lib/python3.8/site-packages/galaxy/tool_util/verify/interactor.py", line 1046, in verify_tool
    job_stdio = _verify_outputs(testdef, test_history, jobs, tool_id, data_list, data_collection_list, galaxy_interactor, quiet=quiet)
  File "/home/berntm/projects/galaxy/.venv/lib/python3.8/site-packages/galaxy/tool_util/verify/interactor.py", line 1222, in _verify_outputs
    raise JobOutputsError(found_exceptions, job_stdio)
galaxy.tool_util.verify.interactor.JobOutputsError: Output outfile:  different than expected
Output file did not contain expected text 'sam' (output 'unsorted.bam
')
Report written to 'standard output'
Passed tool tests (0): []
Failed tool tests (1): ['bam4sam/1.0-0']
Skipped tool tests (0): []
Errored tool tests (0): []
Total tool test time: 0:00:21.492503

Any suggestion where to look, e.g. where the check for the input data types happens?

Jun 02 '21 11:06 bernt-matthias

Tried to find out where the input datatype is checked .. without success. I also tried to set a nonsense ftype="bed". This apparently leads to an infinite loop.

Jun 06 '21 19:06 bernt-matthias

@mvdbeek do you think this (https://github.com/galaxyproject/galaxy/pull/12073/commits/8285bd610d6169b5c3396da0169afd0cf3597233) goes in a useful direction. Seems that it makes quite a lot of tests fail (a fix for one of them is here: https://github.com/galaxyproject/galaxy/pull/12073/commits/76290548ea63c16ce88db995fc4995a9effdceb6).

Jun 28 '21 15:06 bernt-matthias

Nearly there (hopefully): only 1 selenium and 1 api test failing.

I also ran this on the IUC repo: https://github.com/galaxyproject/tools-iuc/issues/4015#issuecomment-937955316: 291 failures + 3 errors. Uff. I checked a few of them .. seems that they are caused by tests that miss the ftype attribute in input params and the sniffer fails. Seems legit .. but also quite some work to fix this.

The planemo html output seems quite useless in these cases (no output at all) and one needs to check the console output which contains something like this

======================================================================
ERROR: ( aegean_locuspocus ) > Test-1
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/tmpgh3suwnl/galaxy-dev/test/functional/test_toolbox.py", line 98, in test_tool
    self.do_it(tool_version=tool_version, test_index=test_index)
  File "/tmp/tmpgh3suwnl/galaxy-dev/test/functional/test_toolbox.py", line 35, in do_it
    verify_tool(tool_id, self.galaxy_interactor, resource_parameters=resource_parameters, test_index=test_index, tool_version=tool_version, register_job_data=register_job_data)
  File "/tmp/tmpgh3suwnl/galaxy-dev/lib/galaxy/tool_util/verify/interactor.py", line 1110, in verify_tool
    raise e
  File "/tmp/tmpgh3suwnl/galaxy-dev/lib/galaxy/tool_util/verify/interactor.py", line 1103, in verify_tool
    tool_response = galaxy_interactor.run_tool(testdef, test_history, resource_parameters=resource_parameters)
  File "/tmp/tmpgh3suwnl/galaxy-dev/lib/galaxy/tool_util/verify/interactor.py", line 468, in run_tool
    submit_response_object = ensure_tool_run_response_okay(submit_response, "execute tool", inputs_tree)
  File "/tmp/tmpgh3suwnl/galaxy-dev/lib/galaxy/tool_util/verify/interactor.py", line 811, in ensure_tool_run_response_okay
    raise RunToolException(message, inputs, dynamic_param_error=dynamic_param_error)
galaxy.tool_util.verify.interactor.RunToolException: parameter 'genesgff3': Data set with invalid datatype supplied to input dataset parameter
...

Would be helpful to have this in the html output.

Oct 07 '21 21:10 bernt-matthias

and the sniffer fails

we might also use the opportunity and try to improve the sniffers.

Oct 07 '21 21:10 bernt-matthias

Implementation for collections is finished. Surprisingly no further api/selenium tests failed. Maybe test coverage for this is lower.

One problem is that with this change we can not test params anymore that have a format that is not uploadable (and has no sniffer), e.g. https://github.com/galaxyproject/tools-iuc/blob/a04f64b68f22ea44dbeb0423f969a7ced4736249/tools/genehunter_modscore/genehunter_modscore.xml#L243

Oct 10 '21 19:10 bernt-matthias

One problem is that with this change we can not test params any more that have a format that is not uploadable (and has no sniffer)

That looks like an advantage to me, in that we find more issues in tools earlier?

Oct 11 '21 08:10 nsoranzo

One problem is that with this change we can not test params any more that have a format that is not uploadable (and has no sniffer)

That looks like an advantage to me, in that we find more issues in tools earlier?

Thanks for the feedback. The problem is that tools that have an input of a datatype that:

has no sniffer and
NOT has display_in_upload="true"

can not be tested anymore sine the upload step will fail because an unuploadable datatype was chosen (because of the validation of the select in upload.xml) .. see also https://gitter.im/galaxyproject/dev?at=6163ebb4f2cedf67f97b0d1a

Oct 11 '21 09:10 bernt-matthias

Very nice, I like it, just some minor edits needed.

:)

But just to clarify, this will break (planemo) tool testing if we do not get https://github.com/galaxyproject/galaxy/pull/12709 working... If I'm not completely wrong. If I remember correctly, then the problem are inputs with data types that are not uploadable (show_in_upload).

I guess another test against IUC would be good.

Jan 14 '22 18:01 bernt-matthias

But just to clarify, this will break (planemo) tool testing if we do not get #12709 working

Yeah, I guess this would be a big change. I'm not 100% confident that this doesn't come with issues ... I'll think about it.

Jan 14 '22 19:01 mvdbeek

Hey @mvdbeek @jmchilton and @nsoranzo does someone has an idea what the source of the following error is that popped up after rebasing:

No match for discriminator 'type' and value 'hdca' (allowed values: 'hdas', 'library_folder', 'library')

I'm quite lost here.

I would at least like to keep this PR up to date (also because it seems to solve https://github.com/galaxyproject/galaxy/pull/13812 as well) because I think the PR is a good idea (just have a look at the large number of linked PRs) .. even if I think it's not a good idea to merge until we have a solution to the problem with uploading "unuploadable" data in (planemo) tool testing (I remember a comment by @jmchilton that he was surprised that upload.xml is used and not another mechanism .. which I forgot)?

Sep 18 '22 11:09 bernt-matthias

No match for discriminator 'type' and value 'hdca' (allowed values: 'hdas', 'library_folder', 'library')

This sounds like a Pydantic validation error. Looks like there is a missing HdcaDestination in the union defined here:

https://github.com/galaxyproject/galaxy/blob/8bcdff6f3acbef88ffc9cff5d8de5f18f4fd09f6/lib/galaxy/schema/fetch_data.py#L227

Sep 18 '22 13:09 davelopez

@davelopez thanks for the suggestion. I added the change, but it appears to me that this introduces even more problems.

Sep 19 '22 06:09 bernt-matthias

This PR should be merged with https://github.com/galaxyproject/galaxy/pull/12709

My suggestion for the upload problem would be to deprecate the display_in_upload attribute of the datatypes (inspired by discussion with @nsoranzo during EGD22 .. If I got him right :). For the following reasons:

the datatypes with a sniffer and display_in_upload="false" can be uploaded (tested for the cm datatype)
datatypes can be changed also to unuploadable datatypes, i.e. datatypes without a sniffer and display_in_upload="false" can be uploaded if an uploadable datatype/auto is chosen in the upload dialog and the datatype is manually changed to the desired (unoploadable) datatype

So display_in_upload seems a bit useless.

But: maybe this worked until approx 18.x https://github.com/galaxyproject/galaxy/issues/6719 Potentially related https://github.com/galaxyproject/galaxy/pull/9135

Oct 06 '22 22:10 bernt-matthias

My suggestion for the upload problem would be to deprecate the display_in_upload attribute of the datatypes (inspired by discussion with @nsoranzo during EGD22 .. If I got him right :). For the following reasons:

the datatypes with a sniffer and display_in_upload="false" can be uploaded (tested for the cm datatype)

datatypes can be changed also to unuploadable datatypes, i.e. datatypes without a sniffer and display_in_upload="false" can be uploaded if an uploadable datatype/auto is chosen in the upload dialog and the datatype is manually changed to the desired (unoploadable) datatype

So display_in_upload seems a bit useless.

Summary of the discussion at the latest Backend WG meeting: it was agreed that display_in_upload is not very useful currently, but some were still concerned about datatypes that we shouldn't allow users to upload. The suggested way forward is:

Set display_in_upload="true" for all datatypes needed as input for tool testing that are deemed safe
For the datatypes that are not considered safe, this should be documented with a comment in lib/galaxy/config/sample/datatypes_conf.xml.sample
Improve checks for unsafe datatypes done during upload and metadata editing of datasets, and extend test coverage

I hope I've not forgot/misunderstood important details from the meeting, feel free to correct!

Oct 12 '22 20:10 nsoranzo