MITOS: try to speed up tool form

Open bernt-matthias opened this issue 2 years ago • 7 comments

The options filter seems to be evaluated on tool load (https://github.com/galaxyproject/tools-iuc/blob/b991ebbd9f59ffff1b3eae09f65b59b02ad7cf8e/tools/mitos/mitos.xml#L47), which slows down tool loading for large histories / collections.

Maybe replace it with a validator.

bernt-matthias avatar Jun 23 '22 12:06 bernt-matthias

I'd maybe also check the performance on the Galaxy side; I don't think there's an inherent reason why this has to be slow.

mvdbeek avatar Jun 23 '22 13:06 mvdbeek

Yep, seems like a good opportunity. Any suggestions as to which part(s) of the code could be the culprit?

bernt-matthias avatar Jun 24 '22 08:06 bernt-matthias

I'd start by profiling https://github.com/mvdbeek/galaxy/blob/c0fc0a853edb102a685f1343efc04dace4b044aa/lib/galaxy/webapps/galaxy/api/tools.py#L203

mvdbeek avatar Jun 24 '22 10:06 mvdbeek

Added some debug statements to my best guesses and loaded the tool with an active history containing

  • 1 fasta including ~8000 sequences
  • 1 collection with ~8000 fasta datasets each containing 1 sequence

On loading I see:

galaxy.tools.parameters.basic ERROR 2022-06-24 12:00:08,723 [pN:main.1,p:416895,tN:WSGI_2] DataToolParameter.from_json START
galaxy.tools.parameters.basic ERROR 2022-06-24 12:00:08,725 [pN:main.1,p:416895,tN:WSGI_2] BaseDataToolParameter.get_initial_value START
galaxy.tools.parameters.basic ERROR 2022-06-24 12:00:08,746 [pN:main.1,p:416895,tN:WSGI_2] DataToolParameter.to_dict START
galaxy.tools.parameters.basic ERROR 2022-06-24 12:00:08,754 [pN:main.1,p:416895,tN:WSGI_2] BaseDataToolParameter.get_initial_value START
...
galaxy.tools.parameters.basic ERROR 2022-06-24 12:06:42,827 [pN:main.1,p:416895,tN:WSGI_2] DataToolParameter.to_dict END
  • so the tool needs about 6.5 min to load :(
  • some END debug statements are missing (but maybe I forgot to add debug statements in some places)
  • it seems odd that the functions are called twice, doesn't it?
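As a quick check of the load time, the first and last timestamps from the log excerpt above can be subtracted. This is only a small sketch: the two timestamp strings are copied from the quoted log lines, and the helper name `elapsed_seconds` is made up for illustration.

```python
from datetime import datetime

# Timestamps copied from the first and last log lines quoted above.
START = "2022-06-24 12:00:08,723"
END = "2022-06-24 12:06:42,827"

# Format of the timestamps in the quoted log lines (comma before the fraction).
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def elapsed_seconds(start: str, end: str) -> float:
    """Return the number of seconds between two log timestamps."""
    begin = datetime.strptime(start, LOG_TIME_FORMAT)
    finish = datetime.strptime(end, LOG_TIME_FORMAT)
    return (finish - begin).total_seconds()


if __name__ == "__main__":
    seconds = elapsed_seconds(START, END)
    minutes, rest = divmod(seconds, 60)
    print(f"{int(minutes)} min {rest:.3f} s")
```

For the two timestamps above this comes out at roughly six and a half minutes.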

It is also strange that the process starts again after the tool form has loaded (the above messages then show up in the log a second time).

This was tested on https://github.com/galaxyproject/galaxy/commit/83d110bef72608ae7030dabbd27781761edd4d01

I have never done profiling before. I will see if I can work it out from the docs.
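For reference, a minimal way to profile a single Python call with the standard library's `cProfile` and `pstats` might look like the sketch below. The function `slow_to_dict` is a hypothetical stand-in for whatever call is suspected to be slow, not Galaxy code, and `profile_call` is just an illustrative helper name.

```python
import cProfile
import io
import pstats


def slow_to_dict(n_elements: int) -> dict:
    """Hypothetical stand-in for an expensive call; NOT Galaxy code."""
    options = []
    for i in range(n_elements):
        options.append({"id": i, "name": f"dataset_{i}"})
    return {"options": options}


def profile_call(func, *args, top: int = 10) -> str:
    """Run func(*args) under cProfile and return a text report.

    The report lists up to `top` entries sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func(*args)
    finally:
        profiler.disable()
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(top)
    return stream.getvalue()


if __name__ == "__main__":
    print(profile_call(slow_to_dict, 8000))
```

Sorting by cumulative time tends to surface the outer function that dominates the run, which is usually the first thing to look for when hunting a slow code path.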

bernt-matthias avatar Jun 24 '22 10:06 bernt-matthias

So ... the DataToolParameter.to_dict function uses a DatasetMatcherFactory, which always uses DatasetCollectionMatcher:

https://github.com/galaxyproject/galaxy/blob/60d851445c3f3792543cce96204fc12eea86befd/lib/galaxy/tools/parameters/basic.py#L2226

https://github.com/galaxyproject/galaxy/blob/60d851445c3f3792543cce96204fc12eea86befd/lib/galaxy/tools/parameters/dataset_matcher.py#L83

The runtime of to_dict seems to drop to nearly zero when SummaryDatasetCollectionMatcher is used (I hardcoded it in a single experiment), but for this we would need to initialize the matcher factory using the tool parameter.

No idea how to get this into the call of the to_dict function ... trans seems to contain no reference to the tool, does it?

bernt-matthias avatar Jun 24 '22 12:06 bernt-matthias

Parts of my previous message were wrong:

Normally we would get SummaryDatasetCollectionMatcher, but because the parameter has dynamic options, we get the slow DatasetCollectionMatcher here

https://github.com/galaxyproject/galaxy/blob/60d851445c3f3792543cce96204fc12eea86befd/lib/galaxy/tools/parameters/dataset_matcher.py#L84

because of

https://github.com/galaxyproject/galaxy/blob/60d851445c3f3792543cce96204fc12eea86befd/lib/galaxy/tools/parameters/dataset_matcher.py#L43
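The selection described here can be illustrated with a toy model. This is not Galaxy's implementation: `ToyDataParam`, `choose_matcher` and `elements_inspected` are hypothetical names, and the cost model (one summary lookup versus one check per collection element) is an assumption made only to show why the fallback hurts for a collection with ~8000 elements.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyDataParam:
    """Hypothetical stand-in for a data parameter; NOT Galaxy's class."""

    name: str
    has_dynamic_options: bool = False


def choose_matcher(params: List[ToyDataParam]) -> str:
    """Pick the cheap 'summary' matcher unless any parameter has dynamic options.

    This mirrors the behaviour described in the comment above, not the
    actual code in dataset_matcher.py.
    """
    if any(p.has_dynamic_options for p in params):
        return "per_element"
    return "summary"


def elements_inspected(matcher: str, collection_size: int) -> int:
    """Assumed cost model: a summary needs one lookup, the fallback visits every element."""
    return 1 if matcher == "summary" else collection_size


if __name__ == "__main__":
    plain = [ToyDataParam("input")]
    filtered = [ToyDataParam("input", has_dynamic_options=True)]
    for label, params in (("plain", plain), ("with options filter", filtered)):
        matcher = choose_matcher(params)
        print(label, matcher, elements_inspected(matcher, 8000))
```

Under these assumptions, a single parameter with dynamic options is enough to switch the whole form to the per-element path.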

bernt-matthias avatar Jun 24 '22 12:06 bernt-matthias

Maybe that's why options_filter_attribute is marked as deprecated: https://docs.galaxyproject.org/en/master/dev/schema.html#other-ways-to-dynamically-generate-options

bernt-matthias avatar Jun 24 '22 12:06 bernt-matthias

Fixed in https://github.com/galaxyproject/tools-iuc/pull/5152

bernt-matthias avatar Feb 26 '23 10:02 bernt-matthias