tools-iuc

Add format feature for downloading multiple files with PyEGA

Open JasperO98 opened this issue 3 years ago • 5 comments

FOR CONTRIBUTOR:

  • [x] - I have read the CONTRIBUTING.md document and this tool is appropriate for the tools-iuc repo.
  • [x] - License permits unrestricted use (educational + commercial)
  • [ ] - This PR adds a new tool or tool collection
  • [x] - This PR updates an existing tool or tool collection
  • [ ] - This PR does something else (explain below)

I have added a parameter that sets the format of the downloaded files when downloading multiple files with pyega. Previously, every file was interpreted as the 'data' format. I tested whether auto_format="true" is possible on a collection, but it raises an error. For now I've added the datatypes I see most often when working in Galaxy; please add any others that you think are important. I think adding every datatype would be overkill.

JasperO98 avatar Oct 12 '22 14:10 JasperO98

raise an error

What is the error?

bernt-matthias avatar Oct 12 '22 16:10 bernt-matthias

I meant that planemo lint raises the following error when I add auto_format="true" to the collection element:

ERROR: Invalid XML found in file: pyega3.xml. Errors [/mnt/e/CINECA/tools-iuc/tools/pyega3/tmpn8thf0ni:151:0:ERROR:SCHEMASV:SCHEMAV_CVC_COMPLEX_TYPE_3_2_1: Element 'collection', attribute 'auto_format': The attribute 'auto_format' is not allowed.

When I add auto_format="true" to the discover_datasets element:

.. ERROR: Invalid XML found in file: pyega3.xml. Errors [/mnt/e/CINECA/tools-iuc/tools/pyega3/tmpkqql5p5r:153:0:ERROR:SCHEMASV:SCHEMAV_CVC_COMPLEX_TYPE_3_2_1: Element 'discover_datasets', attribute 'auto_format': The attribute 'auto_format' is not allowed.]

So basically there is no way to auto format files in a collection, right?

JasperO98 avatar Oct 13 '22 08:10 JasperO98

You are right, this feature is missing https://github.com/galaxyproject/galaxy/pull/11754

But actually pattern="__designation_and_ext__" should do the trick. For which file names does Galaxy detect data?

bernt-matthias avatar Oct 13 '22 12:10 bernt-matthias

I'm only sure about vcf.gz files, which are interpreted as gz instead of vcf_bgzip. An example filename: Case5_F.17.g.vcf.gz. Maybe it is because of the .g?
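To illustrate how such a filename might be split, here is a minimal Python sketch. It assumes that pattern="__designation_and_ext__" expands to the regular expression shown in the code, which is how I understand the Galaxy tool XML documentation; treat that regex as an assumption and verify it against the docs. The helper name split_designation_and_ext is hypothetical. It also assumes the regex is matched from the start of the filename.

```python
import re

# Assumption: this is the regex that pattern="__designation_and_ext__"
# stands for, as I understand the Galaxy tool XML docs. Verify before use.
DESIGNATION_AND_EXT = re.compile(r"(?P<designation>.*)\.(?P<ext>[^\._]+)?")


def split_designation_and_ext(filename):
    """Split a filename into (designation, ext) using the assumed pattern."""
    match = DESIGNATION_AND_EXT.match(filename)
    if match is None:
        # No dot in the filename, so the pattern cannot match at all.
        return filename, None
    return match.group("designation"), match.group("ext")


designation, ext = split_designation_and_ext("Case5_F.17.g.vcf.gz")
print(designation, ext)  # prints: Case5_F.17.g.vcf gz
```

Under this assumption only the part after the last dot becomes the extension, so the file would come out as gz whether or not the name contains .g.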

JasperO98 avatar Oct 13 '22 12:10 JasperO98

I think the tool would work as is if you added a command to the command block that renames all vcf.gz files to vcf_bgzip.

The same should be done for all files that we may get from EGA where the file extension does not match the extension of the corresponding Galaxy datatype.
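The renaming idea could be sketched roughly as follows. This is only an illustration in Python, not the tool's actual command block, which would more likely use a shell loop. The names SUFFIX_TO_GALAXY_EXT, galaxy_name, and rename_downloads are hypothetical, and only the vcf.gz to vcf_bgzip pair comes from this thread; any further entries would have to be added by hand.

```python
from pathlib import Path

# Hypothetical mapping from file-name suffix to Galaxy datatype extension.
# Only the vcf.gz -> vcf_bgzip pair is taken from this discussion.
SUFFIX_TO_GALAXY_EXT = {".vcf.gz": ".vcf_bgzip"}


def galaxy_name(filename, mapping=SUFFIX_TO_GALAXY_EXT):
    """Return filename with a known suffix swapped for the Galaxy extension."""
    for suffix, galaxy_ext in mapping.items():
        if filename.endswith(suffix):
            return filename[: -len(suffix)] + galaxy_ext
    return filename  # unknown suffix: leave the name untouched


def rename_downloads(directory, mapping=SUFFIX_TO_GALAXY_EXT):
    """Rename files under directory whose suffix appears in mapping."""
    renamed = []
    # sorted() materializes the listing before any file is renamed.
    for path in sorted(Path(directory).rglob("*")):
        if path.is_file():
            target = path.with_name(galaxy_name(path.name, mapping))
            if target != path:
                path.rename(target)
                renamed.append(target)
    return renamed
```

Calling rename_downloads on the download directory before dataset discovery would turn Case5_F.17.g.vcf.gz into Case5_F.17.g.vcf_bgzip and leave files with unlisted suffixes alone.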

bernt-matthias avatar Oct 13 '22 14:10 bernt-matthias

The workflow keeps failing because of a timeout: "Timed out after 900.25 seconds waiting on tool test run." I guess EGA is just slow sometimes, since the test finishes correctly locally.

JasperO98 avatar Oct 19 '22 13:10 JasperO98

@bernt-matthias is it possible to increase the timeout threshold, or is this error raised for a different reason?

JasperO98 avatar Oct 24 '22 09:10 JasperO98

@bernt-matthias is it possible to increase the timeout threshold, or is this error raised for a different reason?

This timeout applies to all tool tests. If we change it then the change applies to all tools.

bernt-matthias avatar Nov 07 '22 12:11 bernt-matthias