Functional annotation of protein sequences - Workflow
Hello, I'd like to suggest this new protein sequence annotation workflow, which uses eggNOG Mapper and InterProScan.
I'd also like to report a problem I'm having with the InterProScan test. I get this error:
Failed to find output [interproscan xml] in invocation outputs [{'eggNOG Mapper annotations': {'src': 'hda', 'id': '4bafbd75dc760dfd', 'workflow_step_id': '6a2bc09b040f62c5'}, 'eggNOG Mapper seed_orthologs': {'src': 'hda', 'id': '2e6846bf7c441bd6', 'workflow_step_id': '6a2bc09b040f62c5'}}]
Have a nice day!
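One possible reading of the error above is that the test expects an output labelled `interproscan xml`, while the invocation only exposes the two eggNOG Mapper labels. A minimal Python sketch for checking this, assuming the workflow `.ga` file is JSON with a `steps` mapping whose entries carry a `workflow_outputs` list of labelled outputs. The example workflow dict and the `output_name` values are hypothetical and are not taken from the actual workflow in this PR:

```python
def workflow_output_labels(workflow: dict) -> set:
    """Collect the labels of all outputs flagged as workflow outputs."""
    labels = set()
    for step in workflow.get("steps", {}).values():
        for output in step.get("workflow_outputs", []):
            label = output.get("label")
            if label:  # unlabelled outputs cannot be matched by a test
                labels.add(label)
    return labels


def missing_labels(workflow: dict, expected: list) -> list:
    """Return the expected output labels that the workflow does not expose."""
    available = workflow_output_labels(workflow)
    return [name for name in expected if name not in available]


# Hypothetical, trimmed-down workflow: the third step's output has no label.
example_workflow = {
    "steps": {
        "0": {"workflow_outputs": []},
        "1": {
            "workflow_outputs": [
                {"label": "eggNOG Mapper annotations", "output_name": "annotations"},
                {"label": "eggNOG Mapper seed_orthologs", "output_name": "seed_orthologs"},
            ]
        },
        "2": {"workflow_outputs": [{"label": None, "output_name": "outfile_xml"}]},
    }
}

# Labels the test file is assumed to check.
expected_in_test = [
    "eggNOG Mapper annotations",
    "eggNOG Mapper seed_orthologs",
    "interproscan xml",
]

print(missing_labels(example_workflow, expected_in_test))
# prints ['interproscan xml']
```

On a real workflow, the dict could be loaded with `json.load` from the `.ga` file. Any label reported as missing would need to be set on the corresponding step output, or renamed in the test file, so that the two match exactly.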
@rlibouba we need a .dockstore.yml file here.
Ok, now the CI is properly running :)
Hey @rlibouba, this is the error message:
The reference data is coming from cvmfs. Do you know where we can get 5.59-91.0 from?
Hi @mvdbeek, thanks for your feedback. Sorry for my late reply.
After checking with @abretaud, it should be linked to the data manager. Do you think we should use IDC (https://github.com/galaxyproject/idc) to handle this?
Yep, it uses this DM: https://github.com/galaxyproject/tools-iuc/tree/main/data_managers/data_manager_interproscan. It's a few tens of GB IIRC. If it could be managed by IDC it would be great, but I'm not sure if it's ready to handle non-genome data.
I think the problem was that we don't have a great way to publish large datasets, but we can always rsync this onto cvmfs from a site that has the data available
Ok, the DM mostly downloads a big archive, but it also does some file indexing and writes a .properties file.