funcscan
Allow input of pre-annotated ORFs
Description of feature
The pre-annotated ORFs would originate from e.g. the metatdenovo and mag pipelines. This would be very helpful for me, as I run those pipelines and would like to keep the ORF IDs for all analyses.
Some development observations:
- AMP workflow takes FAA (for amplify, hmmsearch, ampir, ampcombi)
- ARG workflow takes FAA (for deeparg)
- BGC workflow takes GFF/FAA/GBK (faa: hmmsearch; gff: antismash (prodigal); gbk: antismash (prokka, bakta))
I would propose that we have two additional columns, e.g. amino_acid_fasta and feature_file. The latter will accept GFF or GBK files, but if the user supplies the wrong one to antismash, that's their problem.
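To make the proposal concrete, here is a minimal sketch of how a samplesheet with the two optional columns could be parsed. It is not the pipeline's actual code: the `sample` and `fasta` column names, the assumption that the contig FASTA stays mandatory, the accepted file extensions, and the helper names `feature_format` and `parse_samplesheet` are all hypothetical and chosen for illustration only.

```python
import csv
import io
from pathlib import PurePosixPath

# Hypothetical schema: "sample" and "fasta" are assumed mandatory,
# the two proposed columns are optional.
REQUIRED = ("sample", "fasta")

# Assumed extensions, for illustration only; the real pipeline may accept others.
FEATURE_FORMATS = {".gff": "gff", ".gff3": "gff", ".gbk": "gbk", ".gbff": "gbk"}
GZ = ".gz"


def feature_format(path):
    """Return 'gff' or 'gbk' from the file extension, or None if no file is given."""
    if not path:
        return None
    # Ignore a trailing .gz so compressed files are classified by their inner extension.
    name = path[:-len(GZ)] if path.endswith(GZ) else path
    suffix = PurePosixPath(name).suffix.lower()
    if suffix not in FEATURE_FORMATS:
        raise ValueError(f"feature_file must be GFF or GBK, got: {path}")
    return FEATURE_FORMATS[suffix]


def parse_samplesheet(text):
    """Parse CSV text into a list of per-sample dicts, validating required columns."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        for col in REQUIRED:
            if not row.get(col):
                raise ValueError(f"missing required value: {col}")
        rows.append({
            "sample": row["sample"],
            "fasta": row["fasta"],
            "amino_acid_fasta": row.get("amino_acid_fasta") or None,
            "feature_format": feature_format(row.get("feature_file") or ""),
        })
    return rows


example = """sample,fasta,amino_acid_fasta,feature_file
s1,s1.fasta,s1.faa,s1.gff
s2,s2.fasta,,
"""
rows = parse_samplesheet(example)
```

With a single feature_file column, the GFF/GBK distinction is made once from the extension, and whether the file suits the chosen antismash annotation route is left to the user, as suggested above.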
Thoughts @nf-core/funcscan ?
These were exactly the questions I had when I looked at the code, and more or less the way I would choose to go ahead. It would mean that no columns, except sample name, would be mandatory, right?
Can't one have separate columns for GFF and GBK? (Rather naive on my side, not knowing the code.)
> These were exactly the questions I had when I looked at the code, and more or less the way I would choose to go ahead. It would mean that no columns, except sample name, would be mandatory, right?
No, a FASTA would still be required, as some tools still require both.
> Can't one have separate columns for GFF and GBK? (Rather naive on my side, not knowing the code.)
We could, but that would make the logic more complicated if someone supplies a mixture of both. Also, the GFF/GBK option is purely to account for antismash weirdness (all other tools take FAA), so I don't want to overengineer the pipeline to account for a single strange/picky tool :grimacing:
> > These were exactly the questions I had when I looked at the code, and more or less the way I would choose to go ahead. It would mean that no columns, except sample name, would be mandatory, right?
>
> No, a FASTA would still be required, as some tools still require both.
The alternative would perhaps be to skip those tools when no FASTA is available. The more I think about this with metatdenovo and magmap output as the input to this pipeline, the less of a problem I see in providing both the contig FASTA and the ORF amino acid FASTA (or the ORF nucleotide FASTA, for that matter), so I'm fine with this!
> > Can't one have separate columns for GFF and GBK? (Rather naive on my side, not knowing the code.)
>
> We could, but that would make the logic more complicated if someone supplies a mixture of both. Also, the GFF/GBK option is purely to account for antismash weirdness (all other tools take FAA), so I don't want to overengineer the pipeline to account for a single strange/picky tool :grimacing:
:+1:
Since I have a colleague waiting for this I'd be happy to contribute, but I'm afraid I won't have time for a couple of weeks.
> Since I have a colleague waiting for this I'd be happy to contribute, but I'm afraid I won't have time for a couple of weeks.
That would be wonderful @erikrikarddaniel ! :star_struck: When/if you get time, feel free to continue on my PR here if I don't make progress either!
Just to let you know, @jfy133: It's been a month, and I still don't see when I'll have time for this. If you don't either, no problem, I'll get to this in due time.
No worries! I'm slowly picking away at it on the PR above, so just jump in when you can :)
Hopefully done in https://github.com/nf-core/funcscan/pull/381!
Sorry this took such a long time @BenPonBiobrain :sweat_smile: a baby happened in the middle :grimacing: