funcscan Allow input of pre-annotated ORFs

Description of feature

Pre-annotated ORFs could originate from, e.g., the metatdenovo and mag pipelines. This would be very helpful for me, as I run those pipelines and would like to keep the ORF IDs for all analyses.

Apr 04 '23 09:04 BenPonBiobrain

Some development observations:

AMP workflow takes FAA (for amplify, hmmsearch, ampir, ampcombi)
ARG workflow takes FAA (for deeparg)
BGC workflow takes GFF/FAA/GBK (faa: hmmsearch; gff: antismash (prodigal); gbk: antismash (prokka, bakta))

I would propose that we have two additional columns: e.g. amino_acid_fasta and feature_file. The latter will accept GFF or GBK files, but if the user supplies the wrong one to antismash, that's their problem.
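To make the proposal concrete, here is a minimal Python sketch of how one samplesheet row with the two extra columns might be checked. It is not the pipeline's actual implementation (funcscan is a Nextflow pipeline). The column names `amino_acid_fasta` and `feature_file` are only the examples suggested above, the `sample` and `fasta` column names and the `validate_row` helper are hypothetical, and detecting the format from the file extension (`.faa`, `.gff`, `.gbk`) is an assumption.

```python
from pathlib import Path

# Assumption: formats are recognised purely by file extension.
FEATURE_FORMATS = {".gff": "gff", ".gbk": "gbk"}
PROTEIN_SUFFIX = ".faa"


def validate_row(row: dict) -> dict:
    """Check one samplesheet row (hypothetical column names) and normalise it."""
    sample = (row.get("sample") or "").strip()
    fasta = (row.get("fasta") or "").strip()
    # The contig FASTA stays mandatory, as discussed in the thread.
    if not sample or not fasta:
        raise ValueError("both 'sample' and 'fasta' are required")

    result = {
        "sample": sample,
        "fasta": fasta,
        "amino_acid_fasta": None,
        "feature_file": None,
        "feature_format": None,
    }

    # Optional pre-annotated protein sequences (FAA).
    protein = (row.get("amino_acid_fasta") or "").strip()
    if protein:
        if Path(protein.lower()).suffix != PROTEIN_SUFFIX:
            raise ValueError(f"expected a {PROTEIN_SUFFIX} file, got: {protein}")
        result["amino_acid_fasta"] = protein

    # Optional annotation file: a single column that accepts GFF or GBK.
    feature = (row.get("feature_file") or "").strip()
    if feature:
        suffix = Path(feature.lower()).suffix
        if suffix not in FEATURE_FORMATS:
            raise ValueError(f"expected a GFF or GBK file, got: {feature}")
        result["feature_file"] = feature
        result["feature_format"] = FEATURE_FORMATS[suffix]

    return result
```

For example, `validate_row({"sample": "s1", "fasta": "s1.fasta", "feature_file": "s1.gbk"})` would record `feature_format` as `"gbk"`. The sketch only checks the extension; whether the file suits the annotation tool that produced it is still left to the user.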

Thoughts @nf-core/funcscan ?

Apr 26 '23 07:04 jfy133

These were exactly the questions I had when I looked at the code, and more or less the way I would choose to go ahead. It would mean that no columns, except sample name, would be mandatory, right?

Can't one have separate columns for GFF and GBK? (Rather naive on my side, not knowing the code.)

Apr 26 '23 08:04 erikrikarddaniel

> These were exactly the questions I had when I looked at the code, and more or less the way I would choose to go ahead. It would mean that no columns, except sample name, would be mandatory, right?

No, FASTA would still be required, as some tools still require both.

> Can't one have separate columns for GFF and GBK? (Rather naive on my side, not knowing the code.)

We could, but that would make the logic more complicated if someone supplies a mixture of both. Also, the GFF/GBK is there purely to account for antismash weirdness (all other tools take FAA), so I don't want to overengineer the pipeline to account for a single strange/picky tool :grimacing:

Apr 28 '23 06:04 jfy133

> > These were exactly the questions I had when I looked at the code, and more or less the way I would choose to go ahead. It would mean that no columns, except sample name, would be mandatory, right?
>
> No, FASTA would still be required, as some tools still require both.

The alternative would perhaps be to skip those tools when no FASTA is available. The more I think about this with metatdenovo and magmap output as the input to this pipeline, the less of a problem I see with providing both the contig FASTA and the ORF amino acid FASTA (or the ORF nucleotide FASTA, for that matter), so I'm fine with this!

> > Can't one have separate columns for GFF and GBK? (Rather naive on my side, not knowing the code.)
>
> We could, but that would make the logic more complicated if someone supplies a mixture of both. Also, the GFF/GBK is there purely to account for antismash weirdness (all other tools take FAA), so I don't want to overengineer the pipeline to account for a single strange/picky tool :grimacing:

:+1:

Since I have a colleague waiting for this I'd be happy to contribute, but I'm afraid I won't have time for a couple of weeks.

Apr 28 '23 07:04 erikrikarddaniel

> Since I have a colleague waiting for this I'd be happy to contribute, but I'm afraid I won't have time for a couple of weeks.

That would be wonderful @erikrikarddaniel ! :star_struck: When/if you get time, feel free to continue on my PR here if I don't make progress either!

May 08 '23 09:05 jfy133

Just to let you know, @jfy133: It's been a month, and I still don't see when I'll have time for this. If you don't either, no problem, I'll get to this in due time.

Jun 13 '23 13:06 erikrikarddaniel

No worries! I'm slowly picking away at it on the PR above, so just jump in when you can :)

Jun 13 '23 14:06 jfy133

Hopefully done in https://github.com/nf-core/funcscan/pull/381!

Jun 03 '24 12:06 jfy133

Sorry this took such a long time, @BenPonBiobrain :sweat_smile: A baby happened in the middle :grimacing:

Jun 03 '24 12:06 jfy133
