
Is it possible to process/analyse MetaEuk results with SqueezeMeta?

Open Tamtatatam opened this issue 2 years ago • 5 comments

Hi, I am working on a dataset containing mostly Eukaryota and used the MetaEuk pipeline to predict introns and exons. I wonder whether it is possible to use SqueezeMeta for mapping reads, calculating abundance and visualising the results. I understand that this would mean skipping the assembly and the binning and (probably) providing the MetaEuk results as the assembly. Has anyone tried this, and would it work?

Thanks!

Tamtatatam avatar Aug 15 '22 08:08 Tamtatatam

Hi!

It is possible and easy, but there is a caveat.

Assuming that you already have an assembly with the contigs you want to analyze, you can just add the following when calling SqueezeMeta: -a your_assembly.fasta --nobins. This will skip assembly and use the provided fasta file instead. It will also skip binning.

The caveat is that we use Prodigal for ORF prediction, which does not work well with eukaryotic sequences. You can add the -d flag when running SqueezeMeta; this will improve annotation over the regions not predicted by Prodigal.

In theory, you could also run another ORF predictor and use its output to override the results (gff, fna and faa files) that would normally have been produced when running SqueezeMeta. If done correctly, you should then be able to restart the pipeline from step 4, and it should run to completion. However, we have never tried this, so there could be extra problems along the way.

In any case, if you only want to estimate the abundance of your contigs, annotation wouldn't matter so much.

fpusan avatar Aug 15 '22 08:08 fpusan
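One small part of "done correctly" can be checked mechanically: the replacement fna and faa files should describe the same set of ORFs. The sketch below compares the record IDs of two FASTA files. The function names and file paths are hypothetical, and the check says nothing about the ID naming scheme or gff layout SqueezeMeta itself expects, which the thread does not specify.

```python
def fasta_ids(path):
    """Return the set of record IDs (first word after '>') in a FASTA file."""
    ids = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:  # ignore empty header lines
                    ids.add(fields[0])
    return ids


def compare_orf_ids(fna_path, faa_path):
    """Return (IDs only in the fna file, IDs only in the faa file)."""
    fna = fasta_ids(fna_path)
    faa = fasta_ids(faa_path)
    return fna - faa, faa - fna
```

Two empty sets mean the nucleotide and protein files agree on which ORFs exist; anything else is worth resolving before restarting the pipeline.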

I made a mistake in the command I recommended above. It should be: -extassembly your_assembly.fasta --nobins -d

fpusan avatar Aug 15 '22 09:08 fpusan
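For readers who script their runs, the corrected options can be assembled as an argument list. This is a minimal sketch: only -extassembly, --nobins and -d come from this thread. The script name SqueezeMeta.pl, the -m, -p, -s and -f options, and every file and project name are assumptions that should be checked against the SqueezeMeta documentation for the installed version.

```python
import shlex


def build_squeezemeta_command(assembly, project, samples_file, reads_dir,
                              mode="coassembly"):
    """Build an argument list for running SqueezeMeta on an external assembly.

    Only -extassembly, --nobins and -d are taken from the thread; the
    remaining options are assumed and must be verified before use.
    """
    return [
        "SqueezeMeta.pl",             # assumed script name
        "-m", mode,                   # assumed: run mode
        "-p", project,                # assumed: project name
        "-s", samples_file,           # assumed: samples file
        "-f", reads_dir,              # assumed: directory with the raw reads
        "-extassembly", assembly,     # from the thread: use this fasta as the assembly
        "--nobins",                   # from the thread: skip binning
        "-d",                         # from the thread: improve annotation where Prodigal finds no ORF
    ]


if __name__ == "__main__":
    cmd = build_squeezemeta_command(
        "your_assembly.fasta", "my_project", "samples.txt", "raw_reads")
    # Print the command for inspection instead of running it.
    print(shlex.join(cmd))
```

Once the options are confirmed, the same list can be passed to subprocess.run(cmd, check=True). Keeping it a list avoids shell quoting problems with unusual file names.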

Thanks! I will give it a try. MetaEuk outputs fasta and gff files for the predicted genes, so there is no need to run Prodigal. However, I know that many pipelines struggle with the intron/exon information. Will -extassembly your_assembly.fasta --nobins -d still run Prodigal, or is it possible to skip that and go straight to mapping and abundance estimation?

Tamtatatam avatar Aug 15 '22 09:08 Tamtatatam

You cannot skip Prodigal, but it shouldn't affect the abundance estimation step.

fpusan avatar Aug 15 '22 09:08 fpusan

You also have the sqm_mapper.pl script available, which maps reads to a reference and performs abundance estimation. It is probably better suited to your purposes. Best, J

jtamames avatar Aug 22 '22 08:08 jtamames
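To make "abundance estimation" concrete, here is a minimal sketch of one common length-normalised measure, TPM, computed from per-contig read counts and contig lengths. Whether sqm_mapper.pl reports exactly this quantity is not stated in the thread; the function name and the example contig names are hypothetical.

```python
def tpm(read_counts, lengths):
    """Compute TPM per contig from mapped read counts and contig lengths.

    Each count is divided by the contig length, and the resulting rates
    are rescaled so that they sum to one million.
    """
    if read_counts.keys() != lengths.keys():
        raise ValueError("read_counts and lengths must cover the same contigs")
    rates = {name: read_counts[name] / lengths[name] for name in read_counts}
    total = sum(rates.values())
    if total == 0:
        # No reads mapped anywhere: report zero abundance for every contig.
        return {name: 0.0 for name in rates}
    return {name: rate / total * 1e6 for name, rate in rates.items()}


if __name__ == "__main__":
    counts = {"contig_1": 100, "contig_2": 300}    # hypothetical mapped reads
    sizes = {"contig_1": 1000, "contig_2": 3000}   # hypothetical lengths in bp
    print(tpm(counts, sizes))
```

In this example both contigs have the same reads-per-base rate, so they receive equal TPM even though their raw counts differ. That is why length normalisation matters when comparing contigs of different sizes.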

Closing due to lack of activity, feel free to reopen

fpusan avatar Oct 26 '22 11:10 fpusan