PhyloFlash aborting while running Emirge

Open Sidduppal opened this issue 2 years ago • 3 comments

Hey, thanks a lot for building such an amazing tool with great documentation. I ran phyloFlash with the command below, and it seems to fail while running EMIRGE. I could not debug the problem; any ideas? phyloFlash was installed using mamba in a separate conda environment. The log is attached. Command used:

T2R1_B_O_read_dir="/media/bigdrive2/sidd/soil_wgs/data/raw/soil_wgs_reads/T2R1_B_O"
lib="T2R1_B_O"
cpus=20
db_dir="/media/bigdrive1/Databases/phyloFlash_db/138"

phyloFlash.pl \
    -lib $lib \
    -read1 $T2R1_B_O_read_dir/T2R1_B_0_combined_1.fq.gz \
    -read2 $T2R1_B_O_read_dir/T2R1_B_0_combined_2.fq.gz \
    -CPUs $cpus \
    -dbhome $db_dir \
    -log \
    -html \
    -treemap \
    -emirge \
    -poscov

phyloFlash.7864.err.txt

Sidduppal · Jan 16 '22 19:01

Hi, it looks like EMIRGE crashed partway through and never generated its final iteration. You can get phyloFlash to finish the pipeline without EMIRGE by using -skip_emirge. Given the high sequencing depth of your file, EMIRGE might get stuck. You could also try running the analysis at, e.g., a 90% identity cutoff; this will make the hits more specific and can help EMIRGE finish the analysis. We still need to capture this error in the EMIRGE run. Thanks for reporting this issue.
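The two workarounds suggested above could be tried roughly as follows. This is only a dry-run sketch: it prints the commands rather than running them, the helper name `phyloflash_cmd` is made up for illustration, and `-id 90` is an assumption about how the identity cutoff is set, so check `phyloFlash.pl -help` for your installed version before relying on it.

```shell
#!/bin/sh
# Values copied from the original command; adjust to your setup.
lib="T2R1_B_O"
read_dir="/media/bigdrive2/sidd/soil_wgs/data/raw/soil_wgs_reads/T2R1_B_O"
db_dir="/media/bigdrive1/Databases/phyloFlash_db/138"
cpus=20

# Hypothetical helper: print the phyloFlash command with extra options appended.
phyloflash_cmd() {
    echo phyloFlash.pl -lib "$lib" \
        -read1 "$read_dir/T2R1_B_0_combined_1.fq.gz" \
        -read2 "$read_dir/T2R1_B_0_combined_2.fq.gz" \
        -CPUs "$cpus" -dbhome "$db_dir" -log -html -treemap -poscov "$@"
}

# Option A: finish the pipeline without EMIRGE.
phyloflash_cmd -skip_emirge

# Option B: keep EMIRGE but raise the identity cutoff to 90%
# (assumption: the cutoff is controlled by -id).
phyloflash_cmd -emirge -id 90
```

Once the printed command looks right, removing the `echo` inside `phyloflash_cmd` would run it for real.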

HRGV · Jan 18 '22 11:01

@HRGV, I was able to complete the pipeline using the skip_emirge option. Do you think the above error is an EMIRGE bug or a phyloFlash bug? I was unable to install standalone EMIRGE using conda. I was thinking of using the 90% identity cutoff; however, I'm afraid that might get rid of a lot of the novel taxa in the sample. I also noticed that after running with the skip_emirge flag, only 102 16S rRNA sequences are present in T2R1_B_O.all.final.fasta. The number of assembled 16S sequences is far lower than expected. In contrast, I ran barrnap on the assembled metagenome, which yielded >1000 16S rRNA sequences, and our 16S rRNA sequencing on the same sample yielded a much larger number of OTUs. Am I correct in thinking that the reason for this could be the poor mapping ratio (0.037%) and fraction assembled (25.806%)? Any idea why this could be happening and how to improve it?

A few questions:

  1. Do you think the number of 16S sequences might increase significantly when using EMIRGE?
  2. What parameters would you recommend for using phyloFlash on a high-diversity environment like soil?

Thanks
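For anyone wanting to reproduce the comparison of sequence counts above, a minimal sketch is to count FASTA headers in each file. The barrnap file name `barrnap_rRNA.fasta` is hypothetical, and the `16S_rRNA` header prefix is an assumption about how barrnap labels its sequence output, so adjust both to match your files.

```shell
#!/bin/sh
# Count FASTA records in file $1 whose header starts with the optional prefix $2.
count_seqs() {
    # grep -c exits non-zero when the count is 0, so do not treat that as an error.
    grep -c "^>$2" "$1" || true
}

# phyloFlash output named in the thread: all records are SSU sequences.
if [ -f T2R1_B_O.all.final.fasta ]; then
    echo "phyloFlash: $(count_seqs T2R1_B_O.all.final.fasta) sequences"
fi

# Hypothetical barrnap output: keep only headers assumed to mark 16S hits.
if [ -f barrnap_rRNA.fasta ]; then
    echo "barrnap 16S: $(count_seqs barrnap_rRNA.fasta 16S_rRNA) sequences"
fi
```

The two counts are not strictly comparable, since one file holds reconstructed sequences and the other holds every annotated hit on the assembly, including partial ones.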

Sidduppal · Jan 19 '22 17:01

I encountered the same issue and resolved it by installing usearch, which is a dependency of emirge_amplicon.py. Make sure it is accessible in your current PATH.
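A quick way to confirm this before launching a long run is to check that both executables resolve in the active environment. This is a small sketch; the helper name `check_tool` is made up for illustration.

```shell
#!/bin/sh
# Report whether an executable can be found in the current PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at $(command -v "$1")"
    else
        echo "$1 not found in PATH" >&2
        return 1
    fi
}

# Run these inside the conda environment where phyloFlash is installed.
check_tool usearch || echo "Install usearch or add its directory to PATH."
check_tool emirge_amplicon.py || echo "EMIRGE does not appear to be installed in this environment."
```

If usearch was installed outside the conda environment, adding its directory to PATH after activating the environment should be enough.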

aquagenomics · Jun 16 '22 08:06