Trying to process ONT partial LSU amplicons

Open SebaZambrano opened this issue 11 months ago • 4 comments

I'm trying to cluster with VSEARCH and assign taxonomy to a set of samples using partial LSU amplicons (in fasta format) PREVIOUSLY extracted from ONT long reads using ITSx. I used the following parameters:

lotus2 -m LSUmap.txt -i LSU_lotus/ -o lotuS_LSU_out/ -CL VSEARCH -amplicon_type LSU -ITSx 0 -s sdm_ONT_LSSU.txt -refDB UNITE -taxAligner blast -tax_group fungi

The pipeline stopped due to a dereplication error (it failed to identify unique sequences). The error was reported as:

The sdm dereplicated output file was either empty or not existing, aborting lotus. lotuS_LSU_out//tmpFiles//derep.fas

Note that I modified the sdm file, in particular the max and min length parameters, and nothing else. I'm attaching the sdm file: sdm_ONT_LSSU.txt

Any help would be very much appreciated!

SebaZambrano, Mar 27 '24 16:03

Hey, could you check how many reads passed the initial sdm filter? This should be shown on the console, and the log dir (lotuS_LSU_out/LotuSLogs/) should contain files with "sdm" in their names. My first guess would be that no read was dereplicated because no read (or too few reads) passed the quality controls. If this is the case, you need to lower the qual filter further; the log file will show which qual filter removed the most reads. You can also try lowering the dereplication parameters by setting "-derepMin 0" or similar. However, note that LotuS2 was never programmed to work with ONT reads, i.e. many assumptions of the read clustering will be broken by the (usually) really low quality of ONT reads. hth, Falk
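A quick way to run the check suggested above, independent of LotuS2, is to count the FASTA records going in and the records that ended up in the dereplicated file. The following is only a minimal sketch: the helper names are made up for illustration, the paths are the ones from the command and error message in this thread, and the list of FASTA extensions is an assumption.

```python
from pathlib import Path

# Assumed set of FASTA file extensions; adjust to match your input files.
FASTA_SUFFIXES = {".fa", ".fas", ".fasta", ".fna"}


def count_fasta_records(path):
    """Count sequences in a FASTA file by counting header lines ('>').

    Returns 0 if the file does not exist, so a missing derep.fas and an
    empty one are reported the same way.
    """
    p = Path(path)
    if not p.is_file():
        return 0
    with p.open() as handle:
        return sum(1 for line in handle if line.startswith(">"))


def summarize(input_dir, derep_file):
    """Return (records per input FASTA file, records in the derep file)."""
    per_file = {
        f.name: count_fasta_records(f)
        for f in sorted(Path(input_dir).glob("*"))
        if f.suffix.lower() in FASTA_SUFFIXES
    }
    return per_file, count_fasta_records(derep_file)


if __name__ == "__main__":
    # Paths taken from the thread; change them to your own run.
    per_file, n_derep = summarize("LSU_lotus", "lotuS_LSU_out/tmpFiles/derep.fas")
    for name, n in per_file.items():
        print(f"{name}\t{n} input reads")
    print(f"derep.fas\t{n_derep} dereplicated sequences")
```

If the input counts are healthy but the dereplicated count is 0, the reads are being lost somewhere between the sdm filter and dereplication, and the sdm log files are the place to look for which step removed them.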

hildebra, Mar 27 '24 17:03

Hi,

It was indeed the dereplication parameters (I had already lowered the qual filters beforehand, since I'm working with fastas). I set "-derepMin" to 0 and it did run, but it gave an abnormally high number of OTUs (approx. 50% of the number of reads that passed). I couldn't find the default value of this setting or what it means. How would you explain it? Thanks for helping.

SebaZambrano, Mar 27 '24 20:03

Hey, basically 0 means to accept every read. "2" means to accept only reads that occur two times at 100% identity. I see that on the website (https://lotus2.earlham.ac.uk/) the link to the examples is not set correctly; we will fix this later, apologies. @4less best, Falk
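To make the effect of the threshold concrete, here is a toy illustration of exact-match dereplication with a minimum copy number. It is not the sdm implementation, just a sketch of the idea described above; the function name is hypothetical, and reading "2" as "at least two copies" is an assumption.

```python
from collections import Counter


def dereplicate(reads, min_copies):
    """Collapse identical reads and keep those seen at least `min_copies` times.

    Assumption: a threshold of 0 keeps every unique sequence, and a
    threshold of 2 keeps only sequences with two or more exact copies.
    """
    counts = Counter(seq.upper() for seq in reads)
    return {seq: n for seq, n in counts.items() if n >= min_copies}


# Toy reads: one sequence seen three times plus two single-copy variants.
reads = ["ACGT", "ACGT", "ACGA", "ACGT", "TCGT"]

print(dereplicate(reads, 0))  # {'ACGT': 3, 'ACGA': 1, 'TCGT': 1}
print(dereplicate(reads, 2))  # {'ACGT': 3}
```

With error-prone long reads, exact duplicates are rare, so most reads behave like the single-copy variants here. That would be consistent with both symptoms in this thread: a threshold that demands duplicates can leave the dereplicated file empty, while a threshold of 0 lets nearly every read through as its own unique sequence.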

hildebra, Mar 31 '24 07:03

https://lotus2.earlham.ac.uk/lotus/Derep_options.pdf

Here is the PDF explaining the derep parameter. Best, Joachim

4less, Apr 02 '24 12:04