fastp
Insert size peak 35 bp
I have several metagenomes that were sequenced the same way (same Illumina platform, same adapters). Some of them are showing an insert size peak of 35 bp. I have tried providing the adapter sequences manually in case they were not being trimmed, but the peak stays the same. What might be causing this peak? Has anyone else experienced this issue?
What read length did you sequence? Have you checked the template length peaks with a QC device like the Agilent 4200?
I saw this with metagenomes sequenced with both MiSeq 2 x 250 and 2 x 300 kits. Yes, I checked the template length on a Bioanalyzer before sequencing and size-selected for the appropriate length. In all cases the peak was EXACTLY 35 bp, which is why I noticed it, in addition to this size being exceptionally small.
Would you please upload the HTML report?
These are examples of reports showing both the 35 bp insert peak and "normal" reports with larger insert sizes as expected.
From the reports: the OV080516 DNA fragments are too short, and the OV982816 DNA fragments are too long (so the read pairs do not overlap). Furthermore, both of them have a lot of primer dimers.
Can you explain where in the report you find this information? Also, why is the insert size peak 35 bp in both cases?
In OV982816, 99.8% of reads have an unknown template length, which means the read pairs do not overlap, which in turn means the template length is > 2*250. So the DNA fragments are too long. In such a case (fewer than 0.2% of reads overlap), the peak estimate is not accurate.
I have to correct my last comment. The number of primer dimers in these two samples is not large.
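To illustrate why a non-overlapping pair ends up with an "unknown" template length, here is a minimal sketch of overlap-based insert size estimation. It is not fastp's actual implementation: the function names and the `min_overlap` and `max_mismatches` parameters are hypothetical, and it assumes both reads have the same length.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq))


def count_mismatches(a, b):
    """Count mismatches over the shared prefix of a and b (zip truncates)."""
    return sum(x != y for x, y in zip(a, b))


def estimate_insert_size(read1, read2, min_overlap=30, max_mismatches=0):
    """Estimate the insert size of a read pair from its overlap.

    Returns None when no overlap of at least min_overlap bases is found,
    i.e. the template length is unknown. Hypothetical sketch only.
    """
    r2 = reverse_complement(read2)

    # Insert >= read length: the reverse-complemented read2 starts
    # `offset` bases into read1, so the template spans offset + len(r2).
    for offset in range(0, len(read1) - min_overlap + 1):
        if count_mismatches(read1[offset:], r2) <= max_mismatches:
            return offset + len(r2)

    # Insert < read length: both reads run through the insert into the
    # adapter, so read1 starts `shift` bases into the reverse-complemented
    # read2 and only the remaining bases belong to the template.
    for shift in range(1, len(r2) - min_overlap + 1):
        if count_mismatches(r2[shift:], read1) <= max_mismatches:
            return len(r2) - shift

    return None
```

With 2 x 250 reads, a pair whose template is longer than about 2*250 minus `min_overlap` finds no overlap at any offset, so the function returns None. If nearly all pairs behave that way, the peak is computed from the small remainder of pairs that do overlap.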
Thank you, that's helpful. However, both OV080516 and OV091816 were sequenced with 600 cycles (2 X 300 bp PE) so the template length should be closer to 600. In fact, the mean length after filtering is ~255-275, which makes sense. But I still don't understand why the insert peak size is so small.
Sorry, closer to 300, not 600.
The insert size can be much larger than the read length. You can also evaluate the insert size again using the alignment file, which can be more accurate.
I hope it's not too disruptive adding to this old thread, but in our lab we have seen a still not fully explained 19 bp insert size mode (when the median/mean are ~300 or as expected), almost exclusively in human saliva preps. It appears to be a microbial signature that maps to herpesvirus 4, and is a short RNA with affinity for DNA binding. Have you attempted isolating some of these reads and seeing if they match other organisms (or are piling up at a specific site in your critter, perhaps some region duplicated but compressed in the reference, or expressed significantly)?
I have similar data. I also have 0.1% of reads with insert size < 35 bp. But if the insert size is so small, the reads should be trimmed to that length and then discarded, because I set a minimum length of 75 bp. Is the graph produced before the filtering, or is there a problem in the filtering?
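As a rough sketch of the alignment-based check, the snippet below builds an insert size histogram from the TLEN column (column 9) of SAM records. It uses only the Python standard library. The function names are hypothetical, and it assumes the reads were mapped as pairs so that TLEN is populated.

```python
from collections import Counter


def insert_size_histogram(sam_lines):
    """Count absolute TLEN values for properly paired first-in-pair reads.

    sam_lines is any iterable of SAM text lines (header lines are skipped).
    Hypothetical helper for illustration only.
    """
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        tlen = int(fields[8])
        is_proper_pair = bool(flag & 0x2)
        is_first_in_pair = bool(flag & 0x40)
        is_secondary_or_supplementary = bool(flag & 0x900)
        # Count each pair once (first read only) and skip pairs without TLEN.
        if (is_proper_pair and is_first_in_pair
                and not is_secondary_or_supplementary and tlen != 0):
            counts[abs(tlen)] += 1
    return counts


def histogram_peak(counts):
    """Return the most frequent insert size, or None for an empty histogram."""
    return max(counts, key=counts.get) if counts else None
```

The lines could come from, for example, the text output of `samtools view` on the BAM file. Unlike the overlap-based estimate, this also covers pairs whose reads do not overlap, as long as both mates map.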