Insert size peak 35 bp

Open nvpatin opened this issue 5 years ago • 12 comments

I have several metagenomes that were sequenced the same way (same Illumina platform, same adapters). Some of them are showing an insert size peak of 35 bp. I have tried providing adapter sequences manually in case they were not being trimmed but this peak size remains the same. What might be causing this peak? Has anyone else experienced this issue?

nvpatin avatar Jul 08 '20 18:07 nvpatin

How long were the reads you sequenced? Have you checked the template length peaks with a QC device such as the Agilent 4200?

sfchen avatar Jul 08 '20 23:07 sfchen

I saw this with metagenomes sequenced with both MiSeq 2 x 250 and 2 x 300 kits. Yes, I checked the template length on a Bioanalyzer before sequencing and size-selected for the appropriate length. In all cases the peak was EXACTLY 35 bp, which is why I noticed it, in addition to this size being exceptionally small.

nvpatin avatar Jul 09 '20 00:07 nvpatin

Would you please upload the HTML report?

sfchen avatar Jul 09 '20 00:07 sfchen

fastp-reports.zip

These are examples of reports showing both the 35 bp insert peak and "normal" reports with larger insert sizes as expected.

nvpatin avatar Jul 09 '20 00:07 nvpatin

From the reports: the OV080516 DNA fragments are too short, and the OV982816 DNA fragments are too long (so there is no overlap). Furthermore, both of them have a lot of primer dimers.

sfchen avatar Jul 09 '20 01:07 sfchen

Can you explain where in the report you find this information? Also, why is the insert size peak 35 bp in both cases?

nvpatin avatar Jul 09 '20 01:07 nvpatin

In OV982816, 99.8% of reads have an unknown template length, which means the read pairs do not overlap, which in turn means the template length is > 2*250. So the DNA fragments are too long. In such a case (fewer than 0.2% of read pairs overlap), the peak evaluation is not accurate.

I have to correct my last comment. The numbers of primer dimers for these two samples are not large.
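The arithmetic behind this reasoning can be sketched as follows. This is only an illustration of the general overlap relationship, not fastp's actual implementation; the function names and the `min_overlap` parameter are hypothetical, and the sketch assumes adapter-trimmed reads.

```python
def template_length_from_overlap(len_r1, len_r2, overlap, min_overlap=1):
    """Template length implied by a read pair whose 3' ends overlap.

    Returns None when the overlap is shorter than `min_overlap`
    (a hypothetical detection threshold), i.e. the length is unknown.
    """
    if overlap < min_overlap:
        return None
    # Overlapping bases are sequenced twice, so count them only once.
    return len_r1 + len_r2 - overlap


def lower_bound_without_overlap(len_r1, len_r2):
    """If the pair does not overlap, the template is at least this long."""
    return len_r1 + len_r2


# A 2 x 250 pair overlapping by 100 bases implies a 400 bp template.
print(template_length_from_overlap(250, 250, 100))
# With no overlap, all we know is that the template is >= 500 bp.
print(lower_bound_without_overlap(250, 250))
```

Under these assumptions, a sample in which almost no pairs overlap contributes very few data points to the histogram, so its peak reflects only the small overlapping minority.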

sfchen avatar Jul 09 '20 01:07 sfchen

Thank you, that's helpful. However, both OV080516 and OV091816 were sequenced with 600 cycles (2 X 300 bp PE) so the template length should be closer to 600. In fact, the mean length after filtering is ~255-275, which makes sense. But I still don't understand why the insert peak size is so small.

nvpatin avatar Jul 09 '20 01:07 nvpatin

Sorry, closer to 300, not 600.

nvpatin avatar Jul 09 '20 01:07 nvpatin

The insert size can be much larger than the read length. You can also evaluate the insert size again from the alignment file, which can be more accurate.
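One way to do that check, as a minimal sketch: the SAM format stores the observed template length (TLEN) in column 9 and the bitwise FLAG in column 2, so a histogram can be built from plain SAM text. The helper names below are hypothetical, and the sketch assumes the reads were aligned to a suitable reference or assembly (which for a metagenome is itself an assumption).

```python
from collections import Counter


def tlen_histogram(sam_lines):
    """Count absolute TLEN values for properly paired first-in-pair reads."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # SAM has 11 mandatory columns
            continue
        flag = int(fields[1])
        tlen = int(fields[8])
        # Require proper pair (0x2) and first in pair (0x40) so each
        # template is counted once; skip secondary (0x100) and
        # supplementary (0x800) alignments.
        if not flag & 0x2 or not flag & 0x40 or flag & 0x900:
            continue
        if tlen != 0:
            counts[abs(tlen)] += 1
    return counts


def histogram_mode(counts):
    """Most frequent template length, or None if nothing was counted."""
    return max(counts, key=counts.get) if counts else None


if __name__ == "__main__":
    import sys
    # Reads SAM text from standard input.
    hist = tlen_histogram(sys.stdin)
    print("templates counted:", sum(hist.values()))
    print("mode:", histogram_mode(hist))
```

The script reads SAM text on standard input, so it can be fed from a decompressed alignment, for example the output of `samtools view` on a BAM file.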

sfchen avatar Jul 09 '20 01:07 sfchen

I hope it's not too disruptive to add to this old thread, but in our lab we have seen a still not fully explained 19 bp insert size mode (while the median/mean are ~300, as expected), almost exclusively in human saliva preps. It appears to be a microbial signature that maps to herpesvirus4, and is a short RNA with affinity for DNA binding. Have you attempted isolating some of these reads and seeing if they match other organisms (or are piling up at a specific site in your critter, perhaps some region duplicated but compressed in the reference, or expressed significantly)?

iamh2o avatar Nov 16 '21 00:11 iamh2o

I have similar data. I also have 0.1% of reads with an insert size < 35 bp. But if the insert size is that small, the reads should be trimmed to that length and then discarded, because I set a minimum length of 75 bp. Is the graph produced before the filtering, or is there a problem in the filtering?

SilasK avatar Jan 22 '24 07:01 SilasK