Very low number of mapped reads

Open rajitz opened this issue 3 years ago • 3 comments

Hi,

I'm seeing a very low number of mapped reads as can be seen with:

Number of reads: 71940110. Number of mapped reads: 269214
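To put those two numbers in perspective, here is a minimal sketch that parses the summary line and computes the mapping rate. The regular expression assumes the log line has exactly the wording quoted above; the function name `mapping_rate` is made up for illustration and is not part of chromap.

```python
import re

# Summary line copied from the log quoted above.
LOG_LINE = "Number of reads: 71940110. Number of mapped reads: 269214"


def mapping_rate(line):
    """Return (total, mapped, fraction mapped) parsed from a summary line.

    Assumes the wording shown in LOG_LINE; other log formats will not match.
    """
    match = re.search(
        r"Number of reads: (\d+)\. Number of mapped reads: (\d+)", line
    )
    if match is None:
        raise ValueError("unexpected log line format")
    total = int(match.group(1))
    mapped = int(match.group(2))
    return total, mapped, mapped / total


total, mapped, rate = mapping_rate(LOG_LINE)
print(f"{mapped} of {total} reads mapped ({rate:.2%})")
```

That works out to well under 1% of the reads being mapped.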

I'm running chromap with the default parameters:

/chromap-0.1_x64-linux/chromap -r SpikeIns_Synthetic_New71_AND_hg38.fa -1 RD0175-IC_S36.pe_1.trimmed.fastq.gz -2 RD0175-IC_S36.pe_2.trimmed.fastq.gz -x SpikeIns_Synthetic_New71_AND_hg38_chromap_index --SAM -o RD0175-IC_S36.sam 1>&2

From the log file:

Number of threads: 1
Analyze bulk data.
Won't try to remove adapters on 3'.
Won't remove PCR duplicates after mapping.
Will remove PCR duplicates at bulk level.
Won't allocate multi-mappings after mapping.
Only output unique mappings after mapping.
Only output mappings of which barcodes are in whitelist.

Could this issue be caused by "Will remove PCR duplicates at bulk level"? I haven't set --remove-pcr-duplicates-at-bulk-level, though, or any of the other --remove-pcr-duplicates parameters.

Also, I haven't specified a barcode whitelist file, and I haven't set --output-mappings-not-in-whitelist.

Please provide pointers on what could be the issue. Happy to provide any other info as needed. Thanks very much.

rajitz avatar Aug 06 '21 19:08 rajitz

I suspect that line is just incorrect log output. Can you share the whole log file with us? Can you also provide more details about your data? What is the reference, and how long are the reads? Thanks!

haowenz avatar Aug 08 '21 02:08 haowenz

Please see the attached log file: job.err.log

The SAM file is only 27 MB, so the mapped-read count in the log looks correct. The reference is a mix of hg38 and around 70 synthetic sequences. The reads in the paired-end FASTQ files are around 100 bp each and come almost exclusively from the synthetic sequences.

Also, does chromap let me set parameters such as the alignment mode and the minimum and maximum insert size? If not, how are these set or computed? And can it write unaligned reads to paired-end FASTQ files? Thank you.

rajitz avatar Aug 08 '21 19:08 rajitz

Sorry for the late reply. It is hard to tell the reason from the log file alone. It would be easier if you could share a sample of your reads and the reference genome with us (either email us or post a link here), so that I can run it and figure out the issue. Are the 70 synthetic sequences similar to the human genome? And what is your fragment length or insert size? You can set the max insert size to 2000 with "-l 2000". The minimum fragment length that can get mapped is around 30. You can increase this value to 50 with "--min-read-length 50".
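If it helps with sharing a sample, here is a rough sketch of one way to cut a small paired sample out of the two gzipped FASTQ files. It assumes four-line FASTQ records and that both files list the mates in the same order, so taking the first N records from each keeps the pairs in sync. The helper names and the output file names are made up for illustration.

```python
import gzip
from itertools import islice


def head_fastq(src, dst, n_reads):
    """Copy the first n_reads records (4 lines each) of a gzipped FASTQ.

    Returns the number of complete records written.
    """
    with gzip.open(src, "rt") as fin, gzip.open(dst, "wt") as fout:
        lines = list(islice(fin, 4 * n_reads))
        fout.writelines(lines)
    return len(lines) // 4


def sample_pair(r1, r2, out1, out2, n_reads=100_000):
    """Write the first n_reads pairs of a paired-end run to two new files.

    Assumes r1 and r2 hold the mates in the same order.
    """
    n1 = head_fastq(r1, out1, n_reads)
    n2 = head_fastq(r2, out2, n_reads)
    if n1 != n2:
        raise ValueError(f"mate files out of sync: {n1} vs {n2} records")
    return n1


# Example call with the file names from this issue (output names are hypothetical):
# sample_pair("RD0175-IC_S36.pe_1.trimmed.fastq.gz",
#             "RD0175-IC_S36.pe_2.trimmed.fastq.gz",
#             "sample_1.fastq.gz", "sample_2.fastq.gz")
```

The first reads of a run are not a random sample, but they are usually enough to reproduce a mapping problem like this one.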

haowenz avatar Aug 15 '21 04:08 haowenz