pairtools:Empty of fully duplicated library, can't estimate complexity
Thanks for developing this useful tool for 3D genome analysis. However, when I tried to convert a BAM file (haplotagged by whatshap) to the pairs format, I got the following error:
pairtools:Empty of fully duplicated library, can't estimate complexity
The command I ran was as follows:
pairtools parse2 \
--output-stats scNM-C_001.stats.txt \
-c $fai --drop-sam --drop-seq --expand --add-pair-index --min-mapq 20 \
scNM-C_001.ht.bam -o scNM-C_001.ht.pairs.gz
Could you help me fix this problem? Thanks a lot.
Do you get any pairs in the output? If yes, this should be safe to ignore at this stage. The warning comes from the library complexity estimate, which requires duplicated pairs to be annotated; at the parsing stage, before dedup, that information is not yet available.
Thanks for your extremely quick reply! However, I have checked the output and there are no proper pairs in the .pairs file. I also tried the unhaplotagged .sam file generated directly by bwa, but the result seems to be the same. The output .pairs file contains:
d3d59d85-f117-406d-93e3-4901250df094 ! 0 ! 0 - - XX 1 R1-2
f243eba2-22fe-4b38-a415-d8985d077396 ! 0 ! 0 - - XX 1 R1-2
7d11cd1f-e60c-448a-abf4-491d4e2fbcb3 ! 0 ! 0 - - XX 1 R1-2
ff74d552-2350-4449-a06b-11c06e5de5de ! 0 ! 0 - - XX 1 R1-2
e7669fad-39b2-4ceb-b1ce-57d2f3b9feff ! 0 ! 0 - - XX 1 R1-2
cc6863d0-ad92-40a4-a82d-c10c4cd99c0b ! 0 ! 0 - - XX 1 R1-2
b400bcfc-0f06-463d-8948-35f897c7fdfb ! 0 ! 0 - - XX 1 R1-2
fe0b58e9-4562-4735-9a76-931bc108771b ! 0 ! 0 - - XX 1 R1-2
7013ec1b-fe2a-4c5f-b58b-8eb9af7e96e5 ! 0 ! 0 - - XX 1 R1-2
and the stats file contains:
total 1406727
total_unmapped 1406727
total_single_sided_mapped 0
total_mapped 0
total_dups 0
total_nodups 0
cis 0
trans 0
pair_types/XX 1406727
Previously, I processed the same .sam file with hickit::sam2seg, and it generated informative and proper results. So what is the problem? I sincerely appreciate your reply again, thank you.
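The stats excerpt above reports every pair as unmapped (total equals total_unmapped). A quick way to check that ratio on a stats file is sketched below. The helper name unmapped_fraction is made up for illustration, and the sketch assumes whitespace-separated "key value" lines with total and total_unmapped keys, as in the excerpt.

```shell
# Hypothetical helper: print the fraction of unmapped pairs from a stats file.
# Assumes "key value" lines containing the keys "total" and "total_unmapped".
unmapped_fraction() {
    awk '
        $1 == "total"          { total = $2 }
        $1 == "total_unmapped" { unmapped = $2 }
        END {
            if (total > 0) printf "%.3f\n", unmapped / total
            else           print "NA"   # empty file or zero pairs
        }
    ' "$1"
}
```

Called as unmapped_fraction scNM-C_001.stats.txt, a value close to 1 would suggest that the parser could not interpret the alignments at all, rather than that the library is empty.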
@agalitsyna is this something you fixed recently?
Hi @Wong718, what version of pairtools do you use? Is the problem reproducible with the latest version from GitHub? Is it a single-end or a paired-end read library? Also, feel free to share a sample of this BAM file.
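One way to answer the single-end versus paired-end question is to look at the SAM FLAG field directly: bit 0x1 marks a read as paired and bit 0x4 marks it as unmapped. The sketch below counts both over SAM records read from stdin. The helper name sam_flag_summary is hypothetical, and the arithmetic on the flag is used instead of bitwise functions only so that it runs in any awk.

```shell
# Hypothetical helper: summarise SAM flags from records on stdin.
# Header lines (starting with "@") are skipped; column 2 is the FLAG field.
sam_flag_summary() {
    awk -F'\t' '
        /^@/ { next }
        {
            n++
            flag = int($2)
            if (flag % 2 == 1)          paired++     # bit 0x1: read is paired
            if (int(flag / 4) % 2 == 1) unmapped++   # bit 0x4: read is unmapped
        }
        END { printf "records=%d paired=%d unmapped=%d\n", n, paired, unmapped }
    '
}
```

For a BAM file, something like samtools view scNM-C_001.ht.bam | head -n 100000 | sam_flag_summary would do. If paired stays at 0, the alignments were most likely produced from single-end reads.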
Hi @agalitsyna, it seems I get the same error (Empty of fully duplicated library, can't estimate complexity), with total_mapped = 0, even though the .sam file generated by the alignment is quite big (500 GB). I am using pairtools 1.1.2, and I have a single-end read library. I am wondering whether the single-end reads are the problem.
@marianthimar are you using the --single-end argument? https://pairtools.readthedocs.io/en/latest/cli_tools.html#cmdoption-pairtools-parse2-single-end
@Phlya no, I will check it out and let you know, thank you! I am following this Micro-C pipeline: https://micro-c.readthedocs.io/en/latest/fastq_to_bam.html
Hi @marianthimar, you may want to try pairtools parse2 instead of parse. Regular parse does not support single-end reads, as it is designed and maintained for vanilla paired-end Hi-C.
Read the manual on it here, it should be fairly easy to adjust to the workflow that you follow.
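For reference, a minimal sketch of what such a call might look like, combining parse2 with the --single-end option linked earlier in the thread. The remaining flags are taken from the command in the original post. The wrapper name parse2_single_end and the PAIRTOOLS override variable are hypothetical, added only so the command can be dry-run, and whether these settings suit a given library is an assumption to check against the manual.

```shell
# Sketch: run pairtools parse2 in single-end mode.
# Arguments: <input.bam> <chrom.sizes> <output.pairs.gz> <stats.txt>
# PAIRTOOLS is a hypothetical override for the executable (e.g. "echo" for a dry run).
parse2_single_end() {
    "${PAIRTOOLS:-pairtools}" parse2 \
        --single-end \
        -c "$2" \
        --min-mapq 20 \
        --output-stats "$4" \
        -o "$3" \
        "$1"
}
```

Running PAIRTOOLS=echo parse2_single_end in.bam chrom.sizes out.pairs.gz stats.txt prints the command instead of executing it, which makes it easy to review the arguments before starting on a large file.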