
pairtools:Empty of fully duplicated library, can't estimate complexity

Open Wong718 opened this issue 1 year ago • 8 comments

Thanks for developing this useful tool for 3D genome analysis. However, when I tried to convert a BAM file (haplotagged with WhatsHap) to the pairs format, I got the following error:

pairtools:Empty of fully duplicated library, can't estimate complexity

The command I ran was:

pairtools parse2  \
	--output-stats scNM-C_001.stats.txt \
	-c $fai --drop-sam --drop-seq --expand --add-pair-index --min-mapq 20 \
	scNM-C_001.ht.bam -o scNM-C_001.ht.pairs.gz

Could you help me fix this problem? Thanks a lot.

Wong718, Nov 04 '24 15:11

Do you get any pairs in the output? If so, this message is safe to ignore at this stage. It is a warning from the library-complexity estimate, which requires duplicated pairs to be annotated; at the parsing stage, before deduplication, that information is not yet available.
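One quick way to answer "do you get any pairs?" is to count the data rows in which at least one side is mapped. The function below is a minimal sketch, not a pairtools command (the name count_mapped_pairs is hypothetical). It assumes the standard tab-separated .pairs layout: header lines start with #, columns 2 and 4 hold chrom1 and chrom2, and an unmapped side is written as !.

```shell
# Hypothetical helper (not part of pairtools): count .pairs rows where at
# least one side is mapped. Works on plain or gzipped .pairs files.
# Assumes tab-separated columns: readID chrom1 pos1 chrom2 pos2 ...
# and that pairtools writes "!" as the chromosome of an unmapped side.
count_mapped_pairs() {
    # gzip -dcf decompresses gzipped input and passes plain text through as-is
    gzip -dcf "$1" |
        awk -F '\t' '!/^#/ && ($2 != "!" || $4 != "!") { n++ } END { print n + 0 }'
}
```

Usage: count_mapped_pairs scNM-C_001.ht.pairs.gz. A result of 0 means every pair is unmapped on both sides, so the problem is upstream of the complexity warning.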

Phlya, Nov 04 '24 15:11

Thanks for your extremely quick reply! However, I checked the output and the .pairs file contains no usable pairs. I also tried the non-haplotagged .sam file generated directly by bwa, with the same result. The .pairs file contains:

d3d59d85-f117-406d-93e3-4901250df094    !       0       !       0       -       -       XX      1       R1-2
f243eba2-22fe-4b38-a415-d8985d077396    !       0       !       0       -       -       XX      1       R1-2
7d11cd1f-e60c-448a-abf4-491d4e2fbcb3    !       0       !       0       -       -       XX      1       R1-2
ff74d552-2350-4449-a06b-11c06e5de5de    !       0       !       0       -       -       XX      1       R1-2
e7669fad-39b2-4ceb-b1ce-57d2f3b9feff    !       0       !       0       -       -       XX      1       R1-2
cc6863d0-ad92-40a4-a82d-c10c4cd99c0b    !       0       !       0       -       -       XX      1       R1-2
b400bcfc-0f06-463d-8948-35f897c7fdfb    !       0       !       0       -       -       XX      1       R1-2
fe0b58e9-4562-4735-9a76-931bc108771b    !       0       !       0       -       -       XX      1       R1-2
7013ec1b-fe2a-4c5f-b58b-8eb9af7e96e5    !       0       !       0       -       -       XX      1       R1-2

and the stats file contains:

total   1406727
total_unmapped  1406727
total_single_sided_mapped       0
total_mapped    0
total_dups      0
total_nodups    0
cis     0
trans   0
pair_types/XX   1406727

Previously, I processed the same .sam file with hickit::sam2seg, and it produced informative, correct results. So what is the problem? Thank you again for your reply.
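The stats above already show the problem: total_unmapped equals total and total_mapped is 0. To catch this automatically in a pipeline, the stats file can be checked directly. The function below is a minimal sketch (summarize_pair_stats is a hypothetical helper, not a pairtools command). It assumes the whitespace-separated key/value layout shown above and reads only the total, total_mapped and total_unmapped keys.

```shell
# Hypothetical helper (not part of pairtools): summarize a pairtools stats file.
# Prints "total=N mapped=N unmapped=N" and returns exit status 1 when
# total_mapped is 0 (or missing), so it can gate later pipeline steps.
summarize_pair_stats() {
    awk '
        $1 == "total"          { total = $2 }
        $1 == "total_mapped"   { mapped = $2 }
        $1 == "total_unmapped" { unmapped = $2 }
        END {
            printf "total=%d mapped=%d unmapped=%d\n", total, mapped, unmapped
            exit (mapped == 0)
        }
    ' "$1"
}
```

For the stats above it prints total=1406727 mapped=0 unmapped=1406727 and returns 1, e.g. summarize_pair_stats scNM-C_001.stats.txt || echo "no mapped pairs".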

Wong718, Nov 05 '24 00:11

@agalitsyna, is this something you fixed recently?

Phlya, Nov 05 '24 06:11

Hi @Wong718, what version of pairtools do you use? Is the problem reproducible with the latest version from GitHub? Is it a single-end or a paired-end library? Also, feel free to share a sample of this BAM file.
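Whether a library was aligned as paired-end can be read from the SAM FLAG field: bit 0x1 means "template has multiple segments", i.e. the read is part of a pair. The function below is a minimal sketch for an uncompressed SAM file (count_paired_records is a hypothetical name, not a pairtools or samtools command); a result of 0 suggests the alignments are single-end.

```shell
# Hypothetical helper: count SAM alignment records with FLAG bit 0x1 set
# (read is part of a pair). Header lines (starting with @) are skipped.
# An odd FLAG value means bit 0x1 is set, so FLAG % 2 == 1 tests for it.
count_paired_records() {
    awk -F '\t' '!/^@/ && $2 % 2 == 1 { n++ } END { print n + 0 }' "$1"
}
```

For a BAM file, samtools view -c -f 1 file.bam gives the same count without converting to SAM first.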

agalitsyna, Nov 12 '24 18:11

Hi @agalitsyna, I get the same error (Empty of fully duplicated library, can't estimate complexity), with total_mapped = 0, even though the .sam file produced by the alignment is quite large (500 GB). I am using pairtools 1.1.2 with a single-end library. Could the single-end reads be the problem?

marianthimar, Feb 07 '25 10:02

@marianthimar are you using the --single-end argument? https://pairtools.readthedocs.io/en/latest/cli_tools.html#cmdoption-pairtools-parse2-single-end

Phlya, Feb 07 '25 10:02

@Phlya No, I will check it out and let you know, thank you! I am following this Micro-C pipeline: https://micro-c.readthedocs.io/en/latest/fastq_to_bam.html

marianthimar, Feb 07 '25 10:02

Hi @marianthimar, you may want to try pairtools parse2 instead of parse. Regular parse does not support single-end reads, as it is designed and maintained for vanilla paired-end Hi-C. Read the manual on it here; it should be fairly easy to adapt the workflow you follow.

agalitsyna, Feb 07 '25 12:02