ALLHiC
ALLHiC_corrector crashes
Hi, I am trying to run ALLHiC_corrector but get the following error. Do you have any recommendations?
[13:29:17] Contig: ptg000402l Getting mapping list
[13:29:17] Contig: ptg001136l Getting mapping list
Traceback (most recent call last):
File "/data2/DUNJA/ALLHIC/ALLHiC/bin/ALLHiC_corrector", line 310, in
I found a solution: creating a conda environment with Python 3.7, then sorting and indexing the BAM file:
conda create -y -n allhic python=3.7 samtools bedtools matplotlib pysam
conda activate allhic
samtools sort -@ 32 -o Q60.sort.bam Q60.bam
samtools index Q60.sort.bam
ALLHiC_corrector -m Q60.sort.bam -r TEST.fa -o TEST.fa-corrected -t 8
I realized that bwa mapping is really slow for large genomes. Could you provide some hints on how to integrate tools like minimap2 into the ALLHiC pipeline? Would you run it in paired-end mode, or map each mate independently and then merge the results?
Hi @HMPNK,
I am not aware that minimap2 is suitable for mapping Illumina reads. bwa mapping is indeed a bit slow; however, you can split the big FASTQ files into a number of smaller files using the seqkit split2 command and then map each paired-end chunk in parallel. The resulting BAM files can subsequently be merged using samtools merge or sambamba merge.
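The split, map, and merge suggestion above could be sketched roughly as follows. This is only an illustration under stated assumptions: the input names (reads_R1.fq.gz, reads_R2.fq.gz, draft.fa), the helper names mate_of and map_in_chunks, and the use of bwa mem are all hypothetical and not taken from this thread, so substitute the mapping command you already use for ALLHiC. The reference is assumed to be indexed with bwa index beforehand.

```shell
#!/bin/sh
# Hypothetical inputs: reads_R1.fq.gz / reads_R2.fq.gz (paired reads), draft.fa (assembly).

# Derive the R2 chunk name from an R1 chunk name written by seqkit split2.
mate_of() {
    printf '%s\n' "$1" | sed 's/_R1\.part_/_R2.part_/'
}

# Split both mates into N parts, map every chunk pair, then merge the BAMs.
map_in_chunks() {
    ref=$1; r1=$2; r2=$3; parts=${4:-8}; threads=${5:-4}

    # split2 keeps the mates in sync and writes e.g. split/reads_R1.part_001.fq.gz
    seqkit split2 -1 "$r1" -2 "$r2" -p "$parts" -O split -f

    for c1 in split/*_R1.part_*.fq.gz; do
        c2=$(mate_of "$c1")
        out="${c1%.fq.gz}.bam"
        # One mapping job per chunk, started in the background
        # (bwa mem is an assumption; swap in your usual mapping command)
        bwa mem -t "$threads" "$ref" "$c1" "$c2" | samtools sort -o "$out" - &
    done
    wait  # block until every chunk has been mapped

    # Combine the per-chunk BAM files into a single file
    samtools merge -f -@ "$threads" merged.bam split/*_R1.part_*.bam
}

# Example call (not executed here):
#   map_in_chunks draft.fa reads_R1.fq.gz reads_R2.fq.gz 8 4
```

On a cluster, each loop iteration could instead be submitted as its own job, with the merge step run once all jobs have finished.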
Hi, thanks for your recommendations. Regarding minimap2, it is a pretty fast short-read mapper when run with the parameters "-a -x sr".
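For reference, a paired-end minimap2 run with those parameters might look like the sketch below. The function name map_with_minimap2 and the output file name are hypothetical, and whether ALLHiC handles minimap2 alignments correctly is not confirmed anywhere in this thread, so treat this as an experiment rather than a supported workflow.

```shell
#!/bin/sh
# Map paired-end short reads with the minimap2 "sr" preset, then sort and index.
map_with_minimap2() {
    ref=$1; r1=$2; r2=$3; threads=${4:-8}

    # Passing both mate files makes minimap2 treat the reads as pairs
    minimap2 -a -x sr -t "$threads" "$ref" "$r1" "$r2" \
        | samtools sort -@ "$threads" -o minimap2.sort.bam -

    samtools index minimap2.sort.bam
}

# Example call (not executed here):
#   map_with_minimap2 TEST.fa reads_R1.fq.gz reads_R2.fq.gz 16
```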
I am currently stuck on another issue (I have already tested several samtools versions: 0.1.19-44428cd, 1.9 and 1.12):
ALLHiC_partition -b prunning.bam -r TEST.fa -e GATC -m 0 -k 150
Extract function: calculate an empirical distribution of Hi-C link size based on intra-contig links
CMD: allhic extract prunning.bam TEST.fa --RE GATC
16:39:12 writeRE | NOTICE RE counts in 21172 contigs (total: 13306257, avg 1 per 381 bp) written to prunning.counts_GATC.txt
16:39:12 extractContigLinks | NOTICE Parse bamfile prunning.bam
16:39:12 extractContigLinks | ERROR Cannot open bamfile prunning.bam
(sam: reference already used)
Partition contigs based on prunning bam file
CMD: allhic partition prunning.counts_GATC.txt prunning.pairs.txt 150 --minREs 0
16:39:12 ReadCSVLines | NOTICE Parse csvfile prunning.counts_GATC.txt
16:39:12 readRE | NOTICE Loaded 21172 contig RE lengths for normalization from prunning.counts_GATC.txt
16:39:12 skipContigsWithFewREs | NOTICE skipContigsWithFewREs with MinREs = 0 (RE = GATC)
16:39:12 skipContigsWithFewREs | NOTICE Marked 0 contigs (avg 0.0 RE sites, len 0) since they contain too few REs (MinREs = 0)
16:39:12 ReadCSVLines | NOTICE Parse csvfile prunning.pairs.txt
16:39:12 mustOpen | CRITIC open prunning.pairs.txt: no such file or directory
I have solved it: this was due to the reads having the "/1" and "/2" naming convention in my BAM file.
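The exact fix was not posted. One possible way to remove such suffixes, shown here only as a sketch, is to rewrite the read names while streaming the alignments through awk. The function name strip_pair_suffix and the file names in the usage comment are hypothetical.

```shell
#!/bin/sh
# Remove a trailing "/1" or "/2" from the read name (first column) of SAM records.
# Header lines, which start with "@", are passed through unchanged.
strip_pair_suffix() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^@/ { print; next }
         { sub(/\/[12]$/, "", $1); print }'
}

# Example usage (not executed here):
#   samtools view -h in.bam | strip_pair_suffix | samtools view -b -o renamed.bam -
```

After renaming, both mates of a pair share the same read name, which is what paired-end tools generally expect.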
Hi Dr. Zhang, I ran ALLHiC_corrector and got a file.out:
[09:18:48] Contig: utg000008l Getting hic list with bin size: 25000
[09:18:49] Contig: utg000008l Getting wide mismatch
[09:18:49] Contig: utg000008l Getting narrow score with bin size: 1000
[09:18:49] Contig: utg000008l Getting narrow mismatch
[09:18:49] Contig: utg000063l Getting mapping list
[09:18:49] Contig: utg000063l Getting hic list with bin size: 25000
[09:18:49] Contig: utg000063l Getting wide mismatch
[09:18:49] Contig: utg000063l Could not found mismatch
I would like to know whether the "Could not found mismatch" message will affect the subsequent analysis. Looking forward to your reply, thanks very much.