How does ALLHiC work for simple genomes?

Open pjx1990 opened this issue 3 years ago • 13 comments

Can I use ALLHiC to anchor a simple genome, such as rice? Thanks.

pjx1990 avatar Mar 27 '21 14:03 pjx1990

Sure! ALLHiC is definitely applicable to simple genomes. I've uploaded a script, ALLHiC_pip.sh (https://github.com/tangerzhang/ALLHiC/blob/master/bin/ALLHiC_pip.sh), which wraps several steps: read mapping, correction, partition, optimize, build, and plot. This script is designed for Hi-C scaffolding of simple genomes. To run ALLHiC_corrector, the numpy and scipy packages are required. Please let me know if you have any questions about this script.

tangerzhang avatar Mar 28 '21 11:03 tangerzhang

Thank you very much! Can ALLHiC anchor contigs to chromosome scale, rather than just to scaffolds? In addition, do I need to manually correct the final result in another tool, such as Juicebox? In fact, I ran the pipeline you provided earlier, but the resulting heatmap was not very good. I'll run it again with your new pipeline.

pjx1990 avatar Mar 28 '21 12:03 pjx1990

Hi @pjx1990, ALLHiC can anchor contigs onto chromosomes if the number of chromosomes is given. The new pipeline I just uploaded includes contig correction and may therefore perform better than before. However, if the heatmap is still not good enough, you may also need Juicebox to adjust the results.

tangerzhang avatar Mar 28 '21 12:03 tangerzhang

Thanks. I've run the new pipeline (ALLHiC_pip.sh) once, but an error occurred at line 104: "line 104: ParaFly: command not found". I looked through the source code but couldn't find this script. How can I solve this? In addition, the script relies on the pysam package; please add a note about this.

pjx1990 avatar Mar 29 '21 01:03 pjx1990

Thanks for letting us know. ParaFly comes from the Trinity package (https://github.com/trinityrnaseq/trinityrnaseq) and is used to run dozens of command lines in parallel. I will add an updated README shortly.

tangerzhang avatar Mar 29 '21 01:03 tangerzhang
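
The dependencies named so far in this thread (ParaFly, numpy, scipy, pysam) can be checked before launching the pipeline. The following is a minimal sketch and not part of ALLHiC. The helper function names and the PYTHON variable are made up for illustration, and including samtools and bwa in the list is an assumption (the script's filter step calls samtools, and its BAM file names suggest bwa mem).

```shell
#!/usr/bin/env bash
# Pre-flight check for tools mentioned in this thread (a sketch, not part of ALLHiC).

# Print the name of every command in the argument list that is not on PATH.
missing_commands() {
    local cmd
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}

# Print the name of every Python module in the argument list that fails to import.
# PYTHON is a hypothetical override for the interpreter the pipeline will use.
missing_python_modules() {
    local py="${PYTHON:-python3}" mod
    for mod in "$@"; do
        "$py" -c "import $mod" >/dev/null 2>&1 || echo "$mod"
    done
}

# samtools and bwa are assumed here; ParaFly, numpy, scipy and pysam come from the thread.
echo "Missing commands: $(missing_commands ParaFly samtools bwa | tr '\n' ' ')"
echo "Missing Python modules: $(missing_python_modules numpy scipy pysam | tr '\n' ' ')"
```

An empty list after each label would suggest that the corresponding tools are visible in the current environment; it does not check versions.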

Hi, the run succeeded and the result is much better than before, but it still has some errors. Now I want to adjust it with Juicebox, but I don't know how to generate the appropriate input files (such as .hic and .assembly) for Juicebox, or how to get the final fasta file. By the way, the matplotlib usage should be updated, because ALLHiC_plot showed a warning: MatplotlibDeprecationWarning: savefig() got unexpected keyword argument "filetype" which is no longer supported as of 3.3 and will become an error two minor releases later

pjx1990 avatar Mar 29 '21 03:03 pjx1990

Hi @pjx1990, we had a discussion about how to generate the .hic and .assembly files. Please see this thread (https://github.com/tangerzhang/ALLHiC/issues/68). And thanks for letting us know about the matplotlib warning.

tangerzhang avatar Mar 29 '21 12:03 tangerzhang

Hi Dr. Zhang,

I think there is an error in the ALLHiC_pip.sh script:

### filter bam
samtools view -bq $threads sample.bwa_mem.bam | samtools view -bt seq.HiCcorrected.fasta.fai > sample.unique.bam

Should $threads really be the argument to -q? As far as I know, -q sets the minimum mapping quality.

xinghua1001 avatar Apr 01 '21 15:04 xinghua1001

Yes, you are right! Thanks for pointing that out. I will fix it tomorrow once I can log in to my GitHub account.

tangerzhang avatar Apr 01 '21 15:04 tangerzhang

Hi Dr. Zhang, I replaced "$threads" with "20" in the ALLHiC_pip.sh script. Is that right? In addition, I used HiC-Pro for quality control of the Hi-C data, but I don't know which result file can be used as input to the ALLHiC_pip.sh script ("XX.bwt2pairs_interaction.bam", "XX.bwt2pairs.bam", or some other file?). I look forward to your reply. Thanks.

StevenBai97 avatar Apr 02 '21 10:04 StevenBai97

Hi, I would prefer to use 40 as the quality cutoff, i.e., samtools view -bq 40. Also, ALLHiC_pip.sh actually takes fastq files as input, because the script performs two rounds of read mapping. In the first round, misjoined contigs are corrected based on the initial read mapping. In the second round, the corrected contigs are linked based on Hi-C signals.

tangerzhang avatar Apr 02 '21 14:04 tangerzhang
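
To make the distinction discussed above concrete: in samtools view, -q is the minimum mapping quality, while the thread count is passed with -@. With $threads after -q, the MAPQ cutoff would silently follow whatever thread count was chosen. Below is a minimal sketch of the filter step with the two settings separated. It is not the official fix to ALLHiC_pip.sh: the function name filter_bam_cmd and the thread count of 4 are made up for illustration, and adding -@ at all is an assumption. The sketch only prints the command so it can be reviewed before being run.

```shell
#!/usr/bin/env bash
# Sketch of the corrected filter step: MAPQ cutoff and thread count are separate settings.
mapq=40        # mapping-quality cutoff suggested in this thread
threads=4      # hypothetical thread count; passed with -@, never with -q

# Print the filter pipeline for the given input BAM, .fai index, and output BAM.
filter_bam_cmd() {
    local in_bam=$1 fai=$2 out_bam=$3
    printf 'samtools view -@ %s -bq %s %s | samtools view -bt %s > %s\n' \
        "$threads" "$mapq" "$in_bam" "$fai" "$out_bam"
}

# File names are the ones quoted from ALLHiC_pip.sh in this thread.
filter_bam_cmd sample.bwa_mem.bam seq.HiCcorrected.fasta.fai sample.unique.bam
```

Once the printed line looks right, it can be pasted into a shell in the directory that holds the BAM and .fai files.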

Thanks for your reply. I will give it a try following your suggestions.

StevenBai97 avatar Apr 03 '21 05:04 StevenBai97

Hi Dr. Zhang, why does the script ALLHiC_pip.sh (https://github.com/tangerzhang/ALLHiC/blob/master/bin/ALLHiC_pip.sh) not include the ALLHiC_Rescue and extract steps, as in the method described at this link (https://github.com/tangerzhang/ALLHiC/wiki)? Thanks.

WeiSong-bio avatar Jul 21 '21 17:07 WeiSong-bio