
Difference between v1.2 and v1.2a.1?

Open xiekunwhy opened this issue 1 year ago • 10 comments

Hi,

What are the differences between v1.2 (cloned from the GitHub source and compiled) and v1.2a.1 (installed using conda)?

I got different results using these two versions. v1.2 (cloned from the GitHub source and compiled) seems worse than v1.2a.1 (installed using conda), because v1.2 tends to generate wrong connections (see scaffold_1).

Here is the v1.2 (cloned from the GitHub source and compiled) result: image

Here is the v1.2a.1 (installed using conda) result, which is closer to what we expected (I cannot tell you what the species is): image

All parameters are the same when running these two versions.

Best, Kun

xiekunwhy avatar Sep 15 '23 08:09 xiekunwhy

Hi Kun, sorry, my question has nothing to do with the topic of this issue. May I ask what tool you use to visualize .hic files? It looks better than the images exported from Juicebox.

yangqimeng99 avatar Sep 20 '23 09:09 yangqimeng99

Hi @yangqimeng99,

I modified a script from EndHiC (https://github.com/fanagislab/EndHiC), matrix2heatmap.py. It accepts HiC-Pro bed and matrix files, not .hic files. If you still have the corresponding BAM file, you can create the bed and matrix files using tools in the HiC-Pro pipeline and then plot the results. This page may help you convert BAM to HiC-Pro files (start from 3.6): https://blog.sciencenet.cn/home.php?mod=space&uid=2970729&do=blog&id=1185463
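For anyone who wants to plot from those files, here is a minimal Python sketch of the loading step. It is not the EndHiC matrix2heatmap.py code. It assumes the usual HiC-Pro layout: a bed file with one line per bin, and a sparse matrix file with three whitespace-separated columns (bin_i, bin_j, count), 1-based bin indices, and only one triangle stored. The function names `count_bins` and `load_sparse_matrix` are hypothetical.

```python
def count_bins(bed_lines):
    """Count bins, assuming one non-empty bed line per bin."""
    return sum(1 for line in bed_lines if line.strip())


def load_sparse_matrix(matrix_lines, n_bins):
    """Expand sparse (bin_i, bin_j, count) triplets into a dense symmetric matrix.

    Assumes 1-based bin indices and that each pair is stored once,
    so the value is mirrored across the diagonal.
    """
    dense = [[0.0] * n_bins for _ in range(n_bins)]
    for line in matrix_lines:
        line = line.strip()
        if not line:
            continue
        bin_i, bin_j, count = line.split()
        i, j = int(bin_i) - 1, int(bin_j) - 1
        value = float(count)
        dense[i][j] = value
        dense[j][i] = value
    return dense


# Toy input with three bins (made-up values, for illustration only).
bed = ["ctg1\t0\t100000\t1", "ctg1\t100000\t200000\t2", "ctg2\t0\t100000\t3"]
matrix = ["1\t1\t10", "1\t2\t4", "2\t3\t7"]
dense = load_sparse_matrix(matrix, count_bins(bed))
```

The resulting list of lists can be passed to any heatmap function (for example matplotlib's `imshow`), usually after a log transform of the counts.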

Best, Kun

xiekunwhy avatar Sep 22 '23 09:09 xiekunwhy

Thank you very much for your sharing and suggestions! @xiekunwhy

yangqimeng99 avatar Sep 22 '23 13:09 yangqimeng99

Hello Kun,

In 1.2, we are trying to fix the telo-to-telo misjoin problem, which we saw in some plant genomes. This is, however, still under development, so there is no release yet. In your case, the fix does not seem to be very successful...

There are also some extra changes in 1.2, such as better AGP format compatibility and pair format input.

Best, Chenxi

c-zhou avatar Sep 22 '23 15:09 c-zhou

Hi Chenxi,

Thank you for your reply; I understand the differences now. I need to tell you about another problem.

YaHS tends to misjoin and create more butterfly connections than EndHiC when anchoring high-quality contigs (same contigs and same BAM file used). Contig Nx statistics:

Total: 724212520
Count: 43
Average: 16842151.63
Median: 1939243
N00: 79080472
N10: 79080472
N20: 58646112
N30: 52944404
N40: 51019557
N50: 40339182
N60: 35403800
N70: 34284038
N80: 28762492
N90: 17838659
N100: 124441
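For readers unfamiliar with these statistics, the Nx values follow the usual definition: sort the contigs from longest to shortest and report the length of the contig at which the running total first reaches x% of the assembly size. Below is a minimal Python sketch of that definition. The function name `nx` and the toy lengths are illustrative only; this is not the tool that produced the numbers in this post.

```python
def nx(lengths, x):
    """Return the Nx value of a list of contig lengths (x in 0..100).

    Contigs are sorted longest first; the result is the length of the
    contig at which the cumulative length first reaches x% of the total.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    if not 0 <= x <= 100:
        raise ValueError("x must be between 0 and 100")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Integer comparison avoids floating-point rounding at x = 100.
        if running * 100 >= total * x:
            return length
    return ordered[-1]


# Toy contig lengths (made up), total = 300.
toy = [80, 70, 50, 40, 30, 20, 10]
n50 = nx(toy, 50)  # cumulative 80, 150 -> reaches 150 at the 70 contig
n90 = nx(toy, 90)  # cumulative reaches 270 at the 30 contig
```

With this definition N00 is simply the longest contig and N100 the shortest, which matches the first and last values in the list.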

The YaHS results (yahs1.2a.1 --no-mem-check -o sbi.nd.yahs -q 0 sbi.polish.fa sbi.dedup.bam): scaffold 1 is a misjoined scaffold, and most of the other scaffolds are butterfly-connected. image

The EndHiC results, where everything seems fine: image

Best, Kun

xiekunwhy avatar Sep 22 '23 17:09 xiekunwhy

Hi Kun,

Thanks for showing the example. You are right: we have indeed seen this when scaffolding near-complete genome assemblies, and it is exactly the problem we want to solve in version 1.2.

By the way, -q 0 is quite aggressive: it means using all multi-mapping reads, which tends to introduce more assembly errors, especially in repetitive regions. I am not sure, though, whether dropping it will solve the problem in the first scaffold. We did see misjoins with the default setting, which is -q 10.
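To make the effect of the threshold concrete, here is a small Python sketch that splits SAM records by mapping quality. It is only an illustration and not the YaHS implementation: it assumes the threshold keeps records whose MAPQ (column 5 of a SAM record) is greater than or equal to the given value, and the function name `split_by_mapq` and the example records are made up.

```python
def split_by_mapq(sam_lines, min_mapq):
    """Split SAM records into (kept, dropped) read names by MAPQ.

    Header lines (starting with '@') and blank lines are skipped.
    A record is kept when its MAPQ (5th column) is >= min_mapq.
    """
    kept, dropped = [], []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        mapq = int(fields[4])
        (kept if mapq >= min_mapq else dropped).append(fields[0])
    return kept, dropped


# Made-up records: r1 maps uniquely, r2 and r3 have low mapping quality.
records = [
    "@SQ\tSN:ctg1\tLN:1000000",
    "r1\t65\tctg1\t100\t60\t50M\tctg2\t900\t0\t*\t*",
    "r2\t65\tctg1\t200\t0\t50M\tctg2\t500\t0\t*\t*",
    "r3\t65\tctg1\t300\t9\t50M\tctg2\t700\t0\t*\t*",
]
kept_q10, dropped_q10 = split_by_mapq(records, 10)
kept_q0, dropped_q0 = split_by_mapq(records, 0)
```

With a threshold of 0 every record passes, including those with MAPQ 0 that typically come from multi-mapping reads; with a threshold of 10 only r1 remains in this toy input.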

Best, Chenxi

c-zhou avatar Sep 26 '23 11:09 c-zhou

Hi Chenxi,

I used the HiC-Pro pipeline to map the reads; low-quality and multi-mapping reads were removed when combining the read1 and read2 results. And I got exactly the same results using -q 10.

Best, Kun

xiekunwhy avatar Sep 26 '23 13:09 xiekunwhy

Hi Chenxi,

I had the same problem using v1.2 for a plant genome: two misjoins, one between the 1st and 2nd scaffolds and one between the 3rd and 4th scaffolds (the expected chromosome number is 9). Is there a fix for this yet?

yahs assembly.fasta trimmed_PAL_046_3_NGS23-B040_BHHC3MDSX7_S441_L002_combined_dedup_HiC.bam -o yahs_rerun

p_elata_out_JBAT_rerun_hic

Thanks, Surabhi

surabhiranavat avatar Mar 05 '24 15:03 surabhiranavat

@surabhiranavat, if the contig N90 is large enough, try other software, such as EndHiC or HapHiC.

xiekunwhy avatar Mar 06 '24 00:03 xiekunwhy

@xiekunwhy Thank you for the suggestion!

surabhiranavat avatar Mar 06 '24 08:03 surabhiranavat