[Hi-C maps] How to manage heterotype duplications?
Hello,
I would like to ask what you usually do when you encounter this type of pattern in the scaffolds:
It is caused by under-collapsed heterozygosity, right? So should I move it to debris?
Thank you in advance, Quentin.
Hi Quentin,
I'm not sure how the green rectangles in the contact maps were generated. Despite the weak Hi-C signals in this short fragment, it seems like it should be relocated to the interior of the final large green rectangle. If this large green rectangle was generated by some tools, there may be an error.
Best regards, Xiaofei
Hello, I generated them using the GreenHill utility fasta to juicebox assembly. It generates green blocks at each gap region, which is very useful for moving contigs. But it ended up making my assembly of the type Group:::fragment ...
Actually, I could use the assembly to fasta script from the juicebox_scripts folder to get a final.fa and a final.agp. After that I renamed the chromosomes and remapped the Hi-C reads. Everything is fine now.
However, the assembly produced by HapHiC from my polished assembly had 700 contigs, whereas the initial assembly had 500, and the quality was also different.
I wonder whether I should continue with the 700-contig assembly. I think it would be better, because I have corrected many structural errors. Meanwhile, I am running HapHiC (it's running now) using the p_utgs and the p_utg GFA.
I wonder whether the p_utgs will have better quality than the haplotype-resolved contigs in terms of QV. Also, how do I get the opposite haplotype? Is it just a consensus assembly like the one wtdbg2 outputs?
Also, for the haplotype-resolved assembly, how do I supply the .hic.hap1.p_ctg.fa and .hic.hap2.p_ctg.fa files to HapHiC?
Should I run cat file1 file2 > combined?
Or should I use a wildcard such as hap*.fa?
I ask because your wiki explains this for p_ctgs but not for p_utgs.
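For reference, the `cat file1 file2 > combined` idea can be sketched in Python as below. This is only a minimal sketch: whether HapHiC expects a single combined FASTA is not confirmed in this thread, and the function name `combine_fastas` and the file names in the usage line are hypothetical. The one thing it adds over plain `cat` is a check that no sequence name occurs twice across the two haplotype files.

```python
def combine_fastas(inputs, output):
    """Concatenate FASTA files into one, refusing duplicate sequence names.

    Hypothetical helper for illustration; not part of HapHiC or hifiasm.
    Returns the number of sequences written.
    """
    seen = set()
    with open(output, "w") as out:
        for path in inputs:
            with open(path) as handle:
                for line in handle:
                    if line.startswith(">"):
                        # The sequence name is the first word after ">".
                        fields = line[1:].split()
                        name = fields[0] if fields else ""
                        if name in seen:
                            raise ValueError(f"duplicate sequence name: {name}")
                        seen.add(name)
                    # Make sure a file without a final newline does not
                    # glue its last line onto the next file's first header.
                    if not line.endswith("\n"):
                        line += "\n"
                    out.write(line)
    return len(seen)
```

Usage, with placeholder file names: `combine_fastas(["hap1.fa", "hap2.fa"], "combined.fa")`.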
Another issue I found: I want to be able to look at the same map at MAPQ0, so I have filtered the Hi-C reads and run quick view mode to generate Hic.filtered_0.bam.
Then I got the out_JBAT and the .hic files, but when I use them as a control in JBAT, the scaffolds are not sorted as they are in the .hic map at MAPQ1.
Would it be possible to add an option like --mapq "auto(1)", where we could choose 0, 30, or all?
It would then make three Hi-C maps, at MAPQ1, 0, and 30, so that we can visualise the telomeres or repeats at MAPQ0 and see what is unique between sequences at MAPQ30.
MAPQ0 is important because, at that threshold, a gap in the Hi-C map means that there is no Hi-C mapping at all. If there is no Hi-C mapping and we overlay another technology, such as HIFI.winnowmap.aligned.sorted.wig, we could directly cut the useless parts in JBAT instead of calling a consensus with bcftools consensus --no-ref.
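To make the three-threshold idea concrete, here is a minimal sketch of splitting alignments by MAPQ. It works on SAM text lines (for example, the output of `samtools view -h`), where MAPQ is the fifth tab-separated column. The function names and the default thresholds (0, 1, 30) are assumptions for illustration; this is not an existing HapHiC option, and it filters each record independently, whereas a real Hi-C filter would usually need to keep or drop both mates of a pair together.

```python
def filter_sam_by_mapq(lines, min_mapq):
    """Yield SAM header lines plus alignments with MAPQ >= min_mapq."""
    for line in lines:
        if line.startswith("@"):
            # Header lines are always kept so the output stays valid SAM.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # A SAM alignment has at least 11 mandatory columns; MAPQ is column 5.
        if len(fields) >= 11 and int(fields[4]) >= min_mapq:
            yield line


def split_by_thresholds(lines, thresholds=(0, 1, 30)):
    """Return {threshold: filtered SAM lines} for each MAPQ threshold."""
    lines = list(lines)  # allow several passes over the same input
    return {q: list(filter_sam_by_mapq(lines, q)) for q in thresholds}
```

Each resulting set of lines could then be converted back to BAM and used to build its own contact map.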
These are just suggestions. A converter for BED files might also help, because I often had annotations of the scaffold.fa that was generated after HapHiC and wanted to plot HiFi read mappings (minimap2), gaps (detgaps from asset), QV errors (from Merqury), and telomeres (from seqkit locate). In the end I could do all of that, but you could add all of these functions directly to HapHiC. Maybe tomorrow I'll send you the scripts if you're interested.
It is true that you may think the most important thing is to solve the genome assembly and scaffolding problem, and that you have already given the answer to it. But I believe that one single pipeline could be useful, especially to save time. At the same time, maybe it is better to just leave HapHiC as is.
However, I find the reordering of contigs by name disturbing. I am used to contigs being ordered by size. I wonder what the reason for ordering by name could be. Or is it arbitrary? Or does it come from hifiasm?
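One possible reading of "a converter for BED files" is shifting per-scaffold BED intervals onto a single concatenated coordinate axis, so that the annotations line up under a whole-assembly contact map. The sketch below assumes that reading. It takes scaffold lengths from a .fai-style index (name and length in the first two columns) and assumes the scaffolds are laid out in that same order. The function names and the target sequence name `assembly` are hypothetical and should be adjusted to whatever the viewer actually expects.

```python
def cumulative_offsets(fai_lines):
    """Map each scaffold name to its start offset on the concatenated axis."""
    offsets, total = {}, 0
    for line in fai_lines:
        if not line.strip():
            continue
        name, length = line.split("\t")[:2]
        offsets[name] = total
        total += int(length)
    return offsets


def shift_bed(bed_lines, offsets, target="assembly"):
    """Yield BED lines rewritten onto one concatenated sequence.

    `target` is an assumed name for the concatenated sequence.
    """
    for line in bed_lines:
        # Skip blank lines and BED header/comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        if fields[0] not in offsets:
            continue  # scaffold absent from the index: nothing to shift
        off = offsets[fields[0]]
        start, end = int(fields[1]) + off, int(fields[2]) + off
        yield "\t".join([target, str(start), str(end)] + fields[3:])
```

BED coordinates are 0-based and half-open, so adding the same offset to start and end preserves the interval length.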
See you and have a wonderful day.
Your tool is very powerful. Haha
Quentin.
Hello Quentin,
Thank you for your interest in the project and for taking the time to write such a detailed post. I sincerely apologize for the significant delay in addressing your issue.
After reviewing the content, I must point out that this single post covers too many distinct topics, questions, and feature suggestions. This "stream of consciousness" approach makes it challenging for me to effectively grasp the core problems and prioritize your requests.
In an effort to ensure more efficient and timely responses for all users, I have recently established new standards for issue submission in our updated Issue Submission Guidelines.
I will be closing this issue now to keep the issue tracker manageable and focused. If you still require answers or action on the points you raised, please consider restructuring and resubmitting them as separate issues (one topic per issue), strictly following the criteria outlined in the new Guidelines.
Thank you for your understanding and continued support of the project. Once again, I apologize for the initial lack of a timely reply.
Best regards, Xiaofei