hifiasm

request for comment: chrY

Open ptrebert opened this issue 3 years ago • 29 comments

Hi, we have used two different hifiasm versions for assembling male genomes. For chrY, we observed that the more recent version (v0.15.2) distributes the chrY sequences between primary and alternate output (i.e. hifiasm was executed with the --primary parameter), whereas the older version (v0.14.2) produced a more complete chrY assembly as the primary output.

The IGV screenshot illustrates the point:

  • the first two rows per sample are primary and alternate contig output from v0.15.2
  • third row per sample is the primary contig output from v0.14.2

image

Can you help us understand what's happening here? Any comment would be much appreciated, thanks a lot!

Best, Peter

ptrebert avatar Jun 09 '21 19:06 ptrebert

What are the sizes of the new and old assemblies? Could you please also try the current HEAD (0.15.3-r339)? Generally, I think the p_ctg of the new assemblies should be larger and keep more segdups than the old ones.

chhylp123 avatar Jun 10 '21 00:06 chhylp123

Thanks for your swift reply. I checked the assembly size (p_ctg), and the v0.14.2 assemblies are roughly 30 to 50 Mbp shorter than the ones produced with v0.15.2 (i.e., the ones where we observed the less complete chrY in the p_ctg output).
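For readers who want to reproduce this kind of size comparison, the following is a minimal sketch of summing segment lengths from a GFA file. It assumes the standard GFA layout in which `S` records carry the sequence in the third column and, when the sequence is omitted as `*`, an `LN:i:` tag with the length. The function name and the toy records are hypothetical and are not taken from the assemblies discussed here.

```python
import io


def gfa_total_length(handle):
    """Sum the lengths of all segment (S) records in a GFA stream."""
    total = 0
    for line in handle:
        if not line.startswith("S\t"):
            continue  # skip links, headers and other record types
        fields = line.rstrip("\n").split("\t")
        seq = fields[2]
        if seq != "*":
            total += len(seq)
        else:
            # Sequence omitted: fall back to the LN:i:<length> tag if present.
            total += next(
                (int(f[5:]) for f in fields[3:] if f.startswith("LN:i:")), 0
            )
    return total


# Toy input with made-up contig names, for illustration only.
demo_gfa = (
    "S\tptg000001l\tACGTACGT\tLN:i:8\n"
    "S\tptg000002l\t*\tLN:i:5\n"
    "L\tptg000001l\t+\tptg000002l\t+\t0M\n"
)
demo_total = gfa_total_length(io.StringIO(demo_gfa))
print(demo_total)
```

Running the same function on the p_ctg GFA of each hifiasm version (opened with `open(path)`) would give the two totals to compare.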

I will do one assembly with the current HEAD as suggested and report back when this is done.

ptrebert avatar Jun 10 '21 08:06 ptrebert

The first test run unfortunately ended in a segfault - since I used a hifiasm version that was checked out a couple of hours before you made the v0.15.3 release, that was probably just bad luck... I restarted the whole thing

ptrebert avatar Jun 12 '21 04:06 ptrebert

Could you please show the log file of the segfault run?

chhylp123 avatar Jun 12 '21 11:06 chhylp123

Sure - please see the attached file

hifiasm_chrY_na19239.t12.log

edit: I am now rerunning with the release version 0.15.3

ptrebert avatar Jun 13 '21 12:06 ptrebert

The test run using the v0.15.3 release version has finished; we are now evaluating chrY completeness

ptrebert avatar Jun 15 '21 12:06 ptrebert

We have the results back, and there does not seem to be any striking difference for chrY between v0.15.2 and v0.15.3 (see snapshot). Again, looks like large parts of chrY are only part of the alternate output (bottom track, sample is NA19239):

image

ptrebert avatar Jun 15 '21 16:06 ptrebert

I see. Let me check NA19239 and HG002 on our side. Please give me a moment.

chhylp123 avatar Jun 15 '21 17:06 chhylp123

Could you please show the fastq file for this run? Probably it is not too large? I just want to reproduce this issue on my side. Thanks a lot.

chhylp123 avatar Jun 16 '21 04:06 chhylp123

You mean the run that segfaulted? If you insist, of course, but since I could successfully assemble the data with the released v0.15.3, I am pretty sure this was rather a local problem (compute node failing or similar). The FASTQ input data for NA19239 is ~65G, so it's difficult to easily share that unless we could use something like Globus?

edit: that's also the same data that were used for the successful run where we still observe the incomplete chrY with v0.15.3

ptrebert avatar Jun 16 '21 07:06 ptrebert

I see. Let me focus on chrY first.

chhylp123 avatar Jun 16 '21 11:06 chhylp123

@ptrebert Could you please let me know where I can find the HiFi data for any of these 3 samples? I recently tested HG002 and the results look reasonable. image

chhylp123 avatar Jun 24 '21 03:06 chhylp123

chrY is hard. I think the old version collapsed some repeat regions on chrY, so its assembly looks more complete. However, I have no idea whether that is right.

chhylp123 avatar Jun 24 '21 03:06 chhylp123

Could you please let me know where I can find the HiFi data for any of these 3 samples?

It would be easiest to share this via Globus as our university runs a Globus server and I can share folders with external collaborators - would that work for you (need your Globus email)?

chrY is hard.

No argument here :-)

We really appreciate that you are looking into this issue, thanks a lot!

ptrebert avatar Jun 24 '21 09:06 ptrebert

Yeah, I guess [email protected] should work. Thanks a lot!

chhylp123 avatar Jun 24 '21 11:06 chhylp123

Great, you should have received an email invite - I just added a single data set (NA19239) for now, but let me know if you also want to check the others

ptrebert avatar Jun 24 '21 12:06 ptrebert

A simple question: do you use a fixed data set for further development and evaluation of hifiasm, e.g. HG002? Or do you always run a random subset from HPRC and check whether things improved?

ptrebert avatar Jul 05 '21 08:07 ptrebert

We have recently been focusing on various human and non-human samples. We hope hifiasm can work on different samples instead of only HG002/HG00733.

chhylp123 avatar Jul 05 '21 08:07 chhylp123

Thanks for the info

ptrebert avatar Jul 05 '21 08:07 ptrebert

By the way, I have tried to download NA19239 from Globus several times and the speed was slow. I just checked the status again and found that it now reports "Permission Denied. No effective ACL rules on the endpoint..." Could you please give me the permission again? I'm so sorry for the delay.... : (

chhylp123 avatar Jul 05 '21 08:07 chhylp123

No problem - the ACL problem was probably just a temporary issue, at least no other external collaborator reported problems with our Globus endpoint. I re-added your email and you should have received another invitation, let me know if it's working now.

Regarding the speed issue: well, that is out of my hands, unfortunately. The internet connection of our university is indeed not optimal, and easily at its limits if there is a lot of traffic.. if it's really not working, then I would have to look for alternatives; just let me know...

ptrebert avatar Jul 05 '21 08:07 ptrebert

Thanks a lot. Maybe something is wrong on my side... Could you please add this email: [email protected]?

chhylp123 avatar Jul 05 '21 08:07 chhylp123

done - to be 100% sure, you could also send me your Globus user id

ptrebert avatar Jul 05 '21 08:07 ptrebert

Probably something is wrong with my settings. Is it possible for you to add my id ([email protected]) directly? I'm so sorry...

chhylp123 avatar Jul 05 '21 08:07 chhylp123

done - if you still get permission errors now, I am going to contact our IT to see if they are aware of any problems...

ptrebert avatar Jul 05 '21 09:07 ptrebert

Thank you so much! It's OK for now :) I will download it again and let you know the results as soon as possible.

chhylp123 avatar Jul 05 '21 09:07 chhylp123

Great, much appreciated!

ptrebert avatar Jul 05 '21 09:07 ptrebert

Sorry for the delay; Globus was too slow on my side. For NA19239, I found that the main difference between v0.14.2 and v0.15.4 is caused by one contig, marked with a red circle (atg000260l in v0.15.4 / ptg000069l in v0.14.2). v0.14.2 keeps it in the primary assembly, while v0.15.4 puts it into the alternate assembly. It corresponds to utg00563l in p_utg.gfa. Please note that utg000182l comes from chrX. After checking the contig alignments, I think the ends of utg00563l and utg000182l, and the whole of unitig utg030474l, come from PAR1. In this case, hifiasm considers utg00563l and utg000182l to be homologous to each other, so only utg000182l is kept in the primary assembly. I'm not sure how to deal with PARs. Do you have any suggestions?

Alignments: image

p_utg.gfa: image
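As a rough illustration of the check described above (finding contigs whose alignments fall inside PAR1), here is a minimal sketch that scans PAF alignments of contigs against a reference. It is not how hifiasm detects homology. The PAR1 interval is an assumption based on the commonly cited GRCh38 coordinates and should be verified against the reference actually used; the function name, the `min_overlap` threshold, and the toy alignment records are hypothetical.

```python
import io

# Assumed GRCh38 PAR1 intervals (0-based, half-open); verify for your reference.
PAR1 = {"chrX": (10_000, 2_781_479), "chrY": (10_000, 2_781_479)}


def contigs_touching_par1(paf_handle, min_overlap=10_000):
    """Map each query contig to the target chromosomes where it overlaps PAR1."""
    hits = {}
    for line in paf_handle:
        f = line.rstrip("\n").split("\t")
        if len(f) < 12:
            continue  # not a complete PAF record
        qname, tname = f[0], f[5]
        tstart, tend = int(f[7]), int(f[8])
        if tname not in PAR1:
            continue
        par_start, par_end = PAR1[tname]
        # Length of the intersection between the alignment and PAR1.
        overlap = min(tend, par_end) - max(tstart, par_start)
        if overlap >= min_overlap:
            hits.setdefault(qname, set()).add(tname)
    return hits


# Toy PAF records with made-up contig names and coordinates.
demo_paf = (
    "ctgA\t5000000\t0\t500000\t+\tchrY\t57227415\t2000000\t2500000\t490000\t500000\t60\n"
    "ctgB\t3000000\t0\t200000\t+\tchrX\t156040895\t50000000\t50200000\t199000\t200000\t60\n"
    "ctgC\t4000000\t0\t200000\t-\tchrX\t156040895\t2700000\t2900000\t198000\t200000\t60\n"
)
demo_hits = contigs_touching_par1(io.StringIO(demo_paf))
print(demo_hits)
```

A contig flagged on chrY whose PAR1-overlapping end also aligns to chrX would be a candidate for the kind of primary/alternate split discussed here.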

chhylp123 avatar Jul 14 '21 21:07 chhylp123

Interesting observation, and I think I now understand better where the problem is coming from. I am going to discuss this with our chrY expert; maybe she can point out a way to handle this. Ad hoc, it seems somewhat impossible...

P.S.: since I have a couple of days off, there will be some delay until I can get back to you on the matter - sorry!

ptrebert avatar Jul 16 '21 09:07 ptrebert

closing / completed

ptrebert avatar Feb 23 '24 09:02 ptrebert