
Q: trio-binning with parental HiFi data

Open ptrebert opened this issue 2 years ago • 18 comments

Hi, I have a question regarding using parental HiFi instead of Illumina data for a trio-phased assembly (all experiments done with recent versions of hifiasm and yak). When doing this, we observed a fairly low contig N50 (3-5 Mbp) for the child/trio-phased assembly. Assembling the trio members individually as primary/alt resulted in the expected contig N50 of 30 to 50 Mbp. I have seen the FAQ entries about differences in contiguity between primary and phased assemblies, and about tweaking the parameters -D and -N to potentially improve contiguity. But before we explore that option, I wanted to get your input on whether using parental HiFi may cause problems that we have overlooked. Thanks for your help.

+Peter

ptrebert avatar May 19 '22 08:05 ptrebert

We haven't tried to run hifiasm with parental HiFi, so I have no idea about that. -D and -N would not help for the trio-binning assemblies. Is it possible that you can share the bin files with us?

chhylp123 avatar May 20 '22 02:05 chhylp123

thanks a lot for the fast reply. seems I misunderstood the FAQ entry about -D and -N, then.

Re data sharing: unfortunately, no, I do not have the permission for that. But we are trying the same approach with a public dataset, let's see what we find there...

ptrebert avatar May 20 '22 15:05 ptrebert

Sorry that took a bit of time, but we have repeated the experiment with the public PUR trio (HG00733 + parents); what do you think of these N50s:

                       N50 (Mb)
child-trio             12.9 hap1 | 17.8 hap2
child-noTrio           68.3
mother                 57.7
father                 59.2 

I think HPRC reports something like ~40 Mbp hap contig N50 for (Illumina) trio-binned assemblies

ptrebert avatar Jun 02 '22 07:06 ptrebert
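
For readers less familiar with the metric compared in the table above, here is a minimal sketch of how a contig N50 is usually computed from a list of contig lengths. The function name `n50` and the toy lengths are illustrative assumptions; this is not the tool the posters used to obtain their numbers.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L of the contig at which the running sum of
    lengths, taken from longest to shortest, first reaches at least
    half of the total assembly size.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to stay in integer arithmetic.
        if 2 * running >= total:
            return length


# Toy example (made-up lengths, arbitrary units).
toy_contigs = [50, 40, 30, 20, 10]
print(n50(toy_contigs))
```

Because N50 depends only on the length distribution, a phased assembly that breaks long contigs at ambiguous sites can show a much lower value even when the total assembled sequence is similar.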

That is not very good. Could you please share the bin files with us?

chhylp123 avatar Jun 02 '22 13:06 chhylp123

@hugocarmaga can you please make all the bin files available via Globus (folder: see internal slack) that are part of the above experiment? Thanks

And please report here when all files are copied...

ptrebert avatar Jun 02 '22 18:06 ptrebert

All the relevant files are copied there for the four assemblies mentioned above.

hugocarmaga avatar Jun 02 '22 19:06 hugocarmaga

I did trio binning assembly for the WashU trio. It worked fine. I wonder what is the issue with HG00733...

lh3 avatar Jun 03 '22 18:06 lh3

are the data for this trio (which one is that exactly?) public? we could try repeating the experiment to check our setup

ptrebert avatar Jun 28 '22 12:06 ptrebert

https://s3-us-west-2.amazonaws.com/human-pangenomics/index.html?prefix=submissions/9f0e43e9-a57d-42c1-992c-a2ce7c20940f--WUSTL_BLOOD_HIFI/

This is from a pedigree. I forgot how samples are related. You may ask Karen.

lh3 avatar Jun 28 '22 13:06 lh3

Thanks

ptrebert avatar Jun 29 '22 12:06 ptrebert

Hi @lh3,

Sorry for cutting in, the topic is interesting.

> I did trio binning assembly for the WashU trio.

Were the same k-mer and Bloom filter sizes (-k 31 -b 37) applied? I assume so, but if not, may I ask which parameters were used in the trio-HiFi case?

yfukasawa avatar Aug 28 '22 14:08 yfukasawa

Yes, the same settings as for short reads.

lh3 avatar Aug 28 '22 21:08 lh3
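
As a rough illustration of the setup discussed above, the sketch below assembles the command lines for counting parental k-mers with yak (using the -k 31 -b 37 settings mentioned in the thread) and for running hifiasm in trio mode. The helper names, thread counts, and all file names are hypothetical placeholders, and the script only prints the commands; it does not run them. Whether these settings are the best choice for parental HiFi input is exactly what this thread leaves open.

```python
import shlex


def yak_count_cmd(reads, out_yak, k=31, bloom_bits=37, threads=16):
    """Build a `yak count` command for one parent's reads."""
    return ["yak", "count", f"-k{k}", f"-b{bloom_bits}", f"-t{threads}",
            "-o", out_yak, reads]


def hifiasm_trio_cmd(child_reads, pat_yak, mat_yak, prefix, threads=32):
    """Build a hifiasm trio-binning command.

    -1 takes the paternal k-mer dump and -2 the maternal one.
    """
    return ["hifiasm", "-o", prefix, "-t", str(threads),
            "-1", pat_yak, "-2", mat_yak, child_reads]


# Hypothetical file names, for illustration only.
commands = [
    yak_count_cmd("father.hifi.fq.gz", "pat.yak"),
    yak_count_cmd("mother.hifi.fq.gz", "mat.yak"),
    hifiasm_trio_cmd("child.hifi.fq.gz", "pat.yak", "mat.yak", "child.trio"),
]
for cmd in commands:
    print(shlex.join(cmd))
```

Building the commands as argument lists keeps them easy to inspect and, if desired, to pass to `subprocess.run` later without shell quoting issues.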

Thanks

yfukasawa avatar Aug 29 '22 05:08 yfukasawa

to keep this alive: we also did the trio-binning using the WashU trio, and the results look ok (@hugocarmaga can you add the N50s here, please?). At least for our initial dataset, the preliminary conclusion is that this is probably a problem with the data. Of course, would be nice to know if there are any insights about the sub-optimal results for the PUR trio.

ptrebert avatar Sep 15 '22 16:09 ptrebert

Thanks for letting us know. We could go back to these samples and try to figure out the problems in October. This month is quite busy. I will let you know once I get new results.

chhylp123 avatar Sep 16 '22 16:09 chhylp123

Here are the N50s for the WashU trio (sorry it took so long):

                N50 (Mb)
child-trio      33.4 hap1 | 27.6 hap2
child-noTrio    96.7
mother          78.9
father          68.7 

hugocarmaga avatar Sep 28 '22 09:09 hugocarmaga
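
One way to put the two reported tables side by side is to express each haplotype N50 as a fraction of the same child's non-trio N50. The sketch below does that with the numbers posted in this thread; the ratio is only an illustrative summary, not a metric the participants used, and the dictionary layout and function name are assumptions.

```python
# Contig N50 values in Mb, copied from the two tables in this thread.
n50_mb = {
    "PUR (HG00733)": {"hap1": 12.9, "hap2": 17.8, "noTrio": 68.3},
    "WashU": {"hap1": 33.4, "hap2": 27.6, "noTrio": 96.7},
}


def phased_to_primary_ratio(entry):
    """Return each haplotype N50 divided by the non-trio N50."""
    return {hap: entry[hap] / entry["noTrio"] for hap in ("hap1", "hap2")}


for trio, entry in n50_mb.items():
    ratios = phased_to_primary_ratio(entry)
    print(f"{trio}: hap1 {ratios['hap1']:.2f}, hap2 {ratios['hap2']:.2f}")
```

Both trios show lower contiguity for the phased haplotypes than for the non-trio assembly, which is in line with the FAQ entry mentioned in the opening post; the drop is larger for the PUR trio.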

Hi, I also want to run hifiasm with parental HiFi - hey @chhylp123 any update on if/when it is possible? Or maybe it is out-of-scope for hifiasm? Many thanks for great software!

MagdalenaZZ avatar Mar 18 '24 16:03 MagdalenaZZ

Do you have both parental and child reads?

chhylp123 avatar Mar 21 '24 22:03 chhylp123