
Q: Use this repo or Canu 1.8 for read-partitioning?

Open SHuang-Broad opened this issue 5 years ago • 5 comments

Hi,

I have a (hopefully) simple question: where should I get meryl for partitioning reads for trio assembly? This repo or the Canu repo?

Background (if that matters): we are interested in not only Canu, but in other recently released long read assemblers as well.

Thank you!

SHuang-Broad avatar Oct 16 '19 19:10 SHuang-Broad

You want to use Canu. If you include the '-haplotype' option Canu will only do the read partitioning.

meryl is (mostly) the same between the two repos; the trio partitioning is implemented as part of canu.

brianwalenz avatar Oct 16 '19 20:10 brianwalenz

Hello @SHuang-Broad, yes, the easiest way, and the one I'd recommend, is to run canu with the -haplotype option. This will stop after obtaining the binned reads of the child.

If you are interested in generating the blobplots, I'd recommend getting the haplotype-specific markers. The intermediate meryl db (kmers) used for binning can be found under haplotype/0-kmers/, but it needs one more step to filter out erroneous kmers. This step is performed automatically in TrioCanu but needs to be done manually at the moment. Once the haplotyping is done, check the log (splitHaplotype.*.out) to find something similar to this for each haplotype (the example below is for the Maternal haplotype):

--  Haplotype './0-kmers/haplotype-Maternal.meryl':
--   use kmers with frequency at least 16.

and run:

meryl greater-than 16 0-kmers/haplotype-Maternal.meryl output 0-kmers/haplotype-Maternal.gt16.meryl

haplotype-Maternal.gt16.meryl will be the useful 'Maternal' marker for evaluation.
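If there are several haplotypes, the manual step above can be sketched as a small shell helper. This is only a rough sketch: the function name print_meryl_filter_cmds is made up for illustration, and it assumes the log lines look exactly like the two-line example quoted above (a "Haplotype '...'" line followed by a "use kmers with frequency at least N." line). It only prints the meryl commands so they can be reviewed before running.

```shell
# Hypothetical helper (not part of canu or meryl): reads splitHaplotype log
# text on stdin and prints one "meryl greater-than" command per haplotype.
# Assumes the log format matches the two-line excerpt quoted in this thread.
print_meryl_filter_cmds() {
  awk '
    /Haplotype/ {
      # The database path sits between single quotes (\047 is a quote).
      n = split($0, parts, "\047")
      db = (n >= 2) ? parts[2] : ""
      sub(/^\.\//, "", db)          # drop a leading "./"
    }
    /use kmers with frequency at least/ && db != "" {
      thr = $NF
      sub(/\.$/, "", thr)           # drop the trailing period after the number
      out = db
      sub(/\.meryl$/, ".gt" thr ".meryl", out)
      print "meryl greater-than " thr " " db " output " out
      db = ""
    }
  '
}
```

It could be used as `cat splitHaplotype.*.out | print_meryl_filter_cmds`, run from wherever the log files are; check the printed commands against the log before executing them.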

Remember to use the latest canu version, not the 1.8 release. meryl in the 1.8 release won't be compatible with the scripts in this repo.

arangrhie avatar Oct 16 '19 20:10 arangrhie

Wow. Thank you both for the fast reply!

I am a little confused by the comments though, about the difference between meryl itself in three different places:

  • from canu github latest master
  • from canu 1.8 official release
  • from this repo's latest master

Are they incompatible? It seems the latest masters from the two repos are compatible, correct? In general I prefer to use releases unless there are critical bug fixes (or big performance gains) in the latest master.

Thank you!!!

SHuang-Broad avatar Oct 16 '19 20:10 SHuang-Broad

You can cheat a little bit and get the next Canu release (expected next week) with:

git clone git@github.com:marbl/canu
cd canu
git checkout v1.9

marbl/meryl is intended to be an exact copy of the meryl that is in marbl/canu. The big difference is that the one in canu can read canu's internal data stores directly. Currently, there are some differences between the two repos, mostly extra features in marbl/meryl.

I'm not sure if the version from 1.8 is compatible with the latest versions. I can't remember if there were file format changes, or if there were, if I added backwards compatibility (I usually do though).

brianwalenz avatar Oct 16 '19 20:10 brianwalenz

The 1.8 release had a bug that affected a few k-mers in some edge cases when generating the k-mer counts; it has since been fixed. I'd also vote for cheating and checking out v1.9.

arangrhie avatar Oct 16 '19 20:10 arangrhie