meryl
Q: Use this repo or Canu 1.8 for read-partitioning?
Hi,
I have a (hopefully) simple question: where should I get meryl for partitioning reads for trio assembly? This repo or the Canu repo?
Background (if that matters): we are interested in not only Canu, but in other recently released long read assemblers as well.
Thank you!
You want to use Canu. If you include the '-haplotype' option Canu will only do the read partitioning.
meryl is (mostly) the same between the two repos; the trio partitioning is implemented as part of canu.
Hello @SHuang-Broad, yes, the easiest way is to run canu with the -haplotype option. Canu will then stop after binning the child's reads.
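As a rough sketch of what such an invocation might look like, the helper below only prints a canu command line and does not run anything. The function name, the file names, the genome size, and the choice of -pacbio-raw are placeholders for illustration, not values taken from this thread; the haplotype names "Maternal" and "Paternal" are assumed to be given through canu's -haplotype{NAME} options. Check the canu documentation for your version before using it.

```shell
# Hypothetical helper: prints (does not run) a canu command that stops
# after haplotype binning. All arguments are placeholders you supply.
canu_haplotype_cmd() {
  prefix=$1     # assembly prefix, passed to -p
  outdir=$2     # output directory, passed to -d
  gsize=$3      # estimated genome size, e.g. 3g
  maternal=$4   # maternal parental reads
  paternal=$5   # paternal parental reads
  child=$6      # child long reads to be binned

  printf 'canu -haplotype -p %s -d %s genomeSize=%s -haplotypeMaternal %s -haplotypePaternal %s -pacbio-raw %s\n' \
    "$prefix" "$outdir" "$gsize" "$maternal" "$paternal" "$child"
}
```

For example, `canu_haplotype_cmd asm trio-out 3g mom.fastq dad.fastq child.fastq` prints a command you can inspect and then run yourself.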
If you are interested in generating the blobplots, I'd recommend getting the haplotype-specific markers.
The intermediate meryl db (k-mers) used for binning can be found under haplotype/0-kmers/, but it needs one more step to filter out erroneous k-mers. This step is performed automatically in TrioCanu but currently has to be done manually.
Once the haplotyping is done, check the log (splitHaplotype.*.out) to find something similar to this for each haplotype (the example below is for the Maternal haplotype):
-- Haplotype './0-kmers/haplotype-Maternal.meryl':
-- use kmers with frequency at least 16.
and run
meryl greater-than 16 0-kmers/haplotype-Maternal.meryl output 0-kmers/haplotype-Maternal.gt16.meryl
haplotype-Maternal.gt16.meryl will be the useful 'Maternal' marker for evaluation.
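To avoid copying the threshold by hand for each haplotype, the manual step above could be scripted along these lines. This is a minimal sketch that assumes the log lines look exactly like the two-line excerpt above (a "-- Haplotype '...'" line followed by a "use kmers with frequency at least N." line); the function name is made up for illustration. It only prints the meryl commands so you can review them before running anything.

```shell
# Hypothetical helper: reads splitHaplotype log file(s) and prints one
# "meryl greater-than" command per haplotype found in them.
print_filter_commands() {
  awk '
    /^-- Haplotype / {
      # The database path sits between single quotes (octal 047).
      split($0, parts, "\047")
      db = parts[2]
      sub(/^\.\//, "", db)          # drop a leading "./"
    }
    /use kmers with frequency at least/ {
      thr = $NF                      # last field, e.g. "16."
      sub(/\.$/, "", thr)            # strip the trailing period
      out = db
      sub(/\.meryl$/, ".gt" thr ".meryl", out)
      printf "meryl greater-than %s %s output %s\n", thr, db, out
    }
  ' "$@"
}
```

Run from the haplotype directory, `print_filter_commands splitHaplotype.*.out` would list the commands; piping the output to `sh` executes them once you are happy with what is printed.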
Keep in mind to use the latest canu version, not the 1.8 release. The meryl in the 1.8 release is not compatible with the scripts in this repo.
Wow. Thank you both for the fast reply!
I am a little confused by the comments though, about the difference between meryl itself in three different places:
- from canu github latest master
- from canu 1.8 official release
- from this repo's latest master
Are they incompatible? It seems the latest masters from the two repos are compatible, correct? In general I prefer to use releases unless there are critical bug fixes (or big performance gains) in the latest master.
Thank you!!!
You can cheat a little bit and get the next Canu release (expected next week) with:
git clone git@github.com:marbl/canu
cd canu
git checkout v1.9
marbl/meryl is intended to be an exact copy of the meryl that is in marbl/canu. The big difference is that the one in canu can read canu's internal data stores directly. Currently, there are some differences between the two repos, mostly extra features in marbl/meryl.
I'm not sure if the version from 1.8 is compatible with the latest versions. I can't remember if there were file format changes or, if there were, whether I added backwards compatibility (I usually do though).
There was a bug in the 1.8 release that affected a few k-mers when generating the k-mer counts in some edge cases; it has since been fixed. I'd also vote for cheating with the v1.9 branch.