General Inquiry

Open OZTaekOppa opened this issue 1 year ago • 2 comments

Hello,

I appreciate the excellent program you've provided.

I have a couple of quick questions:

  1. Multiple input reads: Can I utilize both PacBio HiFi and ONT Ultra-long reads simultaneously?
  2. Multiple regions and assemblies: Is it possible to use the program for multiple regions across different assemblies? For example, Chromosome 1 (10 Mb regions), Chromosome 2 (10 Mb regions), and Chromosome 3 (10 Mb regions). Or should I submit individual jobs for each chromosomal region?

Any suggestions you can provide would be greatly appreciated.

Regards,

Taek

OZTaekOppa avatar Nov 18 '23 05:11 OZTaekOppa

Thanks for your comments!

Regarding the first question, I would not recommend it. If you have ONT & HiFi, Verkko/hifiasm-ul is the way to go. Regarding the second question, it is theoretically possible, but the accuracy would be improved if you split the reads into the target regions separately.

Here are my reasons:

  1. JTK assumes that all reads share a single error pattern. More precisely, JTK estimates two error profiles, one for each alignment direction (forward or reverse complement). This assumption is useful, but it does not hold even for ONT-only input, because the error rates of the reads are not fixed but form a distribution. A possible approach is to assume that there are multiple error profiles, then iteratively assign the reads to one of the profiles and re-estimate the parameters of these profiles (as in the usual expectation-maximization approach). This approach should, of course, also handle input that consists of both ONT and HiFi reads. However, it is not as simple as I thought, and I am still trying to find a good way to implement the idea. (Any suggestions are welcome.)

  2. Regarding the second question, my concern is that if these regions share exactly the same repeats (e.g. an "alive" L1 element), the problem becomes harder to solve.
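
For illustration only, the iterative idea in point 1 can be sketched as a toy expectation-maximization loop. This is not JTK's code: the function name `em_error_profiles`, the reduction of an "error profile" to a single per-base error rate with a binomial model, and the sample reads are all assumptions made for the sketch.

```python
import math


def em_error_profiles(reads, k=2, iters=50):
    """Toy EM over k error profiles (hypothetical helper, not part of JTK).

    reads: list of (n_errors, aligned_length) pairs, one per read.
    Each profile is modelled as a single per-base error rate (an assumption).
    Returns (rates, weights, assignments).
    """
    eps = 1e-6
    # Initialize the rates from quantiles of the observed per-read error rates.
    observed = sorted(e / n for e, n in reads)
    rates = [
        min(max(observed[int((i + 0.5) * len(observed) / k)], eps), 1 - eps)
        for i in range(k)
    ]
    weights = [1.0 / k] * k
    resp = []
    for _ in range(iters):
        # E-step: posterior probability of each profile for each read.
        resp = []
        for e, n in reads:
            logp = [
                math.log(w) + e * math.log(r) + (n - e) * math.log(1 - r)
                for w, r in zip(weights, rates)
            ]
            top = max(logp)
            p = [math.exp(x - top) for x in logp]
            total = sum(p)
            resp.append([x / total for x in p])
        # M-step: re-estimate each profile's weight and error rate.
        for j in range(k):
            mass = sum(r[j] for r in resp)
            errors = sum(r[j] * e for r, (e, _) in zip(resp, reads))
            bases = sum(r[j] * n for r, (_, n) in zip(resp, reads))
            weights[j] = max(mass / len(reads), 1e-12)
            if bases > 0:
                rates[j] = min(max(errors / bases, eps), 1 - eps)
    # Hard-assign each read to its most probable profile.
    assignments = [max(range(k), key=lambda j: r[j]) for r in resp]
    return rates, weights, assignments


# Made-up input: five low-error reads followed by five high-error reads.
reads = [(10, 10000), (12, 10000), (8, 10000), (9, 10000), (11, 10000),
         (500, 10000), (480, 10000), (520, 10000), (510, 10000), (490, 10000)]
rates, weights, assignments = em_error_profiles(reads)
print(rates, weights, assignments)
```

A real error profile would have many more parameters than one rate (substitution, insertion, and deletion behaviour per alignment direction, for example), which is part of why the actual problem is harder than this sketch suggests.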

ban-m avatar Nov 18 '23 11:11 ban-m

Thank you for responding.

  1. PacBio HiFi & ONT: Sorting this out would indeed be fantastic, but I acknowledge it poses a challenge. I'll need to give it some thought.
  2. Multi-regions: Although it's a challenging task, I find the idea of exploring it quite interesting. Out of curiosity, what's the quality value (e.g., QV50) for JTK assemblies for phased contigs?

Cheers!

OZTaekOppa avatar Nov 19 '23 12:11 OZTaekOppa