ExpansionHunter
Advanced options
Hi Egor
What is the difference between the dag-aligner and path-aligner and the seeking and streaming analysis modes under the advanced options? I would like to understand when it may be useful to specify these options.
Thanks, Melissa
Hi Melissa,
Thanks for the question! The advanced options correspond to settings with very narrow use cases that are mainly useful for testing the program.
The --aligner option selects the version of read alignment algorithm used by the program. dag-aligner is a better aligner than path-aligner because it is faster and supports affine gap penalties.
The --analysis-mode option determines whether ExpansionHunter analyzes repeat regions sequentially (seeking) or all at once (streaming). The streaming mode can significantly speed up analysis of large repeat catalogs, but it also requires a lot of memory.
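To make the two options concrete, here is a minimal sketch that assembles an ExpansionHunter command line with both advanced options set explicitly. The helper name build_command, the file names, and the default values chosen in the function signature are assumptions for illustration; only the option names and their values come from the discussion above and the program's standard command-line interface.

```python
import shlex

# Values accepted by the two advanced options discussed above.
ALIGNERS = ("dag-aligner", "path-aligner")
ANALYSIS_MODES = ("seeking", "streaming")


def build_command(reads, reference, catalog, output_prefix,
                  aligner="dag-aligner", analysis_mode="seeking"):
    """Return an ExpansionHunter argument list (hypothetical helper).

    The defaults in this signature are an assumption of the sketch.
    """
    if aligner not in ALIGNERS:
        raise ValueError(f"unknown aligner: {aligner!r}")
    if analysis_mode not in ANALYSIS_MODES:
        raise ValueError(f"unknown analysis mode: {analysis_mode!r}")
    return [
        "ExpansionHunter",
        "--reads", reads,
        "--reference", reference,
        "--variant-catalog", catalog,
        "--output-prefix", output_prefix,
        "--aligner", aligner,
        "--analysis-mode", analysis_mode,
    ]


# Placeholder file names; substitute real paths before running the command.
cmd = build_command("sample.bam", "reference.fa", "catalog.json", "sample_out",
                    analysis_mode="streaming")
print(shlex.join(cmd))
```

The printed string can be pasted into a shell, or the list can be passed to subprocess.run once the placeholder paths point at real files.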
I hope this helps! Please let me know if you have any other questions. Egor
PS: In case you'd like to try it out, we just released a new visualization tool for STRs.
Hello Egor,
Is there any difference in accuracy between the seeking and streaming analysis modes? Also, when you mentioned a lot of memory, could you give me an example comparing the two in terms of CPU/RAM requirements and running time for N repeats x N individuals? Thank you
Thanks for the questions! There is no difference in accuracy between seeking and streaming analysis modes. And I will get back to you about the latest runtime/memory benchmarks. But overall, the streaming mode is currently only practical for up to 10-20K repeats.
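As a rough illustration of the 10-20K guideline, the sketch below counts the entries in a variant catalog and checks whether streaming mode is likely to be practical. It assumes the catalog is a JSON array with one object per locus; the threshold of 10,000 (the lower end of the quoted range), the helper names, and the three demo loci are made up for the example.

```python
import json
import os
import tempfile

# Assumption: use the lower end of the 10-20K range quoted above as the cutoff.
STREAMING_MAX_LOCI = 10_000


def count_loci(catalog_path):
    """Count top-level entries in a variant catalog stored as a JSON array."""
    with open(catalog_path) as handle:
        return len(json.load(handle))


def streaming_is_practical(num_loci, limit=STREAMING_MAX_LOCI):
    """Return True when the catalog size is within the assumed cutoff."""
    return num_loci <= limit


# Hypothetical three-locus catalog, written to a temporary file for the demo.
demo_catalog = [
    {"LocusId": f"LOCUS{i}", "LocusStructure": "(CAG)*",
     "ReferenceRegion": f"chr1:{1000 * i}-{1000 * i + 30}",
     "VariantType": "Repeat"}
    for i in range(1, 4)
]
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    json.dump(demo_catalog, tmp)

num_loci = count_loci(tmp.name)
os.remove(tmp.name)
print(num_loci, streaming_is_practical(num_loci))
```

The check only covers the upper bound mentioned in this thread; it says nothing about how small a catalog must be before seeking mode is the faster choice.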
I am working with @yjqiu, @felixschlesinger, and @kscheffler on significantly reducing the memory requirements of the streaming mode. @yjqiu is also getting ready to release a fairly large repeat catalog. All this should be done in 2-3 months and possibly much sooner.
Also, if you'd like to discuss the analysis you are planning to do, please feel free to reach out by email.
Hello Egor,
This is excellent. Thank you very much for any further information you can give me. I am also trying to use the streaming mode but I have this error. Could you please let me know what I am doing wrong? Thank you again.
2021-02-25T12:06:12,[Starting ExpansionHunter v4.0.1]
2021-02-25T12:06:12,[Analyzing sample HG00479.final]
2021-02-25T12:06:12,[Initializing reference GRCh38_full_analysis_set_plus_decoy_hla.fa]
2021-02-25T12:06:12,[Loading variant catalog from disk TEMPLATE2_a.json]
2021-02-25T12:06:14,[Running sample analysis in streaming mode]
Failed to populate reference for id 0
Unable to fetch reference #0 9996..29231
Failure to decode slice
2021-02-25T12:06:19,[Failed to extract a record from HG00479.final.cram]
Hello William,
Sorry about the error. It looks like there is a CRAM parsing bug in the streaming mode. Could you please check if this Linux binary works? (Let me know if you need a binary for a different platform.)
Dear Egor,
The binary works and runs smoothly and fast. From what I can tell, it uses a total of 23 GB of RAM. It engages only one CPU thread, and that thread runs at 100% utilization.
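For anyone who wants to collect a comparable memory figure, here is a minimal sketch that runs a command and reports the peak resident memory of the child process. It relies on Python's resource module, so it works on Unix-like systems only; the helper name run_and_measure is hypothetical, and the demo runs a trivial Python command as a stand-in for a real ExpansionHunter invocation.

```python
import resource
import subprocess
import sys


def run_and_measure(cmd):
    """Run cmd and return (exit code, peak resident memory in GiB).

    RUSAGE_CHILDREN reports the largest peak among all child processes that
    have finished so far, so call this once per Python process for a clean
    per-run figure.
    """
    completed = subprocess.run(cmd, check=False)
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    bytes_per_unit = 1 if sys.platform == "darwin" else 1024
    return completed.returncode, peak * bytes_per_unit / 1024 ** 3


# Stand-in command; replace with the ExpansionHunter argument list to measure.
exit_code, peak_gib = run_and_measure([sys.executable, "-c", "pass"])
print(f"exit code {exit_code}, peak memory {peak_gib:.3f} GiB")
```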
Many Thanks for your help!
Glad to hear it, William! And thanks for checking the memory usage. We will work on reducing memory consumption in future releases.
Good day. Could someone please clarify whether --analysis-mode is needed only because of large variant catalogs, or whether this option can also be used to process several input files (I mean .bam files)? Thank you.
Thanks for the question. That's right, the "streaming" analysis mode is meant for the analysis of larger variant catalogs. It has no other purpose. In future versions of the program, the streaming mode will be much more efficient, making it possible to analyze large catalogs.
Big thanks and good luck!