Clarification on --disentangle-df: How is the depth factor used in filtering?
First check
- [✅] I used the GitHub search to find a similar issue or discussion and didn't find it.
- [✅] I searched the GetOrganelle.wiki content, especially the FAQ, and browsed the examples to confirm that this is unexpected.
- [✅] I have updated GetOrganelle to the latest released version.
Hi, I'm currently using GetOrganelle with a customized label database to assemble the mitochondrial genome of a parasitic flatworm. I would like to ask about the internal implementation and role of the --disentangle-df option.
I understand from the paper that GetOrganelle filters out contigs whose coverage deviates significantly from that of the target-anchor contigs. However, in the source code (get_organelle_from_reads.py, assembly_parser.py), I saw that --disentangle-df is passed as hard_cov_threshold or min_cov_folds and is used to remove low-coverage contigs.
My questions are:
1. How exactly is --disentangle-df applied in filtering? (e.g., is it based on the coverage of the top-weighted anchor contig?)
2. Is the filtering applied to both low-coverage and high-coverage outliers?
3. Is this value used in any way during the Gaussian Mixture Model steps?
4. For animal mitogenomes (especially those of parasitic species), do you recommend lowering this parameter (e.g., to 3-5)?
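
To make the first two questions above concrete, here is a minimal sketch of the two behaviours I can imagine. It is purely illustrative and is not taken from the GetOrganelle source: the function name, the coverage dictionary, and the two_sided switch are all hypothetical, and I am only assuming that the depth factor acts as a fold-change threshold relative to one anchor contig.

```python
def filter_by_depth_factor(coverages, anchor, depth_factor, two_sided=False):
    """Hypothetical illustration, NOT GetOrganelle's actual code.

    Keeps contigs whose coverage is no lower than anchor_cov / depth_factor.
    With two_sided=True, also drops contigs above anchor_cov * depth_factor.
    """
    anchor_cov = coverages[anchor]
    low = anchor_cov / depth_factor   # assumed lower bound
    high = anchor_cov * depth_factor  # assumed upper bound (two-sided case only)
    kept = {}
    for name, cov in coverages.items():
        if cov < low:
            continue  # low-coverage outlier
        if two_sided and cov > high:
            continue  # high-coverage outlier
        kept[name] = cov
    return kept


# Made-up coverages, for illustration only.
coverages = {"anchor": 100.0, "contig_a": 40.0, "contig_b": 5.0, "contig_c": 1500.0}
one_sided = filter_by_depth_factor(coverages, "anchor", 10.0)
both = filter_by_depth_factor(coverages, "anchor", 10.0, two_sided=True)
print(sorted(one_sided))
print(sorted(both))
```

In this toy case the one-sided version drops only contig_b, while the two-sided version also drops contig_c. I would like to know which of these (if either) is closer to what --disentangle-df actually does.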
Thanks for your excellent tool, and I’d appreciate any clarification!