Unknown AssertionError
Hey @MrOlm,
I'm running dRep via snakemake with the following:
Job 0: Running dRep on all NOMIS MAGs
Reason: Forced execution
(date && dRep dereplicate $(dirname /scratch/users/sbusi/nomis_mags/results/Bins/dRep/dereplicated_genomes) \
-p 28 -comp 70 -con 10 \
--genomeInfo /scratch/users/sbusi/nomis_mags/results/Bins/checkmBeforedRep.tsv \
-g /scratch/users/sbusi/nomis_mags/results/renamed_mags/*fa \
--multiround_primary_clustering --run_tertiary_clustering && date) 2> /scratch/users/sbusi/nomis_mags/results/logs/drep/drep.err.log > /scratch/users/sbusi/nomis_mags/results/logs/drep/drep.out.log
Activating conda environment: snakemake_envs/0a2d0324514328db8685fdb0c0b69b98
The log file shows an AssertionError, without any hints like those in previously reported issues. Please see below:
***************************************************
..:: dRep dereplicate Step 4. Evaluate ::..
***************************************************
Running tertiary clustering on genome representatives
Running primary clustering
Running pair-wise MASH clustering
1414 primary clusters made
Running secondary clustering
Running 3137 ANImf comparisons- should take ~ 56.0 min
Step 4. Return output
Loading work directory
Traceback (most recent call last):
File "/mnt/lscratch/users/sbusi/SnakemakeBinning/snakemake_envs/0a2d0324514328db8685fdb0c0b69b98/bin/dRep", line 32, in <module>
Controller().parseArguments(args)
File "/mnt/lscratch/users/sbusi/SnakemakeBinning/snakemake_envs/0a2d0324514328db8685fdb0c0b69b98/lib/python3.8/site-packages/drep/controller.py", line 100, in parseArguments
self.dereplicate_operation(**vars(args))
File "/mnt/lscratch/users/sbusi/SnakemakeBinning/snakemake_envs/0a2d0324514328db8685fdb0c0b69b98/lib/python3.8/site-packages/drep/controller.py", line 48, in dereplicate_operation
drep.d_workflows.dereplicate_wrapper(kwargs['work_directory'],**kwargs)
File "/mnt/lscratch/users/sbusi/SnakemakeBinning/snakemake_envs/0a2d0324514328db8685fdb0c0b69b98/lib/python3.8/site-packages/drep/d_workflows.py", line 53, in dereplicate_wrapper
drep.d_evaluate.d_evaluate_wrapper(wd, evaluate = '23', **kwargs)
File "/mnt/lscratch/users/sbusi/SnakemakeBinning/snakemake_envs/0a2d0324514328db8685fdb0c0b69b98/lib/python3.8/site-packages/drep/d_evaluate.py", line 25, in d_evaluate_wrapper
run_tertiary_clustering(wd, **kwargs)
File "/mnt/lscratch/users/sbusi/SnakemakeBinning/snakemake_envs/0a2d0324514328db8685fdb0c0b69b98/lib/python3.8/site-packages/drep/d_evaluate.py", line 334, in run_tertiary_clustering
drep.d_choose.d_choose_wrapper(wd.location, **kwargs_copy)
File "/mnt/lscratch/users/sbusi/SnakemakeBinning/snakemake_envs/0a2d0324514328db8685fdb0c0b69b98/lib/python3.8/site-packages/drep/d_choose.py", line 72, in d_choose_wrapper
Gdb = add_centrality(wd, Gdb, **kwargs)
File "/mnt/lscratch/users/sbusi/SnakemakeBinning/snakemake_envs/0a2d0324514328db8685fdb0c0b69b98/lib/python3.8/site-packages/drep/d_choose.py", line 322, in add_centrality
assert len(ndb) == (mlen * mlen) - mlen
AssertionError
Not sure if it has to do with the length of ndb. Any thoughts on how to get around this? I'm also attaching the drep.yaml file which I used to build my environment.
Thank you very much, Susheel
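For anyone reading the traceback: the expression (mlen * mlen) - mlen in the failing assertion is the number of ordered pairs among mlen items once self-pairs are excluded. The sketch below is only an illustration of that arithmetic. It assumes (this is not confirmed in the thread) that ndb holds one row per pairwise comparison and that mlen is the number of genomes being compared; the genome names and the helper expected_pair_count are hypothetical and are not part of dRep.

```python
from itertools import permutations


def expected_pair_count(n_genomes: int) -> int:
    """Number of ordered pairs (a, b) with a != b among n genomes: n*n - n."""
    return n_genomes * n_genomes - n_genomes


genomes = ["A", "B", "C"]  # hypothetical genome names
pairs = list(permutations(genomes, 2))  # every ordered pair, no self-comparisons

# With a complete set of comparisons the equality holds: 6 == 3*3 - 3
assert len(pairs) == expected_pair_count(len(genomes))

# If even one comparison is missing, the same equality no longer holds
incomplete = pairs[:-1]
check_passes = len(incomplete) == expected_pair_count(len(genomes))
print(check_passes)  # False
```

Under that assumption, the AssertionError would mean the comparison table does not contain a full all-versus-all set of rows for some group of genomes.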
Hi @susheelbhanu
Interesting. A couple of thoughts:
- Can you confirm that you're on the most up-to-date version of dRep? I remember this bug from a previous version, but I thought I fixed it.
- Can you confirm that all dependencies are properly installed (dRep check_dependencies)?
- If neither of those works, I believe setting the centrality score to 0 should be a successful workaround.
-Matt
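A quick way to gather the first two answers from inside the environment, as a minimal sketch: it reads the installed version of the drep package with the standard library and reports which external programs are not on PATH. The helper names are hypothetical, and the tool list is simply taken from the programs mentioned in this thread, not a statement of what dRep requires.

```python
import shutil
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def missing_tools(tools):
    """Return the subset of executables that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


# Tool names are taken from this thread; adjust to what your run needs.
print("drep version:", installed_version("drep"))
print("not on PATH:", missing_tools(["mash", "nucmer", "fastANI", "prodigal"]))
```

Run it with the Python interpreter of the same conda environment that snakemake activates, otherwise it reports on a different installation.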
Hey @MrOlm,
Thanks much for the quick response. Please see the answers to your questions below.
- The version I'm using is:
drep==3.2.2
...::: dRep v3.2.2 :::...
Matt Olm. MIT License. Banfield Lab, UC Berkeley. 2017 (last updated 2020)
See https://drep.readthedocs.io/en/latest/index.html for documentation
Choose one of the operations below for more detailed help.
Example: dRep dereplicate -h
Commands:
compare -> Compare and cluster a set of genomes
dereplicate -> De-replicate a set of genomes
check_dependencies -> Check which dependencies are properly installed
- For check_dependencies, I have the following:
mash.................................... all good (location = /mnt/lscratch/users/sbusi/SnakemakeBinning/snakemake_envs/0a2d0324514328db8685fdb0c0b69b98/bin/mash)
nucmer.................................. all good (location = /mnt/lscratch/users/sbusi/SnakemakeBinning/snakemake_envs/0a2d0324514328db8685fdb0c0b69b98/bin/nucmer)
checkm.................................. !!! ERROR !!! (location = None)
ANIcalculator........................... !!! ERROR !!! (location = None)
prodigal................................ all good (location = /mnt/lscratch/users/sbusi/SnakemakeBinning/snakemake_envs/0a2d0324514328db8685fdb0c0b69b98/bin/prodigal)
centrifuge.............................. !!! ERROR !!! (location = None)
nsimscan................................ !!! ERROR !!! (location = None)
fastANI................................. all good (location = /mnt/lscratch/users/sbusi/SnakemakeBinning/snakemake_envs/0a2d0324514328db8685fdb0c0b69b98/bin/fastANI)
I did not install checkM since I was providing the quality metrics, and I skipped centrifuge and nsimscan. It does, however, look like ANIcalculator, which is "recommended", may in fact be essential for running the clustering steps. See my note below.
- How does one set the centrality score? Is it a flag I can use?
Note: After reducing the number of MAGs to fewer than 5000 and removing the two flags --multiround_primary_clustering --run_tertiary_clustering from the original shell command, everything ran just fine.
Thanks again for your help. I will put in the ANIcalculator and see if that resolves it with the original full set of MAGs. -Susheel
Hi Susheel,
No need to install any of the other dependencies; those aren't needed for what you're running.
The two recommendations I have are to update to the latest version of dRep (v3.4.0), though I'm not sure this will fix the problem, and to add the flag -centW 0 to remove the centrality scoring. If neither of those fixes the problem, we can go from there.
Best, Matt
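As a sketch of the second recommendation, here is the command from the original post assembled as an argument list with -centW 0 appended. The paths and option values are copied from that post; the helper build_drep_command is hypothetical, and nothing is executed here, the command is only printed for inspection.

```python
import shlex
from glob import glob
from os.path import dirname


def build_drep_command(work_dir, genomes, genome_info, cent_weight=0):
    """Return the dRep call from the original post with -centW appended."""
    return [
        "dRep", "dereplicate", work_dir,
        "-p", "28", "-comp", "70", "-con", "10",
        "--genomeInfo", genome_info,
        "-g", *genomes,
        "--multiround_primary_clustering", "--run_tertiary_clustering",
        "-centW", str(cent_weight),  # 0 removes centrality scoring, per the reply above
    ]


results = "/scratch/users/sbusi/nomis_mags/results"
cmd = build_drep_command(
    # Same work directory as $(dirname .../dereplicated_genomes) in the post
    work_dir=dirname(f"{results}/Bins/dRep/dereplicated_genomes"),
    # Expands to an empty list on machines where these files do not exist
    genomes=sorted(glob(f"{results}/renamed_mags/*fa")),
    genome_info=f"{results}/Bins/checkmBeforedRep.tsv",
)
print(shlex.join(cmd))
```

In the snakemake rule itself the equivalent change is simply adding -centW 0 to the existing shell command.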
Thanks Matt. Is v3.4.0 on conda or pip?
Will give this a go and get back. May take a bit though before I reply again.