GraphBin2
KeyError during "Propagating labels to unlabelled vertices"
The error:
GraphBin2 started
-------------------
Total number of contigs available: 276680
Total number of edges in the assembly graph: 23569
Number of bins available in binning result: 13
Number of binned contigs: 2261
Total number of unbinned contigs: 274419
Number of isolated contigs: 270459
Removing labels of unsupported vertices...
Iteration: 1
100%|███████████████████████████████████████████████████████████| 2261/2261 [00:03<00:00, 669.23it/s]
Iteration: 2
100%|███████████████████████████████████████████████████████████| 2178/2178 [00:02<00:00, 731.72it/s]
Iteration: 3
100%|███████████████████████████████████████████████████████████| 2177/2177 [00:02<00:00, 734.18it/s]
Iteration: 4
100%|███████████████████████████████████████████████████████████| 2176/2176 [00:02<00:00, 734.44it/s]
Refining labels of inconsistent vertices...
Iteration: 1
100%|███████████████████████████████████████████████████████████| 2176/2176 [00:02<00:00, 733.30it/s]
Iteration: 2
100%|███████████████████████████████████████████████████████████| 2176/2176 [00:02<00:00, 770.52it/s]
Iteration: 3
100%|███████████████████████████████████████████████████████████| 2176/2176 [00:02<00:00, 771.00it/s]
Obtaining non isolated contigs...
100%|██████████████████████████████████████████████████████| 276680/276680 [00:29<00:00, 9521.30it/s]
Number of non-isolated contigs: 5095
Number of non-isolated unbinned contigs: 2919
Propagating labels to unlabelled vertices...
0%| | 0/2919 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/ebio/abt3_projects/software/dev/ll_pipelines/llmga/bin/scripts/GraphBin2/src/graphbin2_SPAdes.py", line 617, in <module>
    sorted_node_list_ = [list(runBFS(x, threhold=depth)) for x in contigs_to_bin]
  File "/ebio/abt3_projects/software/dev/ll_pipelines/llmga/bin/scripts/GraphBin2/src/graphbin2_SPAdes.py", line 617, in <listcomp>
    sorted_node_list_ = [list(runBFS(x, threhold=depth)) for x in contigs_to_bin]
  File "/ebio/abt3_projects/software/dev/ll_pipelines/llmga/bin/scripts/GraphBin2/src/graphbin2_SPAdes.py", line 350, in runBFS
    labelled_nodes.add((node, active_node, contig_bin, depth[active_node], abs(coverages[contigs_map[node]]-coverages[contigs_map[active_node]])))
KeyError: 276488
0%|
What is the key error referring to? What is the key that is not found?
conda info:
# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 1_gnu conda-forge
biopython 1.78 py39hbd71b63_1 conda-forge
ca-certificates 2020.12.5 ha878542_0 conda-forge
cairo 1.16.0 h488836b_1006 conda-forge
certifi 2020.12.5 py39hf3d152e_0 conda-forge
fontconfig 2.13.1 h1056068_1002 conda-forge
freetype 2.10.4 h5ab3b9f_0
gettext 0.19.8.1 h9b4dc7a_1
gmp 6.2.1 h58526e2_0 conda-forge
icu 67.1 he1b5a44_0 conda-forge
ld_impl_linux-64 2.35.1 hed1e6ac_0 conda-forge
libblas 3.9.0 3_openblas conda-forge
libcblas 3.9.0 3_openblas conda-forge
libffi 3.3 he6710b0_2
libgcc-ng 9.3.0 h5dbcf3e_17 conda-forge
libgfortran-ng 9.3.0 he4bcb1c_17 conda-forge
libgfortran5 9.3.0 he4bcb1c_17 conda-forge
libglib 2.66.3 h1f3bc88_1 conda-forge
libgomp 9.3.0 h5dbcf3e_17 conda-forge
libiconv 1.16 h516909a_0 conda-forge
liblapack 3.9.0 3_openblas conda-forge
libopenblas 0.3.12 pthreads_h4812303_1 conda-forge
libpng 1.6.37 hbc83047_0
libstdcxx-ng 9.3.0 h2ae2ef3_17 conda-forge
libuuid 2.32.1 h14c3975_1000 conda-forge
libxcb 1.14 h7b6447c_0
libxml2 2.9.10 h68273f3_2 conda-forge
ncurses 6.2 he6710b0_1
numpy 1.19.4 py39h57d35e7_1 conda-forge
openssl 1.1.1h h7b6447c_0
pcre 8.44 he6710b0_0
pip 20.3.1 pyhd8ed1ab_0 conda-forge
pixman 0.38.0 h7b6447c_0
pycairo 1.20.0 py39h08627d8_1 conda-forge
python 3.9.0 hdb3f193_2
python-igraph 0.8.3 py39hd24af65_2 conda-forge
python_abi 3.9 1_cp39 conda-forge
readline 8.0 h7b6447c_0
setuptools 50.3.2 py39h06a4308_2
sqlite 3.34.0 h74cdb3f_0 conda-forge
texttable 1.6.3 pyh9f0ad1d_0 conda-forge
tk 8.6.10 hbc83047_0
tqdm 4.54.1 pyhd8ed1ab_0 conda-forge
tzdata 2020d h52ac0ba_0
wheel 0.36.1 pyhd3deb0d_0 conda-forge
xorg-kbproto 1.0.7 h14c3975_1002 conda-forge
xorg-libice 1.0.10 h516909a_0 conda-forge
xorg-libsm 1.2.3 h84519dc_1000 conda-forge
xorg-libx11 1.6.12 h516909a_0 conda-forge
xorg-libxext 1.3.4 h516909a_0 conda-forge
xorg-libxrender 0.9.10 h516909a_1002 conda-forge
xorg-renderproto 0.11.1 h14c3975_1002 conda-forge
xorg-xextproto 7.3.0 h14c3975_1002 conda-forge
xorg-xproto 7.0.31 h14c3975_1007 conda-forge
xz 5.2.5 h7b6447c_0
zlib 1.2.11 h7b6447c_3
I think the error occurs because I used a SPAdes assembly contig FASTA from which all sequences <2000 bp had been removed. I'm guessing that GraphBin2 expects every contig in the *.gfa and *.paths files to also be present in the FASTA file. A warning instead of a KeyError would help here, since many users filter the contig FASTA generated by metaSPAdes (metaSPAdes has no minimum contig length).
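For anyone who wants to confirm this kind of mismatch before running GraphBin2, here is a minimal sketch of a pre-check that lists contigs named in the *.paths file but absent from the filtered FASTA. It is not part of GraphBin2. The helper names (`fasta_ids`, `paths_ids`, `missing_from_fasta`) are hypothetical, and it assumes SPAdes-style *.paths records in which each header line starts with `NODE_` and the reverse-complement entry carries a trailing apostrophe.

```python
def fasta_ids(lines):
    """Return the set of sequence IDs (first token after '>') from FASTA lines."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.add(tokens[0])
    return ids


def paths_ids(lines):
    """Return contig names from SPAdes-style .paths lines.

    Assumption: header lines start with 'NODE_', and the reverse-complement
    record has a trailing apostrophe, which is stripped here.
    """
    ids = set()
    for line in lines:
        name = line.strip()
        if name.startswith("NODE_"):
            ids.add(name.rstrip("'"))
    return ids


def missing_from_fasta(fasta_lines, paths_lines):
    """Return sorted contig names present in .paths but absent from the FASTA."""
    return sorted(paths_ids(paths_lines) - fasta_ids(fasta_lines))


# Toy data standing in for a length-filtered FASTA and the original .paths file.
example_fasta = [
    ">NODE_1_length_5000_cov_10.5",
    "ACGTACGT",
    ">NODE_3_length_2500_cov_7.1",
    "ACGTACGT",
]
example_paths = [
    "NODE_1_length_5000_cov_10.5", "1+,2-",
    "NODE_1_length_5000_cov_10.5'", "2+,1-",
    "NODE_2_length_900_cov_3.0", "5+",
    "NODE_2_length_900_cov_3.0'", "5-",
    "NODE_3_length_2500_cov_7.1", "7+",
    "NODE_3_length_2500_cov_7.1'", "7-",
]
example_missing = missing_from_fasta(example_fasta, example_paths)
```

With real files, pass open file handles instead of the toy lists, e.g. `missing_from_fasta(open("contigs.fasta"), open("contigs.paths"))` (file names here are placeholders). A non-empty result means the FASTA was filtered relative to the graph.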
Hi @nick-youngblut,
You are correct. GraphBin2 expects all the contigs listed in the *.paths file to be provided for binning. I will add a fix so that users can filter out contigs and still use the original graph. Thank you for pointing this out. I will leave this issue open until it is fixed.
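Until a fix lands, one possible workaround is to restrict the graph to the contigs that survived filtering. The sketch below only illustrates that idea and is not the fix that was added to GraphBin2. The function name `restrict_edges` and the edge representation (pairs of contig identifiers) are assumptions made for the example.

```python
def restrict_edges(edges, kept_contigs):
    """Keep only edges whose two endpoints are both in `kept_contigs`.

    Returns the surviving edges and the number of edges dropped, so the
    caller can print a warning rather than hit a KeyError later on.
    """
    kept = set(kept_contigs)
    surviving = [(u, v) for u, v in edges if u in kept and v in kept]
    return surviving, len(edges) - len(surviving)


# Toy graph: contig 2 was removed by the length filter.
example_edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
example_surviving, example_dropped = restrict_edges(example_edges, [1, 3, 4])
if example_dropped:
    print(f"Warning: dropped {example_dropped} edges touching filtered contigs")
```

Dropping edges this way can disconnect contigs that were only linked through a removed contig, so binning results may differ from a run on the unfiltered assembly.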