MetaCoAG
KeyError contig_4488
Hello Vini,
I have been using MetaCoAG for a while, and it works well most of the time, but I have now hit a KeyError: contig_4488. The dataset I've been working on is ONT, assembled with metaFlye.
contig_4488 is not present in my Flye assembly. An edge_4488 is present in the assembly graph, though (could this be the issue?).
grep -w "contig_4488\|edge_4488" /fs03/jm41/Zarul/C002_D1_results/flye/assembly.fasta /fs03/jm41/Zarul/C002_D1_results/flye/assembly_graph.gfa /fs03/jm41/Zarul/C002_D1_results/flye/assembly_info.txt /fs03/jm41/Zarul/C002_D1_results/binning_medaka/metacoag/coverm_abundance.tsv /fs03/jm41/Zarul/C002_D1_results/binning_medaka/metacoag
/fs03/jm41/Zarul/C002_D1_results/flye/assembly_graph.gfa:S edge_4488 GGCATGACGCCCAGTACCACCACGTACGGGACAGGCATCAATAGCAACACGGGCCTCGGCGCTACTAACAATGGCAATGCCGGCACGACACCCGGCACCGGCGTCTCCGGGGCCGGCAGCAGCGGCGCGACGATGGGCACCAGCGGCACCACAGGCCTCGGCAGTACCTACAATGGCACCACCGGTACGACGCTCGGCACCGGCACCGGTACAACCGGCGTCGGCGCCAATGGCCTCGGCACCGGCGGCGCCACGGGCCTCGGCGGCACCGACAACGGCGCCACCGGCGCGACGCCGGGCACCGGCGGCACTGGAGCGGGGACCGGCGGCACTGGCGGTACTGGCGGCCGGTAAGGCACCCGGAGTACGCCGCTAACGGCGACGGGCGGGGCGAGGAGGCGTCACCCCGTCCGTCGGCGCGCCCGCGCCGGAAAGCTGACCCGTTCCTCGATGCCGGCCGGGTCCTCGTGAATGATGATCTCGGCGTGCGGAAAGGCGCGCTGCAGCTGCGCCTCGACCGCGTCGGAAATCTGGTGCGCGCGCGACAGGCTCATCGCGCCGTCCATCTCGATATGCAGCTGAATAAACGCGGTCGGCCCGGCGATGCGGGTGCGGATGTCATGCACCGCGGTGACTTCGGGATGGCTTTCGGCGATCGCGCGGACCCGGGCGCGCTCCGAATCGGGCAATTCGCGGTCCATCAGCTGGGTCAGCGACAATCGCGCGATCTTGAATGCCCCGCGGATGAGCCACAGCCCGACCGCAGCGCCGAACAGCGGGTCGAGCAGCGGCATCGGAAAGGAGCTGCCGATCGCCAGCGTCGCGATGACGCCGAGGTTCAGGATCAGGTCGCCGCGATAGTGCAATTCATCGGCGCCGATCGCCAACGAGCCGGTGCGTTTGACGACGTAGCGCTGGTAGAGAACCAGGCCGAGCGTCATGGCGATCGCCACCAGCATGACCGCGATCCCCGCCGGCGGGTGCGCCACCGGGCGCGGCTCGGCCAGGCGGCGGATCGCCTCGAACATCAACAAGGCAGCGCTGCCGACGAGAAAGGCGGACTGGGCGAGCGCCGCCAACGGCTCGGCCTTGCCGTGGCCGAAGCGGTGCTGGCGGTCGGGCGGCGTCGCGGCGCGCCGCACGGCGAACAGATTGACCAGCGAGGCGACGGCATCGACCAGCGAATCGACGAGGCTCGACAACAGGGCGACCGAGCCGGTGCCGATCCAGGCGGCGAGCTTGGCGACAATCAGCACCGTCGCGATCGCCAGCGAGGCGGCGGTCGCGCGCCGCCGCAGCATCTGCGCGGCGCCGCGCTCGCTCGTTACCTCGCTCACGGATAGAGGCGCTGTTTGCGCCATCCCTCGCCGTCGCGGACGAACGCCACGCGGTCGTGCAGACGGAACGGCCGCTCCTGCCAAAACTCGACGCTGTCCGGCCATATCCGAAAACCCGACCAGTAGGCGGGTCGCGGCACGGCGGGTTGCTCGGCATAGCGCTGCGAGTACAGCGCGAAGCGGCGCTCCAGCTCGGCGCGCTCGGCGAGCGGGCGCGACTGGTCGGAGGCCCAGGCGCCGATCTGGCTGTCGCGCGGCCGGGTCGCGAAATAGGCGTCGGCCTCGGCCGGCGAGACCGCTCTCGCCTCGCCCTCGATGCGCACCTGGCGGGCCAGCGACTTCCAGTAGAGGCACAGCGCGGCCCGCGGATTGGCCGCCAGCTCCGCGCCCTTGCGGCTGTCGAGATTGGTGTAAAACACGAAGCCGCGCTGGTCGGCGCCCTTGAGCAGCACCGCGCGCAACGACGGCCGCCCGTCCGCTGTCGCGGTCGCCAGCATCGTCGCCTCGGGGATCGGCTCGCACTGCGCGGCCAGCGCGAACCAGCGCGCGAACGGCGCGAACGGTTCGTTCTCGGCGATCTCGTCGGTCATTGCG
TGAGGTGGCTCCGCTTTGGTTGTGCGCGCCGGAGCCTTCCCTACTCCGCCCCGCGATCCTCGGCAACCGCCCTGCTCGACACGATCGCGGCCGCCGGCGCCGAAGAAGGGCCGCGGCCGCGGATCTCCGCCAGCAGCGCCGCCAAGGTCACTCGCATCGCCGCCGCCTCGGCCTTGACGATCCGCTCCATCGCCGGCGCGACCTGGCGCTGCCACGACGCCAGCGGCCGCGCCAGCCAGCTGCCGGCGAGCGGCAGACCGAGCGCCAGATAAAGATCGTGCAGCGTCGCGCTATGCGGGTCCCAGGCGAGCACCCAGGCGCCGTCCTGGGTCGGCGCGGTGAACCCGGCCTCGGCGAGGATCTGCAGATGCTCGTCGGCGACCGAGGTCGGCACGCCGAGTTCGCTCGCCAGCATCGCGGTGCGGCAGCGCAGGCCGTGCTGCTGCGCCCGCGCCAGCGCGGCAATCAGCGCCAGCGCGAAACCGAGCCTCACGCCGCCGCTGCTCAGATGCGACAATCGCTCATCGACCCGCCAGGTCGGCAGGTTGGCGGCGACCACGGCGCCGAGCAATACCGCATTCCAGGTGACGTACATCCACAACAGAAAGATCGGGATCGCCGCGAGCGCGCCATAGACGGTCTGATAGAACGACGAGGCGGCGATGTAGATGGAAAATCCAACCTTCAGGATCTCGATGGCGGCCGCGGCGACCGCGGCGCCGAGGAGGCCGTCGCGCCAGCGCACCGCACAATTCGGAATGAGGCAATAGAGCAGTGTGCAGGCGATCAACTCCAACACGAACGGGACAAGGCGCGCGACGACATGCGGCCAGCCGCTCGTCAGCTCCGTCACCAGCGCCGGGTTGAGGCCGGCATGGCGGGCCGCCGTGTCGAGATAGGTCGACAGGGTCAGGCTCATGCCGACCAGCAGCGGGCCCAACGTGATCAGCGTCCAATAGGCGAGCACCCGCTGCACCCAGGGCCGCGGCGTCGTGACCCGCCACAGCGCATTGAGGCGGTCCTCGACCGTAACCAGCAGCAGGACGCCGGTGGCGGCGATGCCGACGAGACCGATCGCGGTCGCCTGCGCCGCCGAACCGGCGAAATACTGGAACCACTGCGCCGCCTGCTCGCTGATCGCCGGCACGAAATTACGAAACAACAGCGCCGGCAGGTCCTGCCGCGCCGGCGCGAAACTCGGGAAGACCGACAGGACGCCGAGCCCGACGACGCCAAGCGGCACCAGCGACACCAGGGTCGTGTAGCTGAGCGCGCCCGAGGCGGCAAAGCAGCCGTCATGGTTGAACCGGTGCAGCGCATAGCGGCAGAAGGTCAGCACCGCCCTGAGCCGGCGGCGCAGCACGCCGTGGCCAGAGTCTCGGCGGCTGAACTTGGCGCGGCCGGGCGACGGAGGACCGCGATGTCG dp:i:32
/fs03/jm41/Zarul/C002_D1_results/flye/assembly_graph.gfa:L edge_4485 - edge_4488 + 0M RC:i:7
/fs03/jm41/Zarul/C002_D1_results/flye/assembly_graph.gfa:L edge_4487 - edge_4488 + 0M RC:i:5
/fs03/jm41/Zarul/C002_D1_results/flye/assembly_graph.gfa:L edge_4488 + edge_265255 + 0M RC:i:7
/fs03/jm41/Zarul/C002_D1_results/flye/assembly_graph.gfa:L edge_4488 + edge_265254 + 0M RC:i:16
/fs03/jm41/Zarul/C002_D1_results/flye/assembly_graph.gfa:L edge_4488 - edge_265252 - 0M RC:i:2
/fs03/jm41/Zarul/C002_D1_results/flye/assembly_graph.gfa:L edge_4488 - edge_100100 - 0M RC:i:13
/fs03/jm41/Zarul/C002_D1_results/flye/assembly_graph.gfa:L edge_4488 - edge_265253 + 0M RC:i:40
/fs03/jm41/Zarul/C002_D1_results/flye/assembly_graph.gfa:P contig_4485 edge_265255-,edge_4488-,edge_4485+,edge_8112- *
/fs03/jm41/Zarul/C002_D1_results/flye/assembly_graph.gfa:P contig_4487 edge_265254-,edge_4488-,edge_4487+ *
/fs03/jm41/Zarul/C002_D1_results/flye/assembly_graph.gfa:P contig_68975 edge_4488-,edge_265253+,edge_24317-,edge_68975+ *
/fs03/jm41/Zarul/C002_D1_results/flye/assembly_graph.gfa:P contig_100100 edge_277711+,edge_100100+,edge_4488+,edge_265254+ *
/fs03/jm41/Zarul/C002_D1_results/flye/assembly_graph.gfa:P contig_265252 edge_4474-,edge_265252+,edge_4488+ *
As you can see, "contig_4488" does not appear in any of the input files given to MetaCoAG; only "edge_4488" does.
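For anyone who wants to repeat this check from a script rather than with grep, here is a minimal sketch. It assumes the IDs should be matched as whole words, as `grep -w` does; the function name `find_whole_word_ids` and the sample lines (shortened from the output above, plus one made-up `edge_44880` line) are illustrative only and are not part of MetaCoAG or Flye.

```python
import re
from collections import defaultdict


def find_whole_word_ids(lines, ids):
    """Return {id: [1-based line numbers]} for whole-word matches, like grep -w.

    A line number is recorded once per match, so an ID that occurs twice
    on the same line appears twice in the list.
    """
    pattern = re.compile(r"\b(" + "|".join(re.escape(i) for i in ids) + r")\b")
    hits = defaultdict(list)
    for lineno, line in enumerate(lines, start=1):
        for match in pattern.finditer(line):
            hits[match.group(1)].append(lineno)
    return dict(hits)


# Illustrative lines modelled on the grep output above (sequences shortened).
# The last line is invented to show that edge_44880 is not a whole-word match.
sample = [
    "S\tedge_4488\tGGCATGACG\tdp:i:32",
    "L\tedge_4485\t-\tedge_4488\t+\t0M\tRC:i:7",
    "P\tcontig_4485\tedge_265255-,edge_4488-,edge_4485+,edge_8112-\t*",
    "S\tedge_44880\tACGT\tdp:i:1",
]

hits = find_whole_word_ids(sample, ["contig_4488", "edge_4488"])
print(hits)
```

On real files, the same function can be fed an open file handle instead of the `sample` list, since it only iterates over lines.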
The command executed:
metacoag --assembler flye \
--graph /fs03/jm41/Zarul/C002_D1_results/flye/assembly_graph.gfa \
--contigs /fs03/jm41/Zarul/C002_D1_results/flye/assembly.fasta \
--paths /fs03/jm41/Zarul/C002_D1_results/flye/assembly_info.txt \
--abundance /fs03/jm41/Zarul/C002_D1_results/binning_medaka/metacoag/coverm_abundance.tsv \
--output $outdir &> /fs03/jm41/Zarul/C002_D1_results/log/metacoag_medaka/log.log
Here is the error message:
2024-03-27 02:39:34,410 - INFO - Welcome to MetaCoAG: Binning Metagenomic Contigs via Composition, Coverage and Assembly Graphs.
2024-03-27 02:39:34,429 - INFO - Input arguments:
2024-03-27 02:39:34,430 - INFO - Assembler used: flye
2024-03-27 02:39:34,430 - INFO - Contigs file: /fs03/jm41/Zarul/C002_D1_results/flye/assembly.fasta
2024-03-27 02:39:34,430 - INFO - Assembly graph file: /fs03/jm41/Zarul/C002_D1_results/flye/assembly_graph.gfa
2024-03-27 02:39:34,430 - INFO - Contig paths file: /fs03/jm41/Zarul/C002_D1_results/flye/assembly_info.txt
2024-03-27 02:39:34,430 - INFO - Abundance file: /fs03/jm41/Zarul/C002_D1_results/binning_medaka/metacoag/coverm_abundance.tsv
2024-03-27 02:39:34,430 - INFO - Final binning output file: /fs03/jm41/Zarul/C002_D1_results/binning_medaka/metacoag
2024-03-27 02:39:34,430 - INFO - Marker gene file hmm: auxiliary/marker.hmm
2024-03-27 02:39:34,430 - INFO - Minimum length of contigs to consider: 1000
2024-03-27 02:39:34,430 - INFO - Depth to consider for label propagation: 10
2024-03-27 02:39:34,431 - INFO - p_intra: 0.1
2024-03-27 02:39:34,431 - INFO - p_inter: 0.01
2024-03-27 02:39:34,431 - INFO - Do not use --cut_tc: False
2024-03-27 02:39:34,431 - INFO - mg_threshold: 0.5
2024-03-27 02:39:34,431 - INFO - bin_mg_threshold: 0.33333
2024-03-27 02:39:34,431 - INFO - min_bin_size: 200000 base pairs
2024-03-27 02:39:34,431 - INFO - d_limit: 20
2024-03-27 02:39:34,431 - INFO - Number of threads: 8
2024-03-27 02:39:34,431 - INFO - MetaCoAG started
2024-03-27 02:39:53,232 - INFO - Total number of contigs available: 269678
2024-03-27 02:39:58,801 - INFO - Total number of edges in the assembly graph: 77552
2024-03-27 02:39:58,928 - INFO - Total isolated contigs in the assembly graph: 244283
2024-03-27 02:39:58,929 - INFO - Obtaining lengths and coverage values of contigs
2024-03-27 02:40:18,190 - INFO - Total long contigs: 267613
2024-03-27 02:40:18,190 - INFO - Total isolated long contigs in the assembly graph: 243244
2024-03-27 02:40:18,191 - INFO - Obtaining tetranucleotide frequencies of contigs
2024-03-27 02:47:08,567 - INFO - Scanning for single-copy marker genes
2024-03-27 02:47:08,636 - INFO - .hmmout file already exists
2024-03-27 02:47:08,636 - INFO - Obtaining contigs with single-copy marker genes
Traceback (most recent call last):
File "/home/mzar0002/miniconda3/envs/metacoag_/bin/metacoag", line 1260, in <module>
main()
File "/fs03/jm41/Zarul/envs/metacoag/lib/python3.12/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/fs03/jm41/Zarul/envs/metacoag/lib/python3.12/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
^^^^^^^^^^^^^^^^
File "/fs03/jm41/Zarul/envs/metacoag/lib/python3.12/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/fs03/jm41/Zarul/envs/metacoag/lib/python3.12/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/mzar0002/miniconda3/envs/metacoag_/bin/metacoag", line 613, in main
) = marker_gene_utils.get_contigs_with_marker_genes(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/fs03/jm41/Zarul/envs/metacoag/lib/python3.12/site-packages/metacoag_utils/marker_gene_utils.py", line 147, in get_contigs_with_marker_genes
contig_num = contig_names_rev[contig_name]
~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
KeyError: 'contig_4488'
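The last frame of the traceback is a plain dictionary lookup, `contig_names_rev[contig_name]`, so the error means a name was looked up that is not a key in that mapping. The sketch below is not MetaCoAG code: it assumes a name-to-index dictionary and a list of names coming from some other file, both filled with hypothetical stand-in values, and shows how every name that would fail the lookup could be listed in one pass instead of stopping at the first one.

```python
def missing_keys(lookup, names):
    """Return names absent from lookup, in first-seen order, without duplicates."""
    seen = set()
    missing = []
    for name in names:
        if name not in lookup and name not in seen:
            seen.add(name)
            missing.append(name)
    return missing


# Hypothetical stand-ins: the real mapping and name list live inside MetaCoAG.
contig_names_rev = {"contig_4485": 0, "contig_4487": 1, "contig_68975": 2}
names_from_marker_hits = [
    "contig_4485",
    "contig_4488",
    "contig_4488",
    "contig_68975",
]

missing = missing_keys(contig_names_rev, names_from_marker_hits)
print(missing)
```

If the list comes back non-empty, the two inputs were built from different sets of contig names, which narrows down which file to inspect.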
Thank you :pray:
Hi @ZarulHanifah,
Sorry about the delay in getting back to you. Were you able to sort out this error?
Thanks, Vijini
I haven't been able to sort this out. I'm preparing a GDrive link with the relevant files...