
KeyError contig_4488

ZarulHanifah opened this issue (2 comments)

Hello Vini,

I have been using MetaCoAG for a while, and it works well most of the time, but I have now hit a KeyError: contig_4488. The dataset I've been working on is ONT data assembled with metaFlye.

contig_4488 is not present in my Flye assembly. However, an edge_4488 is present in the assembly graph (could this be the issue?).

grep -w "contig_4488\|edge_4488" /fs03/jm41/Zarul/C002_D1_results/flye/assembly.fasta /fs03/jm41/Zarul/C002_D1_results/flye/assembly_graph.gfa /fs03/jm41/Zarul/C002_D1_results/flye/assembly_info.txt /fs03/jm41/Zarul/C002_D1_results/binning_medaka/metacoag/coverm_abundance.tsv /fs03/jm41/Zarul/C002_D1_results/binning_medaka/metacoag 
/fs03/jm41/Zarul/C002_D1_results/flye/assembly_graph.gfa:S	edge_4488	GGCATGACGCCCAGTACCACCACGTACGGGACAGGCATCAATAGCAACACGGGCCTCGGCGCTACTAACAATGGCAATGCCGGCACGACACCCGGCACCGGCGTCTCCGGGGCCGGCAGCAGCGGCGCGACGATGGGCACCAGCGGCACCACAGGCCTCGGCAGTACCTACAATGGCACCACCGGTACGACGCTCGGCACCGGCACCGGTACAACCGGCGTCGGCGCCAATGGCCTCGGCACCGGCGGCGCCACGGGCCTCGGCGGCACCGACAACGGCGCCACCGGCGCGACGCCGGGCACCGGCGGCACTGGAGCGGGGACCGGCGGCACTGGCGGTACTGGCGGCCGGTAAGGCACCCGGAGTACGCCGCTAACGGCGACGGGCGGGGCGAGGAGGCGTCACCCCGTCCGTCGGCGCGCCCGCGCCGGAAAGCTGACCCGTTCCTCGATGCCGGCCGGGTCCTCGTGAATGATGATCTCGGCGTGCGGAAAGGCGCGCTGCAGCTGCGCCTCGACCGCGTCGGAAATCTGGTGCGCGCGCGACAGGCTCATCGCGCCGTCCATCTCGATATGCAGCTGAATAAACGCGGTCGGCCCGGCGATGCGGGTGCGGATGTCATGCACCGCGGTGACTTCGGGATGGCTTTCGGCGATCGCGCGGACCCGGGCGCGCTCCGAATCGGGCAATTCGCGGTCCATCAGCTGGGTCAGCGACAATCGCGCGATCTTGAATGCCCCGCGGATGAGCCACAGCCCGACCGCAGCGCCGAACAGCGGGTCGAGCAGCGGCATCGGAAAGGAGCTGCCGATCGCCAGCGTCGCGATGACGCCGAGGTTCAGGATCAGGTCGCCGCGATAGTGCAATTCATCGGCGCCGATCGCCAACGAGCCGGTGCGTTTGACGACGTAGCGCTGGTAGAGAACCAGGCCGAGCGTCATGGCGATCGCCACCAGCATGACCGCGATCCCCGCCGGCGGGTGCGCCACCGGGCGCGGCTCGGCCAGGCGGCGGATCGCCTCGAACATCAACAAGGCAGCGCTGCCGACGAGAAAGGCGGACTGGGCGAGCGCCGCCAACGGCTCGGCCTTGCCGTGGCCGAAGCGGTGCTGGCGGTCGGGCGGCGTCGCGGCGCGCCGCACGGCGAACAGATTGACCAGCGAGGCGACGGCATCGACCAGCGAATCGACGAGGCTCGACAACAGGGCGACCGAGCCGGTGCCGATCCAGGCGGCGAGCTTGGCGACAATCAGCACCGTCGCGATCGCCAGCGAGGCGGCGGTCGCGCGCCGCCGCAGCATCTGCGCGGCGCCGCGCTCGCTCGTTACCTCGCTCACGGATAGAGGCGCTGTTTGCGCCATCCCTCGCCGTCGCGGACGAACGCCACGCGGTCGTGCAGACGGAACGGCCGCTCCTGCCAAAACTCGACGCTGTCCGGCCATATCCGAAAACCCGACCAGTAGGCGGGTCGCGGCACGGCGGGTTGCTCGGCATAGCGCTGCGAGTACAGCGCGAAGCGGCGCTCCAGCTCGGCGCGCTCGGCGAGCGGGCGCGACTGGTCGGAGGCCCAGGCGCCGATCTGGCTGTCGCGCGGCCGGGTCGCGAAATAGGCGTCGGCCTCGGCCGGCGAGACCGCTCTCGCCTCGCCCTCGATGCGCACCTGGCGGGCCAGCGACTTCCAGTAGAGGCACAGCGCGGCCCGCGGATTGGCCGCCAGCTCCGCGCCCTTGCGGCTGTCGAGATTGGTGTAAAACACGAAGCCGCGCTGGTCGGCGCCCTTGAGCAGCACCGCGCGCAACGACGGCCGCCCGTCCGCTGTCGCGGTCGCCAGCATCGTCGCCTCGGGGATCGGCTCGCACTGCGCGGCCAGCGCGAACCAGCGCGCGAACGGCGCGAACGGTTCGTTCTCGGCGATCTCGTCGGTCATTGCG
TGAGGTGGCTCCGCTTTGGTTGTGCGCGCCGGAGCCTTCCCTACTCCGCCCCGCGATCCTCGGCAACCGCCCTGCTCGACACGATCGCGGCCGCCGGCGCCGAAGAAGGGCCGCGGCCGCGGATCTCCGCCAGCAGCGCCGCCAAGGTCACTCGCATCGCCGCCGCCTCGGCCTTGACGATCCGCTCCATCGCCGGCGCGACCTGGCGCTGCCACGACGCCAGCGGCCGCGCCAGCCAGCTGCCGGCGAGCGGCAGACCGAGCGCCAGATAAAGATCGTGCAGCGTCGCGCTATGCGGGTCCCAGGCGAGCACCCAGGCGCCGTCCTGGGTCGGCGCGGTGAACCCGGCCTCGGCGAGGATCTGCAGATGCTCGTCGGCGACCGAGGTCGGCACGCCGAGTTCGCTCGCCAGCATCGCGGTGCGGCAGCGCAGGCCGTGCTGCTGCGCCCGCGCCAGCGCGGCAATCAGCGCCAGCGCGAAACCGAGCCTCACGCCGCCGCTGCTCAGATGCGACAATCGCTCATCGACCCGCCAGGTCGGCAGGTTGGCGGCGACCACGGCGCCGAGCAATACCGCATTCCAGGTGACGTACATCCACAACAGAAAGATCGGGATCGCCGCGAGCGCGCCATAGACGGTCTGATAGAACGACGAGGCGGCGATGTAGATGGAAAATCCAACCTTCAGGATCTCGATGGCGGCCGCGGCGACCGCGGCGCCGAGGAGGCCGTCGCGCCAGCGCACCGCACAATTCGGAATGAGGCAATAGAGCAGTGTGCAGGCGATCAACTCCAACACGAACGGGACAAGGCGCGCGACGACATGCGGCCAGCCGCTCGTCAGCTCCGTCACCAGCGCCGGGTTGAGGCCGGCATGGCGGGCCGCCGTGTCGAGATAGGTCGACAGGGTCAGGCTCATGCCGACCAGCAGCGGGCCCAACGTGATCAGCGTCCAATAGGCGAGCACCCGCTGCACCCAGGGCCGCGGCGTCGTGACCCGCCACAGCGCATTGAGGCGGTCCTCGACCGTAACCAGCAGCAGGACGCCGGTGGCGGCGATGCCGACGAGACCGATCGCGGTCGCCTGCGCCGCCGAACCGGCGAAATACTGGAACCACTGCGCCGCCTGCTCGCTGATCGCCGGCACGAAATTACGAAACAACAGCGCCGGCAGGTCCTGCCGCGCCGGCGCGAAACTCGGGAAGACCGACAGGACGCCGAGCCCGACGACGCCAAGCGGCACCAGCGACACCAGGGTCGTGTAGCTGAGCGCGCCCGAGGCGGCAAAGCAGCCGTCATGGTTGAACCGGTGCAGCGCATAGCGGCAGAAGGTCAGCACCGCCCTGAGCCGGCGGCGCAGCACGCCGTGGCCAGAGTCTCGGCGGCTGAACTTGGCGCGGCCGGGCGACGGAGGACCGCGATGTCG	dp:i:32
/fs03/jm41/Zarul/C002_D1_results/flye/assembly_graph.gfa:L	edge_4485	-	edge_4488	+	0M	RC:i:7
/fs03/jm41/Zarul/C002_D1_results/flye/assembly_graph.gfa:L	edge_4487	-	edge_4488	+	0M	RC:i:5
/fs03/jm41/Zarul/C002_D1_results/flye/assembly_graph.gfa:L	edge_4488	+	edge_265255	+	0M	RC:i:7
/fs03/jm41/Zarul/C002_D1_results/flye/assembly_graph.gfa:L	edge_4488	+	edge_265254	+	0M	RC:i:16
/fs03/jm41/Zarul/C002_D1_results/flye/assembly_graph.gfa:L	edge_4488	-	edge_265252	-	0M	RC:i:2
/fs03/jm41/Zarul/C002_D1_results/flye/assembly_graph.gfa:L	edge_4488	-	edge_100100	-	0M	RC:i:13
/fs03/jm41/Zarul/C002_D1_results/flye/assembly_graph.gfa:L	edge_4488	-	edge_265253	+	0M	RC:i:40
/fs03/jm41/Zarul/C002_D1_results/flye/assembly_graph.gfa:P	contig_4485	edge_265255-,edge_4488-,edge_4485+,edge_8112-	*
/fs03/jm41/Zarul/C002_D1_results/flye/assembly_graph.gfa:P	contig_4487	edge_265254-,edge_4488-,edge_4487+	*
/fs03/jm41/Zarul/C002_D1_results/flye/assembly_graph.gfa:P	contig_68975	edge_4488-,edge_265253+,edge_24317-,edge_68975+	*
/fs03/jm41/Zarul/C002_D1_results/flye/assembly_graph.gfa:P	contig_100100	edge_277711+,edge_100100+,edge_4488+,edge_265254+	*
/fs03/jm41/Zarul/C002_D1_results/flye/assembly_graph.gfa:P	contig_265252	edge_4474-,edge_265252+,edge_4488+	*

As you can see, "contig_4488" does not appear in any of the input files given to MetaCoAG; only the graph segment edge_4488 does, and it shows up inside the paths of other contigs.
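The grep output above can be read as three separate questions: is a name a graph segment (S line), is it a contig/path name (P line), and which paths traverse it. A minimal Python sketch of that check, assuming a GFA1 file laid out like the lines above (`S name sequence ...`, `P path_name seg1+,seg2-,... overlaps`); the function name `locate_in_gfa` is hypothetical and not part of MetaCoAG or Flye.

```python
from typing import Iterable


def locate_in_gfa(gfa_lines: Iterable[str], name: str) -> dict:
    """Report how `name` occurs in a GFA1 file: as a segment (S line),
    as a path name (P line), or as a segment inside other paths."""
    is_segment = False
    is_path = False
    used_by = []
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S" and len(fields) > 1 and fields[1] == name:
            is_segment = True
        elif fields[0] == "P" and len(fields) > 2:
            if fields[1] == name:
                is_path = True
            # Each segment in a P line ends with its orientation (+ or -).
            segments = [seg[:-1] for seg in fields[2].split(",")]
            if name in segments:
                used_by.append(fields[1])
    return {"segment": is_segment, "path": is_path, "used_by": used_by}


# Usage (path is illustrative):
# with open("assembly_graph.gfa") as fh:
#     print(locate_in_gfa(fh, "contig_4488"))
```

On the lines shown above, `edge_4488` would come back as a segment used by contig_4485, contig_4487 and others, while `contig_4488` would match nothing, which is consistent with the grep result.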

The command executed:

metacoag --assembler flye \
    --graph /fs03/jm41/Zarul/C002_D1_results/flye/assembly_graph.gfa \
    --contigs /fs03/jm41/Zarul/C002_D1_results/flye/assembly.fasta \
    --paths /fs03/jm41/Zarul/C002_D1_results/flye/assembly_info.txt \
    --abundance /fs03/jm41/Zarul/C002_D1_results/binning_medaka/metacoag/coverm_abundance.tsv \
    --output $outdir &> /fs03/jm41/Zarul/C002_D1_results/log/metacoag_medaka/log.log
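Since the command passes four files that all refer to contigs by name, a quick consistency check is to compare the name sets they contain. A minimal sketch, assuming FASTA IDs are the header text up to the first whitespace, that Flye's `assembly_info.txt` has the contig name in its first column with a `#` header line, and that the CoverM table has one header row with the contig name in its first column; all function names are hypothetical.

```python
from typing import Iterable, Set


def fasta_ids(lines: Iterable[str]) -> Set[str]:
    """IDs from FASTA headers: the text after '>' up to the first whitespace."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                ids.add(parts[0])
    return ids


def table_ids(lines: Iterable[str], skip_first: bool = False) -> Set[str]:
    """First tab-separated column of a table, ignoring '#' comment lines
    and blank lines; optionally skip a non-'#' header row."""
    ids = set()
    for i, line in enumerate(lines):
        if (skip_first and i == 0) or line.startswith("#") or not line.strip():
            continue
        ids.add(line.split("\t")[0].strip())
    return ids


def compare(reference: Set[str], other: Set[str]) -> dict:
    """Names present in one set but not in the other, sorted for readability."""
    return {
        "missing_from_other": sorted(reference - other),
        "extra_in_other": sorted(other - reference),
    }


# Usage (paths are illustrative):
# with open("assembly.fasta") as fh:
#     contigs = fasta_ids(fh)
# with open("assembly_info.txt") as fh:
#     info = table_ids(fh)
# with open("coverm_abundance.tsv") as fh:
#     abundance = table_ids(fh, skip_first=True)
# print(compare(contigs, info))
# print(compare(contigs, abundance))
```

Non-empty `extra_in_other` lists would point to names that one input mentions but the assembly FASTA does not define.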

Here is the error message:

2024-03-27 02:39:34,410 - INFO - Welcome to MetaCoAG: Binning Metagenomic Contigs via Composition, Coverage and Assembly Graphs.
2024-03-27 02:39:34,429 - INFO - Input arguments: 
2024-03-27 02:39:34,430 - INFO - Assembler used: flye
2024-03-27 02:39:34,430 - INFO - Contigs file: /fs03/jm41/Zarul/C002_D1_results/flye/assembly.fasta
2024-03-27 02:39:34,430 - INFO - Assembly graph file: /fs03/jm41/Zarul/C002_D1_results/flye/assembly_graph.gfa
2024-03-27 02:39:34,430 - INFO - Contig paths file: /fs03/jm41/Zarul/C002_D1_results/flye/assembly_info.txt
2024-03-27 02:39:34,430 - INFO - Abundance file: /fs03/jm41/Zarul/C002_D1_results/binning_medaka/metacoag/coverm_abundance.tsv
2024-03-27 02:39:34,430 - INFO - Final binning output file: /fs03/jm41/Zarul/C002_D1_results/binning_medaka/metacoag
2024-03-27 02:39:34,430 - INFO - Marker gene file hmm: auxiliary/marker.hmm
2024-03-27 02:39:34,430 - INFO - Minimum length of contigs to consider: 1000
2024-03-27 02:39:34,430 - INFO - Depth to consider for label propagation: 10
2024-03-27 02:39:34,431 - INFO - p_intra: 0.1
2024-03-27 02:39:34,431 - INFO - p_inter: 0.01
2024-03-27 02:39:34,431 - INFO - Do not use --cut_tc: False
2024-03-27 02:39:34,431 - INFO - mg_threshold: 0.5
2024-03-27 02:39:34,431 - INFO - bin_mg_threshold: 0.33333
2024-03-27 02:39:34,431 - INFO - min_bin_size: 200000 base pairs
2024-03-27 02:39:34,431 - INFO - d_limit: 20
2024-03-27 02:39:34,431 - INFO - Number of threads: 8
2024-03-27 02:39:34,431 - INFO - MetaCoAG started
2024-03-27 02:39:53,232 - INFO - Total number of contigs available: 269678
2024-03-27 02:39:58,801 - INFO - Total number of edges in the assembly graph: 77552
2024-03-27 02:39:58,928 - INFO - Total isolated contigs in the assembly graph: 244283
2024-03-27 02:39:58,929 - INFO - Obtaining lengths and coverage values of contigs
2024-03-27 02:40:18,190 - INFO - Total long contigs: 267613
2024-03-27 02:40:18,190 - INFO - Total isolated long contigs in the assembly graph: 243244
2024-03-27 02:40:18,191 - INFO - Obtaining tetranucleotide frequencies of contigs
2024-03-27 02:47:08,567 - INFO - Scanning for single-copy marker genes
2024-03-27 02:47:08,636 - INFO - .hmmout file already exists
2024-03-27 02:47:08,636 - INFO - Obtaining contigs with single-copy marker genes
Traceback (most recent call last):
  File "/home/mzar0002/miniconda3/envs/metacoag_/bin/metacoag", line 1260, in <module>
    main()
  File "/fs03/jm41/Zarul/envs/metacoag/lib/python3.12/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/fs03/jm41/Zarul/envs/metacoag/lib/python3.12/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/fs03/jm41/Zarul/envs/metacoag/lib/python3.12/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/fs03/jm41/Zarul/envs/metacoag/lib/python3.12/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mzar0002/miniconda3/envs/metacoag_/bin/metacoag", line 613, in main
    ) = marker_gene_utils.get_contigs_with_marker_genes(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/fs03/jm41/Zarul/envs/metacoag/lib/python3.12/site-packages/metacoag_utils/marker_gene_utils.py", line 147, in get_contigs_with_marker_genes
    contig_num = contig_names_rev[contig_name]
                 ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
KeyError: 'contig_4488'
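The traceback shows the crash is a plain dictionary lookup: `get_contigs_with_marker_genes` evaluates `contig_names_rev[contig_name]` and `'contig_4488'` is not a key. The log also reports that an existing `.hmmout` file was reused, so one thing worth checking (an assumption, not confirmed in this thread) is whether that file was produced from a different assembly. A minimal sketch that lists every unknown name instead of stopping at the first one, assuming a tabular HMMER output whose first column is a FragGeneScan-style gene ID (`<contig>_<start>_<end>_<strand>`) and `#` comment lines; the function names are hypothetical and not MetaCoAG's API.

```python
from typing import Dict, Iterable, List, Tuple


def build_reverse_index(contig_names: List[str]) -> Dict[str, int]:
    """Map each contig name to its position (a name -> index lookup table)."""
    return {name: i for i, name in enumerate(contig_names)}


def contigs_in_hits(lines: Iterable[str]) -> List[str]:
    """Contig names from hit lines, assuming gene IDs end in _start_end_strand."""
    names = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        gene_id = line.split()[0]
        # Strip the last three '_'-separated fields (start, end, strand).
        names.append(gene_id.rsplit("_", 3)[0])
    return names


def resolve(names: Iterable[str], rev: Dict[str, int]) -> Tuple[List[int], List[str]]:
    """Split names into known indices and unknown names instead of raising KeyError."""
    found, unknown = [], []
    for name in names:
        idx = rev.get(name)
        if idx is None:
            unknown.append(name)
        else:
            found.append(idx)
    return found, unknown
```

Feeding it the FASTA IDs and the lines of the reused hit file would list every contig name the hits mention that the current assembly lacks.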

Thank you :pray:

ZarulHanifah, Mar 26 '24 17:03

Hi @ZarulHanifah,

Sorry about the delay in getting back to you. Were you able to sort out this error?

Thanks, Vijini

Vini2, Jun 07 '24 06:06

I haven't been able to sort this out yet. I'm preparing a Google Drive link with the relevant files.

ZarulHanifah, Jun 09 '24 18:06