GraphBin2
AttributeError: 'NoneType' object has no attribute 'group'
Hi, thanks for developing this tool. I ran into a problem when running GraphBin2. Below are my pipeline and the error I got:
flye --meta --nano-hq barcode05-trimmed-2000bp.fastq --genome-size 4.3m --out-dir flye05 --threads 16
perl /programs/MaxBin-2.2.4/run_MaxBin.pl -contig flye05/assembly.fasta -abund flye05/assembly_info.txt -thread 16 -out Sample05
mkdir Sample05
mv Sample05.* Sample05
conda activate graphbin2
python GraphBin2/support/prepResult.py --binned flye05/MaxBin2 --output flye05/MaxBin2
python GraphBin2/graphbin2 --assembler flye --contigs flye05/assembly.fasta --abundance flye05/assembly_info.txt --graph flye05/assembly_graph.gfa --binned flye05/Sample05/initial_contig_bins.csv --output flye05/graphbin2 --nthreads 8
Flye and MaxBin2 ran fine. The GraphBin2 log is:
2022-02-13 11:36:59,497 - INFO - Existing binning output file: flye05/Sample05/initial_contig_bins.csv
2022-02-13 11:36:59,497 - INFO - Final binning output file: flye05/graphbin2
2022-02-13 11:36:59,498 - INFO - Depth: 5
2022-02-13 11:36:59,498 - INFO - Threshold: 1.5
2022-02-13 11:36:59,498 - INFO - Number of threads: 8
2022-02-13 11:36:59,498 - INFO - GraphBin2 started
Traceback (most recent call last):
  File "GraphBin2/src/graphbin2_Flye.py", line 97, in <module>
    contig_num = int(re.search('%s(.*)%s' % (start_n, end_n), record.id).group(1))-1
AttributeError: 'NoneType' object has no attribute 'group'
Any hint on solving this problem? Thank you very much. Best, Nan
Hello @wn835166087!
Thanks for posting this issue. From your pipeline, it seems that you are using the original contig file assembly.fasta from the Flye output, which is not supported at the moment. As described in the section Before using Flye assemblies for binning, you need to extract the edge sequences from the assembly graph and use them as the input for binning.
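To illustrate why the traceback above ends in an AttributeError: re.search returns None when the pattern does not match, and calling .group on None fails. The following is a minimal sketch, not GraphBin2's actual code. The values of START_N and END_N and the helper name edge_index are assumptions made for illustration; the thread only shows that the script builds its pattern from start_n and end_n.

```python
import re

# Assumption: the Flye code path looks for IDs of the form "edge_<number>".
# These two values are illustrative; the thread does not show the real ones.
START_N, END_N = "edge_", ""


def edge_index(record_id):
    """Return the zero-based edge index, or raise a readable error."""
    match = re.search("%s(.*)%s" % (START_N, END_N), record_id)
    if match is None:
        # re.search returns None on no match; calling .group() on it is
        # what produces "'NoneType' object has no attribute 'group'".
        raise ValueError(
            "sequence ID %r does not look like an edge ID" % record_id
        )
    return int(match.group(1)) - 1
```

Under these assumptions, edge_index("edge_5") returns 4, while an ID such as contig_143 from assembly.fasta raises the ValueError instead of the less informative AttributeError.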
I'm working on adding support to bin the original assembly.fasta file. I will release an updated version soon.
Let me know if you have any further questions.
Thank you!
Thank you so much for your reply! However, I'm wondering how to get the corresponding abundance file. The direct output from Flye has this format:
#seq_name length cov. circ. repeat mult. alt_group graph_path
contig_143 187871 33 N N 3 * -195,143,994
I assume I need a similar file, but with seq_name values such as edge_1. (If I use Flye's assembly_info.txt directly, the error occurs on line 113 of graphbin2_Flye.py.)
Hello @wn835166087,
You can refer to Before using Flye/Miniasm assemblies for binning from the GraphBin documentation. I have provided a script named flye_miniasm_gfa2fasta.py which can generate the edges.fasta file.
Once you get the edges.fasta file, you have to map the reads back to the sequences in the edges.fasta file and calculate the coverage values. You can use the tool CoverM to get the coverage values.
Let me know if you have any further questions.
Thank you!
Thanks. I used CoverM to get the coverage.
I got abundance.txt in the format of
edge_961 11.233458\n
So in graphbin2_Flye.py, on line 105, I changed line = my_file.readline() to line = my_file.readline().rstrip('\n').
On line 115, I changed int(strings[1]) to int(float(strings[1])).
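For reference, the two changes described here amount to tolerating a trailing newline and a decimal coverage value. A minimal sketch of that kind of parsing follows, assuming a whitespace-separated two-column file like the edge_961 line above. The function name read_abundance is hypothetical and this is not the code in graphbin2_Flye.py.

```python
def read_abundance(lines):
    """Parse 'name coverage' lines into a dict of name -> float coverage."""
    coverage = {}
    for line in lines:
        fields = line.split()  # split() also discards the trailing newline
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        try:
            coverage[fields[0]] = float(fields[1])
        except ValueError:
            # Assumption: a non-numeric second column is a header row.
            continue
    return coverage
```

For example, read_abundance(["edge_961 11.233458\n"]) gives {"edge_961": 11.233458}. Note that int(float(...)) would truncate this value to 11; whether the rest of the script needs an integer is not shown in the thread.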
But now I'm totally confused. I still get an error on line 249 (I commented out the try ... except ... block because it exits the program):
Traceback (most recent call last):
  File "/home/nw323/GraphBin2/src/graphbin2_Flye.py", line 249, in <module>
    contig_num = contig_names_rev[row[0]]
KeyError: 'contig_1'
I prepared the binned input with prepResult.py, which produced a file like
contig_1,1
contig_10,1
These are the original contig names from the Flye output, but line 249 seems to expect edge_XXX names.
Any suggestions?
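One possible workaround for the naming mismatch, sketched under stated assumptions: the graph_path column of assembly_info.txt (shown earlier in this thread) lists the edges each contig passes through, so contig-level bin labels can be carried over to edge names. This is an illustration only, not something GraphBin2 or prepResult.py is known to do. The function names are hypothetical, and dropping edges claimed by more than one bin is a design choice made here for safety.

```python
def edges_in_path(graph_path):
    """Turn a graph_path such as '-195,143,994' into edge names."""
    edges = []
    for token in graph_path.split(","):
        token = token.strip().lstrip("+-")  # orientation sign is not needed
        if token.isdigit():  # skips non-numeric markers such as '*' or '??'
            edges.append("edge_%s" % token)
    return edges


def contig_bins_to_edge_bins(info_lines, contig_bins):
    """Map contig -> bin labels onto the edges each contig traverses."""
    edge_bins = {}
    ambiguous = set()
    for line in info_lines:
        if line.startswith("#") or not line.strip():
            continue  # header or blank line
        fields = line.split()
        contig, graph_path = fields[0], fields[-1]
        if contig not in contig_bins:
            continue  # contig was not binned
        bin_id = contig_bins[contig]
        for edge in edges_in_path(graph_path):
            if edge in edge_bins and edge_bins[edge] != bin_id:
                ambiguous.add(edge)  # edge shared by contigs in different bins
            else:
                edge_bins[edge] = bin_id
    for edge in ambiguous:
        edge_bins.pop(edge, None)  # leave conflicting edges unbinned
    return edge_bins
```

With the contig_143 row above and contig_143 assigned to bin 1, this yields edge_195, edge_143 and edge_994 in bin 1, unless another binned contig in a different bin also uses one of those edges.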
Hi @wn835166087,
GraphBin2 has been updated to handle the original Flye contigs. Feel free to give it a try.
Closing this issue.