DnaFeaturesViewer
DnaFeaturesViewer copied to clipboard
Visualize contig genbank file
Hey there, I am really astonished how easy it is with your script to visualize genes, but unfortunately I am working a lot with genome drafts, so unclosed genomes with many contigs in one genbank file. Is there a way to parse more than one and not getting the error ValueError: More than one record found in handle
Cheers
I haven't tested it but you could try with SeqIO.parse()
, like this:
from Bio import SeqIO
from dna_features_viewer import BiopythonTranslator
translator = BiopythonTranslator()
graphic_records = [
translator.translate_record(record)
for record in SeqIO.parse('my_multiple_records.gb', 'genbank')
]
Then to plot them, as usual:
for i, graphic_record in enumerate(graphic_records):
ax, _ = graphic_record.plot()
ax.figure.savefig('record_%03d.png' % i)
If I execute this using my file, I get many files that have the right axis length, but there is one gene over the whole contig that is called source
that's an excerpt of my file:
LOCUS contig 4395 bp DNA linear UNK
DEFINITION name
ACCESSION contig
KEYWORDS .
SOURCE organism
ORGANISM organism
Bacteria.
FEATURES Location/Qualifiers
source 1..4395
/mol_type="genomic DNA"
/db_xref="taxon:"
/genome_md5="something"
/project="something"
/genome_id="something"
/organism="something"
CDS complement(36..689)
/db_xref="SEED:fig|6666666.256796.peg.1026"
/translation="MMMMMMMMMMMMMMGSGKYQSDIVR"
/product="hypothetical protein"
/transl_table=11
BASE COUNT 903 a 1307 c 1323 g 862 t
ORIGIN
1 taaaaatctc gggattttga atcattaaga attaactacc ttactatatc actctgatat
61 ttaccactac cccggatcgc tgccgggtta cccgccgagg tggagtcatg cagactcttg
121 cgcaccacca gggtgccgtc cgcgtcggta aagacgttct cggtggcggc catccccgcc
181 atctccttga tggggttgag gcgatagggg agatagcgca ggaagatccg ttcggcataa
I am not sure I understand. Can you please explain from the beginning what you are trying to achieve ?
I want to plot all the protein coding genes that are in one contig, so as an output I have as many plots as contigs, each plot with the respective genes.
Still not clear sorry - I take it that what you are trying to do would be obvious for people from the same field, but in my case I cant connect the dots in your different messages. Try explaining like I am 5 :stuck_out_tongue:
ahahah ok 👍 I am working a lot with metagenomic data, so I have incomplete draft genome, that means many scaffolds/contigs that represent one genome, but are not connected to a full chromosome. Therefore I try to visualize all contigs individually, each with its CDS. Thanks for your help!
any news on that?