DnaFeaturesViewer icon indicating copy to clipboard operation
DnaFeaturesViewer copied to clipboard

Visualize contig genbank file

Open sarah872 opened this issue 7 years ago • 7 comments

Hey there, I am really astonished how easy it is with your script to visualize genes, but unfortunately I am working a lot with genome drafts, so unclosed genomes with many contigs in one genbank file. Is there a way to parse more than one and not getting the error ValueError: More than one record found in handle Cheers

sarah872 avatar May 30 '17 19:05 sarah872

I haven't tested it but you could try with SeqIO.parse(), like this:

from Bio import SeqIO
from dna_features_viewer import BiopythonTranslator
translator = BiopythonTranslator()
graphic_records = [
    translator.translate_record(record)
    for record in SeqIO.parse('my_multiple_records.gb', 'genbank')
]

Then to plot them, as usual:

for i, graphic_record in enumerate(graphic_records):
    ax, _ = graphic_record.plot()
    ax.figure.savefig('record_%03d.png' % i)

Zulko avatar May 31 '17 07:05 Zulko

If I execute this using my file, I get many files that have the right axis length, but there is one gene over the whole contig that is called source that's an excerpt of my file:

LOCUS       contig         4395 bp    DNA     linear   UNK 
DEFINITION  name
ACCESSION   contig
KEYWORDS    .
SOURCE      organism
  ORGANISM  organism
            Bacteria.
FEATURES             Location/Qualifiers
     source          1..4395
                     /mol_type="genomic DNA"
                     /db_xref="taxon:"
                     /genome_md5="something"
                     /project="something"
                     /genome_id="something"
                     /organism="something"
     CDS             complement(36..689)
                     /db_xref="SEED:fig|6666666.256796.peg.1026"
                     /translation="MMMMMMMMMMMMMMGSGKYQSDIVR"
                     /product="hypothetical protein"
                     /transl_table=11
BASE COUNT      903 a   1307 c   1323 g    862 t
ORIGIN      
        1 taaaaatctc gggattttga atcattaaga attaactacc ttactatatc actctgatat
       61 ttaccactac cccggatcgc tgccgggtta cccgccgagg tggagtcatg cagactcttg
      121 cgcaccacca gggtgccgtc cgcgtcggta aagacgttct cggtggcggc catccccgcc
      181 atctccttga tggggttgag gcgatagggg agatagcgca ggaagatccg ttcggcataa

sarah872 avatar May 31 '17 19:05 sarah872

I am not sure I understand. Can you please explain from the beginning what you are trying to achieve ?

Zulko avatar May 31 '17 19:05 Zulko

I want to plot all the protein coding genes that are in one contig, so as an output I have as many plots as contigs, each plot with the respective genes.

sarah872 avatar May 31 '17 21:05 sarah872

Still not clear sorry - I take it that what you are trying to do would be obvious for people from the same field, but in my case I cant connect the dots in your different messages. Try explaining like I am 5 :stuck_out_tongue:

Zulko avatar Jun 01 '17 13:06 Zulko

ahahah ok 👍 I am working a lot with metagenomic data, so I have incomplete draft genome, that means many scaffolds/contigs that represent one genome, but are not connected to a full chromosome. Therefore I try to visualize all contigs individually, each with its CDS. Thanks for your help!

sarah872 avatar Jun 02 '17 14:06 sarah872

any news on that?

sarah872 avatar Jul 21 '17 13:07 sarah872