pyGenomeViz
pyGenomeViz copied to clipboard
Cannot visualise rRNA units from gff file
Hello,
I am trying to visualise my rRNA units with the following script:
# Limited Range Features
from Bio.SeqFeature import SeqFeature
from pygenomeviz import Gff, GenomeViz
import matplotlib.pyplot as plt
# Load GFF file
gff_file = "Gb_microsporidia_assemblyGFF.gff"
gff = Gff(gff_file, min_range=1, max_range=170000)
# Plot min-max range GFF features with sublabel
gv = GenomeViz(fig_track_height=0.7, tick_style="bar")
track = gv.add_feature_track(name=gff.name, size=gff.range_size, start_pos=gff.min_range, labelsize=0)
track.add_gff_features(gff, feature_type="CDS", facecolor="skyblue", linewidth=0.5, label_type="gene", labelvpos="top")
track.add_gff_features(gff, feature_type="rRNA", facecolor="lime", linewidth=0.5, label_type="product", labelvpos="top", patch_kws=dict(hatch="/"))
track.set_sublabel(f"{gff.name}\n\n{gff.min_range} - {gff.max_range} bp", ymargin=1.0)
fig = gv.plotfig()
# Save the figure
plt.savefig("genome_range_plot.png", dpi=1000)
but it was not working as I think my gff format is no quite the one expected. Here are the location of the rRNA units in the gff file:
CONTIG_177_POLISHED EMBL/GenBank/SwissProt rRNA 162154 166014 . - . note "rRNA unit"
CONTIG_181_POLISHED EMBL/GenBank/SwissProt rRNA 4102 7964 . - . note "rRNA unit"
CONTIG_293_POLISHED EMBL/GenBank/SwissProt rRNA 1 734 . - . note "rRNA unit"
Any idea on how I could plot them with the track.add_gff_features or feature.qualifiers.get functions.
Thank you for your help!
Hi @Edouard94,
I'm not sure what visualization results you expect to get. Were you expecting the three rRNAs listed in the example GFF file to be plotted on one track ranging from 0 - 170,000bp?
If the above assumption is correct, then the idea of plotting rRNAs present in different contigs on one track is wrong, so it is natural that the plot will not be as you expected.
Hello @moshi4,
Thank you for your reply, yes I can understand that having multiple rRNAs units is wrong; I wanted to plot this surprising annotation and try to visualise the placement of these units along my genome.
But I also retry to plot only one rRNA unit (located between 1 and 1000), and again the expected 'lime' unit is not produced, is this because of my gff format file? (see my first comment)
I used the same code as before, and just changed the range from 1 to 1000 bp.
Sorry, I don't quite understand what you are trying to do, so it is difficult to answer. At least the GFF file is reading correctly, so I think the format is correct. It seems to me that you are not understanding the previous comment correctly, so please take a look at the following code example to try to understand what I mean.
Code Example
from pygenomeviz import Gff, GenomeViz
gff_file = "GCF_000959065.1.gff.gz"
gff = Gff(gff_file)
gv = GenomeViz()
for seqid, size in gff.get_seqid2size().items():
track = gv.add_feature_track(seqid, size)
track.set_sublabel()
cds_features = gff.get_seqid2features("CDS")[seqid]
track.add_features(cds_features, facecolor="skyblue", plotstyle="arrow")
rrna_features = gff.get_seqid2features("rRNA")[seqid]
track.add_features(rrna_features, facecolor="lime", plotstyle="arrow")
gv.savefig("example01.png")
example01.png
from pygenomeviz import Gff, GenomeViz
gff_file = "GCF_000959065.1.gff.gz"
target_seqid = "NZ_LAEX01000007.1"
gff = Gff(gff_file, target_seqid=target_seqid, min_range=50000, max_range=100000)
gv = GenomeViz()
track = gv.add_feature_track(target_seqid, gff.range_size, start_pos=gff.min_range)
track.set_sublabel()
cds_features = gff.extract_features("CDS")
track.add_features(cds_features, facecolor="skyblue", plotstyle="arrow")
rrna_features = gff.extract_features("rRNA")
track.add_features(rrna_features, facecolor="lime", plotstyle="arrow")
gv.savefig("example02.png")
example02.png