rtracklayer
rtracklayer copied to clipboard
Problem with makeTxDbFromGFF
I am currently performing a RNA-Seq Analysis for Cucumis melo and I am currently trying to load the annotation using makeTxDbFromGFF() from a GFF3 file. When i check the TxDb object created using seqinfo(txdb_object) the only thing that appears in the columns are <NA>. I have read from a post:
https://support.bioconductor.org/p/104198/
that the problem can be due to rtracklayer::import.gff3(). Is there a way to fix this problem:
I am using rtracklayer version 1.42.1; my current version of R is 3.5.1. GFF3file: CM4.0.gff3.txt.gz
code: gff3File = <path_to_gff3file>
txdb = makeTxDbFromGFF(gff3File, format="gff3") Import genomic features from the file as a GRanges object ... OK Prepare the 'metadata' data frame ... OK Make the TxDb object ... OK seqinfo(txdb) Seqinfo object with 13 sequences from an unspecified genome; no seqlengths: seqnames seqlengths isCircular genome chr10 <NA> <NA> <NA> chr11 <NA> <NA> <NA> chr12 <NA> <NA> <NA> chr00 <NA> <NA> <NA> chr01 <NA> <NA> <NA> ... ... ... ... chr05 <NA> <NA> <NA> chr06 <NA> <NA> <NA> chr07 <NA> <NA> <NA> chr08 <NA> <NA> <NA> chr09 <NA> <NA> <NA>
Thank you in advance
What do you expect to happen? If the file does not specify any sequence-level information (lengths, genome) the information won't be in the returned object.
Hey there,
I am facing the same issue. Out of the bunch of genomes I have at hand, I have chosen a circular chloroplast genome ( no chromosomes ) for the sake of simplicity.
Cre.CP.annotations.gff3.gz.zip
Addressing the original question of how to fix it - since the first line "Cp . region 1 205535 . . . ID=Cp;Name=Cp;Is_circular=true; seqlengths=205535;genome=cr.cp;" doesn't seem to resolve it, would you please be so kind to communicate a way to correct it?
I have found a workaround resolving my issue, although not advisable for assemblies exhibiting many chromosomes, scaffolds, or consisting merely of contigs, if one cannot easily get one's hands on their attributes like name, length or circularity.
cp.chrominfo <- data.frame(chrom = c('Cp'), length =c(205535), is_circular=c(TRUE) )
txdb.cp <-makeTxDbFromGFF( chrominfo = cp.chrominfo, ...)
So, I simply loaded the info as a data.frame and passed it to makeTxDbFromGFF
.
Hope it helps someone out there!
Probably better to construct a Seqinfo object and then set that on the result.
seqinfo(txdb.cp) <- Seqinfo("Cp", 205535, TRUE)
We could try to infer the sequence universe from the "region" rows in the GFF, but it would still be a heuristic, and it's always best to explicitly set the full specification of the universe. Note that for common genomes (those present in the UCSC browser), you can just call e.g. Seqinfo(genome="hg38")
to automatically download the information.
Thanks for the two hints.
Well, I am not using UCSC genomes, do not know whether makeTxDbFromUCSC
ran into the same issue, though.
Anyhow, more interestingly, the assignment seqinfo<-
is not working, returning
Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘seqinfo<-’ for signature ‘"TxDb"’
I guess that's not supported, but I think you can just pass the Seqinfo to the chrominfo=
argument, like
txdb.cp <-makeTxDbFromGFF( chrominfo = Seqinfo("Cp", 205535L, TRUE), ...)
Ah ofc ! (facepalm) Thumbs up! Very kind of you.