rtracklayer icon indicating copy to clipboard operation
rtracklayer copied to clipboard

Problem with makeTxDbFromGFF

Open alfloleo opened this issue 6 years ago • 7 comments

I am currently performing a RNA-Seq Analysis for Cucumis melo and I am currently trying to load the annotation using makeTxDbFromGFF() from a GFF3 file. When i check the TxDb object created using seqinfo(txdb_object) the only thing that appears in the columns are <NA>. I have read from a post:

https://support.bioconductor.org/p/104198/

that the problem can be due to rtracklayer::import.gff3(). Is there a way to fix this problem:

I am using rtracklayer version 1.42.1; my current version of R is 3.5.1. GFF3file: CM4.0.gff3.txt.gz

code: gff3File = <path_to_gff3file>

txdb = makeTxDbFromGFF(gff3File, format="gff3") Import genomic features from the file as a GRanges object ... OK Prepare the 'metadata' data frame ... OK Make the TxDb object ... OK seqinfo(txdb) Seqinfo object with 13 sequences from an unspecified genome; no seqlengths: seqnames seqlengths isCircular genome chr10 <NA> <NA> <NA> chr11 <NA> <NA> <NA> chr12 <NA> <NA> <NA> chr00 <NA> <NA> <NA> chr01 <NA> <NA> <NA> ... ... ... ... chr05 <NA> <NA> <NA> chr06 <NA> <NA> <NA> chr07 <NA> <NA> <NA> chr08 <NA> <NA> <NA> chr09 <NA> <NA> <NA>

Thank you in advance

alfloleo avatar Jan 07 '19 16:01 alfloleo

What do you expect to happen? If the file does not specify any sequence-level information (lengths, genome) the information won't be in the returned object.

lawremi avatar Jan 07 '19 18:01 lawremi

Hey there,

I am facing the same issue. Out of the bunch of genomes I have at hand, I have chosen a circular chloroplast genome ( no chromosomes ) for the sake of simplicity.

Cre.CP.annotations.gff3.gz.zip

Addressing the original question of how to fix it - since the first line "Cp . region 1 205535 . . . ID=Cp;Name=Cp;Is_circular=true; seqlengths=205535;genome=cr.cp;" doesn't seem to resolve it, would you please be so kind to communicate a way to correct it?

FaustFrankenstein avatar Jan 15 '19 23:01 FaustFrankenstein

I have found a workaround resolving my issue, although not advisable for assemblies exhibiting many chromosomes, scaffolds, or consisting merely of contigs, if one cannot easily get one's hands on their attributes like name, length or circularity.

cp.chrominfo <- data.frame(chrom = c('Cp'), length =c(205535), is_circular=c(TRUE) )
txdb.cp <-makeTxDbFromGFF( chrominfo = cp.chrominfo, ...)

So, I simply loaded the info as a data.frame and passed it to makeTxDbFromGFF.

Hope it helps someone out there!

FaustFrankenstein avatar Jan 22 '19 23:01 FaustFrankenstein

Probably better to construct a Seqinfo object and then set that on the result.

seqinfo(txdb.cp) <- Seqinfo("Cp", 205535, TRUE)

We could try to infer the sequence universe from the "region" rows in the GFF, but it would still be a heuristic, and it's always best to explicitly set the full specification of the universe. Note that for common genomes (those present in the UCSC browser), you can just call e.g. Seqinfo(genome="hg38") to automatically download the information.

lawremi avatar Jan 22 '19 23:01 lawremi

Thanks for the two hints. Well, I am not using UCSC genomes, do not know whether makeTxDbFromUCSC ran into the same issue, though. Anyhow, more interestingly, the assignment seqinfo<- is not working, returning

Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘seqinfo<-’ for signature ‘"TxDb"’

FaustFrankenstein avatar Jan 23 '19 00:01 FaustFrankenstein

I guess that's not supported, but I think you can just pass the Seqinfo to the chrominfo= argument, like

txdb.cp <-makeTxDbFromGFF( chrominfo = Seqinfo("Cp", 205535L, TRUE), ...)

lawremi avatar Jan 23 '19 00:01 lawremi

Ah ofc ! (facepalm) Thumbs up! Very kind of you.

FaustFrankenstein avatar Jan 23 '19 00:01 FaustFrankenstein