
Problems with Genenames annotation

Open PhrenoVermouth opened this issue 7 years ago • 7 comments

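I followed the official tutorial in Nature Protocols, but many gene names were replaced by dots, so I could not look up the ID for bg_chrX[12]. Screenshot: https://cloud.githubusercontent.com/assets/14256941/23839269/cc2f5c3a-07d7-11e7-9fbc-753f497f2f18.png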

Has anyone run into this before?

Thanks!

PhrenoVermouth avatar Mar 13 '17 02:03 PhrenoVermouth

Hi

We are seeing this too. It appears to happen because StringTie only assigns a gene name to a transcript if that exact transcript sequence appears in the GTF file you fed to StringTie. We are working on a fix in Ballgown that uses sequence overlap to assign gene names to newly assembled or differing transcripts.

Best

Jeff


jtleek avatar Mar 17 '17 15:03 jtleek
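
As a stopgap for the missing names described above, one option is to reuse a name that another transcript with the same gene ID already carries. This is not the overlap-based fix Jeff mentions, only a minimal base-R sketch. `fill_gene_names` is a hypothetical helper, the IDs and names below are made-up sample data, and taking the first known name per gene ID is a simplifying assumption that may pick the wrong name where one assembled locus spans several reference genes.

```r
# Hypothetical helper (not part of ballgown): for each gene ID, reuse a gene
# name that another transcript of the same gene already has.
fill_gene_names <- function(gene_ids, gene_names, missing = ".") {
  gene_ids <- as.character(gene_ids)
  gene_names <- as.character(gene_names)
  stopifnot(length(gene_ids) == length(gene_names))

  # Transcripts that already carry a real name
  known <- !is.na(gene_names) & gene_names != missing

  # Named vector: gene ID -> first known name seen for that ID (assumption)
  lookup <- vapply(split(gene_names[known], gene_ids[known]),
                   function(x) x[1], character(1))

  # Fill only unnamed transcripts whose gene ID has a known name
  filled <- gene_names
  need <- !known & gene_ids %in% names(lookup)
  filled[need] <- lookup[gene_ids[need]]
  filled
}

# Made-up sample data standing in for geneIDs(bg) and geneNames(bg)
ids <- c("MSTRG.1", "MSTRG.1", "MSTRG.2", "MSTRG.3", "MSTRG.3")
nms <- c("XIST", ".", ".", ".", "TSIX")
filled <- fill_gene_names(ids, nms)
print(filled)
```

With a real object, the two input vectors would presumably come from ballgown::geneIDs(bg) and ballgown::geneNames(bg), the accessors used elsewhere in this thread. Gene IDs with no named transcript at all (MSTRG.2 in the sample) keep their dot.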

Thanks for your response!

Let's make annotation great again!

LOL

Best, Guang

PhrenoVermouth avatar Mar 19 '17 03:03 PhrenoVermouth

Hi Jeff,

How soon do you expect to have this fix? I have worked all the way to the last step and realised that many significant genes do not have a gene name assigned to them.

Best Regards, Lu

eudoraleer avatar Mar 20 '17 09:03 eudoraleer

HTSeq recently released version 0.7.1, which fixes some bugs. You might try HTSeq followed by DESeq2 for downstream analysis after alignment with HISAT2.

Yours, Guang

PhrenoVermouth avatar Mar 20 '17 11:03 PhrenoVermouth

Hi, how is the fix coming along? I have run into the same problem.

Here is a snapshot of results_gene: [image]

Here's my code:

> results_transcripts = stattest(bg, feature="transcript", covariate="transgene", adjustvars=c("tissue"), getFC=TRUE, meas="FPKM")
> results_gene = stattest(bg, feature="gene", covariate="transgene", adjustvars=c("tissue"), getFC=TRUE, meas="FPKM")
> results_transcripts = data.frame(geneNames=ballgown::geneNames(bg), geneIDs=ballgown::geneIDs(bg), results_transcripts)

Looking forward to your reply! Thanks, Nico

nicoggsmd avatar Nov 13 '17 08:11 nicoggsmd
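
For the gene-level table, which has no name column at all, a possible workaround is to build a gene-ID-to-name table from the transcript-level vectors and attach it to the results. The sketch below is base R only and rests on assumptions: `annotate_gene_results` is a hypothetical helper, the gene-level results are assumed to have an `id` column holding the gene ID, and the data are made up. When a gene ID carries several known names, they are all kept, joined by ";", rather than guessing one.

```r
# Hypothetical helper (not part of ballgown): add a gene_name column to a
# gene-level results table, using names found on that gene's transcripts.
annotate_gene_results <- function(results_gene, gene_ids, gene_names,
                                  missing = ".") {
  gene_ids <- as.character(gene_ids)
  gene_names <- as.character(gene_names)
  known <- !is.na(gene_names) & gene_names != missing

  # Named vector: gene ID -> all distinct known names, joined by ";"
  lookup <- vapply(split(gene_names[known], gene_ids[known]),
                   function(x) paste(unique(x), collapse = ";"),
                   character(1))

  # Assumes the results table stores the gene ID in a column called "id"
  hit <- match(as.character(results_gene$id), names(lookup))
  names_out <- unname(lookup[hit])
  names_out[is.na(names_out)] <- missing  # no named transcript for this gene
  results_gene$gene_name <- names_out
  results_gene
}

# Made-up sample data standing in for stattest() output and bg accessors
results_gene <- data.frame(id = c("MSTRG.1", "MSTRG.2", "MSTRG.3"),
                           pval = c(0.01, 0.50, 0.03),
                           stringsAsFactors = FALSE)
ids <- c("MSTRG.1", "MSTRG.1", "MSTRG.2", "MSTRG.3", "MSTRG.3")
nms <- c("XIST", ".", ".", "TSIX", "JPX")
annotated <- annotate_gene_results(results_gene, ids, nms)
print(annotated)
```

Genes whose transcripts are all novel still end up with a dot, so this only narrows the problem; it does not replace an overlap-based annotation against the reference GTF.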

@nicoggsmd Honestly, I now use HISAT2, then featureCounts, then DESeq2 for daily work. These tools are like iPhones... only subtle differences between the 7th generation and the 8th.

Have fun!

Guang

PhrenoVermouth avatar Nov 13 '17 13:11 PhrenoVermouth

Hi all,

@PhrenoVermouth Thanks for your reply!

@jtleek So does this mean there is still no good way to solve this problem in ballgown? If this is a common problem, why don't I see other people commenting on it online?

Have a nice day! Nico

nicoggsmd avatar Nov 13 '17 14:11 nicoggsmd