
Prokaryote Differential Expression: Error loading data into ballgown

Open • dgelsin opened this issue 8 years ago • 10 comments

Hi,

I am attempting to load stringtie output data into R but get multiple errors in doing so:

```r
data_directory = system.file('extdata2', package='ballgown')
bg = ballgown(dataDir=data_directory, samplePattern='sample', meas='all')
```

I get the following error:

```
Sat Jan 16 13:59:06 2016
Sat Jan 16 13:59:06 2016: Reading linking tables
Sat Jan 16 13:59:06 2016: Reading intron data files
Sat Jan 16 13:59:06 2016: Merging intron data
Error in .local(x, ...) : strand values must be in '+' '-' '*'
In addition: Warning messages:
1: In read.table(list.files(samples[1], "i2t.ctab", full.names = TRUE), :
  incomplete final line found by readTableHeader on '/Library/Frameworks/R.framework/Versions/3.2/Resources/library/ballgown/extdata2/sample01/i2t.ctab'
2: In read.table(file, header = TRUE, sep = "\t", colClasses = cc, :
  incomplete final line found by readTableHeader on '/Library/Frameworks/R.framework/Versions/3.2/Resources/library/ballgown/extdata2/sample01/i_data.ctab'
3: In read.table(file, header = TRUE, sep = "\t", colClasses = cc, :
  incomplete final line found by readTableHeader on '/Library/Frameworks/R.framework/Versions/3.2/Resources/library/ballgown/extdata2/sample02/i_data.ctab'
4: In read.table(file, header = TRUE, sep = "\t", colClasses = cc, :
  incomplete final line found by readTableHeader on '/Library/Frameworks/R.framework/Versions/3.2/Resources/library/ballgown/extdata2/sample03/i_data.ctab'
5: In read.table(file, header = TRUE, sep = "\t", colClasses = cc, :
  incomplete final line found by readTableHeader on '/Library/Frameworks/R.framework/Versions/3.2/Resources/library/ballgown/extdata2/sample04/i_data.ctab'
6: In read.table(file, header = TRUE, sep = "\t", colClasses = cc, :
  incomplete final line found by readTableHeader on '/Library/Frameworks/R.framework/Versions/3.2/Resources/library/ballgown/extdata2/sample05/i_data.ctab'
7: In read.table(file, header = TRUE, sep = "\t", colClasses = cc, :
  incomplete final line found by readTableHeader on '/Library/Frameworks/R.framework/Versions/3.2/Resources/library/ballgown/extdata2/sample06/i_data.ctab'
```

My directory structure is as follows:

```
extdata2/
  sample01/
    e2t.ctab
    e_data.ctab
    i2t.ctab
    i_data.ctab
    t_data.ctab
  sample02/
    e2t.ctab
    e_data.ctab
    i2t.ctab
    i_data.ctab
    t_data.ctab
  ...
  sample06/
    e2t.ctab
    e_data.ctab
    i2t.ctab
    i_data.ctab
    t_data.ctab
```

The problem looks to be with the intron data. I want to do differential expression analysis on a prokaryote, so when I run StringTie the i_data.ctab and i2t.ctab files have no values, since prokaryotes have no introns.
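A quick way to confirm that diagnosis is to count the data rows in each sample's intron tables before calling `ballgown()`. The sketch below uses base R only; `ctab_data_rows()` and `find_empty_intron_tables()` are hypothetical helpers (not part of ballgown), and it assumes the per-sample directory layout shown above.

```r
# Count the data rows (non-blank lines after the header) in a .ctab file.
# A missing file is treated as empty. readLines() warns when the final
# newline is missing (as in the log above), so that warning is suppressed.
ctab_data_rows <- function(path) {
  if (!file.exists(path)) return(0L)
  lines <- suppressWarnings(readLines(path))
  lines <- lines[nzchar(lines)]
  max(length(lines) - 1L, 0L)
}

# Return the names of the sample directories under data_dir whose
# i_data.ctab or i2t.ctab contains no data rows.
find_empty_intron_tables <- function(data_dir, pattern = "sample") {
  samples <- list.files(data_dir, pattern = pattern, full.names = TRUE)
  samples <- samples[dir.exists(samples)]
  empty <- vapply(samples, function(s) {
    ctab_data_rows(file.path(s, "i_data.ctab")) == 0L ||
      ctab_data_rows(file.path(s, "i2t.ctab")) == 0L
  }, logical(1))
  basename(samples[empty])
}
```

For the layout above this would be called as `find_empty_intron_tables(data_directory, "sample")`; a non-empty result means ballgown will be handed intron tables with only a header line.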

Is there a workaround in ballgown for doing differential expression analysis on prokaryotes? Do I need to fill in throwaway values in i_data.ctab and i2t.ctab?

Thank you,

Diego

dgelsin, Jan 16 '16

Hello, Diego

I am also facing the same issue. I tried a microbial genome in March; it didn't work, so I used cuffdiff instead. Later I worked with some human data that contained both exons and introns, and there were no errors. Now I am trying a microbial genome again, and the following error message appears when I try to load the StringTie output:

```
Error in `row.names<-.data.frame`(`*tmp*`, value = value) :
  invalid 'row.names' length
```

I would like to know if you were able to come up with a solution. NB: I am not a programmer, so I am not in a position to check the functions of the ballgown package and rewrite the code.

devikaatgit, Sep 29 '16

I'm looking into this to try to figure it out. Can you give me a few more details on the types of samples you ran through StringTie and what versions of the software you are using? Thanks!

jtleek, Sep 29 '16

I was using single-end (SE) reads from microbial mRNA data, and tried with just 2 samples as well as with more than 5 samples. StringTie on the Galaxy public server was used to assemble the mapped reads. The tabular files were then renamed with the .ctab extension. As Diego mentioned above, the intron files (i_data.ctab and i2t.ctab) were empty. In contrast, when I ran eukaryotic samples the same way (where the intron data were not empty), I could run Ballgown successfully. (R is updated to the latest version and Ballgown was downloaded from Bioconductor.)

devikaatgit, Sep 30 '16

I have the same problem as above. I used HISAT2 to map PE reads to a bacterial genome sequence, following the protocol in the paper published in Nature Protocols. The only difference is that I used only the bacterial genome sequence (.fna file) to build the HISAT2 index, since bacteria don't have RNA splicing. I then used StringTie to assemble transcripts for each sample, merged the transcripts, estimated transcript abundances and created count tables for Ballgown. The error occurs when I load the data into Ballgown. I suspect the HISAT2-StringTie-Ballgown pipeline was designed for well-annotated eukaryotic genomes and has probably not been widely tested with bacterial RNA-seq data.

```r
bg <- ballgown(dataDir = "ballgown", samplePattern = "CRR", pData = pheno_data)
```

```
Thu Nov 3 19:48:57 2016: Reading linking tables
Error in `row.names<-.data.frame`(`*tmp*`, value = value) :
  invalid 'row.names' length
```

```
> traceback()
6: stop("invalid 'row.names' length")
5: `row.names<-.data.frame`(`*tmp*`, value = value)
4: `row.names<-`(`*tmp*`, value = value)
3: `rownames<-`(`*tmp*`, value = c(1L, 0L))
2: `rownames<-`(`*tmp*`, value = c(1L, 0L))
1: ballgown(dataDir = "ballgown", samplePattern = "CRR", pData = pheno_data)
```

jinhuiwang, Nov 03 '16

Did you guys resolve this issue?

mjfi2sb3, Nov 25 '16

I have an idea for a fix for this issue, but unfortunately ballgown does not currently support prokaryotic analysis.

I currently work on ballgown only in my spare time, but I'll try to see if I can put a fix out in the next couple of weeks (I should have some time to breathe over the holiday break).

Sorry for the inconvenience, but we're working on it!

alyssafrazee, Dec 13 '16

I also have the same row.names error and would love to use Ballgown for analysis of my bacterial RNA-seq project. I was wondering whether it would be reasonable to add a single (non-existent) gene with introns and no mapped reads so that the program can read in the rest of the dataset, or whether I could format the empty intron file in such a way that the rest of the data can be read in. Thanks!

I used ballgown for another project and it worked wonderfully and was easy to use!

alexweisberg, Jan 24 '17

Hey all, sorry for the trouble here! There isn't a timeline that I know of for adding support for prokaryotic analysis (ballgown is currently in "maintenance mode" rather than "add new functionality mode," though I think this falls somewhere in between those two things) -- but while we figure out what to do, one hack around this is to add "dummy" intron data (e.g., copy the first couple of rows of the example i_data.ctab files provided with the package) so ballgown will read it in. You won't actually use the dummy data in your analysis, and some of the plotting functions might not work correctly since they assume introns, but you'll at least be able to use some of the functionality.
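A variant of this hack is sketched below in base R. It is not part of ballgown and has not been confirmed to work with every ballgown version: the helper name `add_dummy_intron()` is hypothetical, and it assumes the StringTie `-B` column layouts (`i_id, chr, strand, start, end, rcount, ucount, mrcount` for i_data.ctab and `i_id, t_id` for i2t.ctab). Instead of copying rows from the package's example files, it writes a single zero-count intron linked to the first transcript in the sample's own t_data.ctab, so the linking table refers to a t_id that exists in your data. It also assumes every sample was quantified against the same merged GTF (as in the HISAT2/StringTie pipeline described above), so that this first transcript is the same in every sample.

```r
# Hypothetical helper: if a sample's intron tables have no data rows,
# overwrite both with one placeholder intron (zero counts) that is linked
# to the first transcript listed in that sample's t_data.ctab.
add_dummy_intron <- function(sample_dir) {
  i_path   <- file.path(sample_dir, "i_data.ctab")
  i2t_path <- file.path(sample_dir, "i2t.ctab")
  has_rows <- function(p) {
    file.exists(p) &&
      length(Filter(nzchar, suppressWarnings(readLines(p)))) > 1L
  }
  # Leave samples that already have intron data untouched.
  if (has_rows(i_path) && has_rows(i2t_path)) return(invisible(FALSE))

  tx <- read.table(file.path(sample_dir, "t_data.ctab"), header = TRUE,
                   sep = "\t", quote = "", comment.char = "",
                   stringsAsFactors = FALSE, nrows = 1)
  if (nrow(tx) == 0L) stop("t_data.ctab has no transcripts in ", sample_dir)

  # ballgown only accepts '+', '-' or '*' as strand values.
  strand <- if (tx$strand[1] %in% c("+", "-")) tx$strand[1] else "*"

  i_data <- data.frame(i_id = 1L, chr = tx$chr[1], strand = strand,
                       start = tx$start[1], end = tx$start[1] + 1L,
                       rcount = 0L, ucount = 0L, mrcount = 0L)
  i2t <- data.frame(i_id = 1L, t_id = tx$t_id[1])
  write.table(i_data, i_path, sep = "\t", quote = FALSE, row.names = FALSE)
  write.table(i2t, i2t_path, sep = "\t", quote = FALSE, row.names = FALSE)
  invisible(TRUE)
}
```

It would be run on every sample directory before loading, e.g. `for (s in list.files("ballgown", pattern = "CRR", full.names = TRUE)) add_dummy_intron(s)`. As noted above, the placeholder intron carries no information and should be ignored in the analysis and in any intron-based plots.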

alyssafrazee, Feb 21 '17

thanks for the update & tip

/SB


mjfi2sb3, Feb 21 '17

Hi all. Has anyone tried alyssafrazee's tip? I am also experiencing errors with my analysis.

Thank you.

KeaNcu, Oct 04 '17