ballgown
Prokaryote Differential Expression: Error loading data into ballgown
Hi,
I am attempting to load stringtie output data into R but get multiple errors in doing so:
data_directory = system.file('extdata2', package='ballgown')
bg = ballgown(dataDir=data_directory, samplePattern='sample', meas='all')
I get the following error:
Sat Jan 16 13:59:06 2016
Sat Jan 16 13:59:06 2016: Reading linking tables
Sat Jan 16 13:59:06 2016: Reading intron data files
Sat Jan 16 13:59:06 2016: Merging intron data
Error in .local(x, ...) : strand values must be in '+' '-' '*'
In addition: Warning messages:
1: In read.table(list.files(samples[1], "i2t.ctab", full.names = TRUE), :
  incomplete final line found by readTableHeader on '/Library/Frameworks/R.framework/Versions/3.2/Resources/library/ballgown/extdata2/sample01/i2t.ctab'
2: In read.table(file, header = TRUE, sep = "\t", colClasses = cc, :
  incomplete final line found by readTableHeader on '/Library/Frameworks/R.framework/Versions/3.2/Resources/library/ballgown/extdata2/sample01/i_data.ctab'
3: In read.table(file, header = TRUE, sep = "\t", colClasses = cc, :
  incomplete final line found by readTableHeader on '/Library/Frameworks/R.framework/Versions/3.2/Resources/library/ballgown/extdata2/sample02/i_data.ctab'
4: In read.table(file, header = TRUE, sep = "\t", colClasses = cc, :
  incomplete final line found by readTableHeader on '/Library/Frameworks/R.framework/Versions/3.2/Resources/library/ballgown/extdata2/sample03/i_data.ctab'
5: In read.table(file, header = TRUE, sep = "\t", colClasses = cc, :
  incomplete final line found by readTableHeader on '/Library/Frameworks/R.framework/Versions/3.2/Resources/library/ballgown/extdata2/sample04/i_data.ctab'
6: In read.table(file, header = TRUE, sep = "\t", colClasses = cc, :
  incomplete final line found by readTableHeader on '/Library/Frameworks/R.framework/Versions/3.2/Resources/library/ballgown/extdata2/sample05/i_data.ctab'
7: In read.table(file, header = TRUE, sep = "\t", colClasses = cc, :
  incomplete final line found by readTableHeader on '/Library/Frameworks/R.framework/Versions/3.2/Resources/library/ballgown/extdata2/sample06/i_data.ctab'
My directory structure is as follows:
extdata2/
  sample01/
    e2t.ctab
    e_data.ctab
    i2t.ctab
    i_data.ctab
    t_data.ctab
  sample02/
    e2t.ctab
    e_data.ctab
    i2t.ctab
    i_data.ctab
    t_data.ctab
  ...
  sample06/
    e2t.ctab
    e_data.ctab
    i2t.ctab
    i_data.ctab
    t_data.ctab
The problem appears to be with the intron data. I want to do differential expression analysis on a prokaryote, so when I run StringTie, i_data.ctab and i2t.ctab contain no values, since prokaryotic genes have no introns.
Is there a workaround in ballgown for differential expression analysis on prokaryotes? Do I need to fill in throwaway values in i_data.ctab and i2t.ctab?
Thank you,
Diego
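Before calling ballgown(), it can help to confirm which samples actually have empty intron tables, as described above. The sketch below uses base R only; count_data_rows() and check_intron_tables() are hypothetical helper names (not part of ballgown), and the demo directory and its contents are made up for illustration.

```r
# Hypothetical helper (not part of ballgown): number of data rows in a .ctab
# file, i.e. non-blank lines minus the header line.
count_data_rows <- function(path) {
  if (!file.exists(path)) return(NA_integer_)
  lines <- readLines(path, warn = FALSE)  # warn = FALSE tolerates a missing final newline
  max(sum(nzchar(lines)) - 1L, 0L)
}

# Hypothetical helper: one row per sample directory with its intron table sizes.
check_intron_tables <- function(data_dir, sample_pattern) {
  sample_dirs <- list.files(data_dir, pattern = sample_pattern, full.names = TRUE)
  data.frame(
    sample      = basename(sample_dirs),
    i_data_rows = unname(vapply(file.path(sample_dirs, "i_data.ctab"),
                                count_data_rows, integer(1))),
    i2t_rows    = unname(vapply(file.path(sample_dirs, "i2t.ctab"),
                                count_data_rows, integer(1)))
  )
}

# Demo on a throwaway directory: sample01 has one intron, sample02 has none.
demo_root <- file.path(tempdir(), "intron_check")
for (s in c("sample01", "sample02")) {
  dir.create(file.path(demo_root, s), recursive = TRUE, showWarnings = FALSE)
}
header <- paste("i_id", "chr", "strand", "start", "end",
                "rcount", "ucount", "mrcount", sep = "\t")
writeLines(c(header, paste(1, "chr1", "+", 200, 300, 4, 4, 4, sep = "\t")),
           file.path(demo_root, "sample01", "i_data.ctab"))
writeLines(c("i_id\tt_id", "1\t1"), file.path(demo_root, "sample01", "i2t.ctab"))
writeLines(header, file.path(demo_root, "sample02", "i_data.ctab"))
writeLines("i_id\tt_id", file.path(demo_root, "sample02", "i2t.ctab"))

report <- check_intron_tables(demo_root, "sample")
print(report)
```

A sample showing 0 in both columns is one whose intron tables contain only a header (or nothing at all).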
Hello, Diego
I am also facing the same issue. I tried with a microbial genome in March; it didn't work, so I used Cuffdiff instead. Later I worked with some human data that contained both exons and introns, and there were no errors. Now I am trying a microbial genome again, and the following error message appears when I try to load the StringTie output data:
Error in `row.names<-.data.frame`(`*tmp*`, value = value) :
  invalid 'row.names' length
I would like to know whether you were able to find a solution. NB: I am not a programmer, so I am not in a position to examine the ballgown package's functions and rewrite the code.
I'm looking into this to try to figure it out. Can you give me a few more details on the types of samples you ran through StringTie and what versions of the software you are using? Thanks!
I was using SE reads from microbial mRNA data and tried with just 2 samples as well as with more than 5 samples. StringTie on the Galaxy public server was used to assemble the mapped reads. The tabular files were then renamed with the .ctab extension. As Diego mentioned above, the intron files (i_data.ctab and i2t.ctab) were empty. In contrast, when I ran eukaryotic samples the same way (where the intron data were not empty), I could run Ballgown successfully. (R is updated to the latest version, and Ballgown was downloaded from Bioconductor.)
I have the same problem as above. I use HISAT2 to map PE reads to a bacterial genome sequence, following the protocol published in Nature Protocols. The only difference is that I use only the bacterial genome sequence (.fna file) to build the HISAT2 index, since bacteria don't have RNA splicing. I use StringTie to assemble transcripts for each sample, then merge the transcripts, estimate transcript abundances, and create the count tables for Ballgown. The error occurs when I load the data into Ballgown. I suspect the HISAT2-StringTie-Ballgown pipeline was designed for well-annotated eukaryotic genomes and has probably not been widely tested with bacterial RNA-seq data.
bg <- ballgown(dataDir = "ballgown", samplePattern = "CRR", pData = pheno_data)
Thu Nov 3 19:48:57 2016: Reading linking tables
Error in `row.names<-.data.frame`(`*tmp*`, value = value) :
  invalid 'row.names' length
traceback()
6: stop("invalid 'row.names' length")
5: `row.names<-.data.frame`(`*tmp*`, value = value)
4: `row.names<-`(`*tmp*`, value = value)
3: `rownames<-`(`*tmp*`, value = c(1L, 0L))
2: `rownames<-`(`*tmp*`, value = c(1L, 0L))
1: ballgown(dataDir = "ballgown", samplePattern = "CRR", pData = pheno_data)
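In this traceback, frames 2 and 3 pass value = c(1L, 0L) to rownames<-. That is exactly what R's 1:nrow(x) returns when x has zero rows, which is consistent with (though not confirmed against ballgown's source) ballgown building row names with 1:nrow() on a linking table that is empty for a prokaryotic sample. The base-R sketch below reproduces the same message without calling ballgown; empty_i2t is a hypothetical stand-in for an i2t.ctab that holds only a header.

```r
# Hypothetical stand-in for an intron-to-transcript table with no data rows,
# as expected for a genome without introns.
empty_i2t <- data.frame(i_id = integer(0), t_id = integer(0))

# 1:nrow(x) counts *down* when nrow(x) is 0, giving a length-2 vector.
bad_index <- 1:nrow(empty_i2t)
print(bad_index)  # [1] 1 0

# Using that vector as row names of a 0-row data frame fails with the same
# message seen in the traceback above.
msg <- tryCatch({
  rownames(empty_i2t) <- bad_index
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)

# seq_len() returns integer(0) for a count of 0, so this assignment succeeds.
safe_index <- seq_len(nrow(empty_i2t))
rownames(empty_i2t) <- safe_index
print(nrow(empty_i2t))
```

seq_len() is the usual base-R idiom for avoiding this count-down pitfall; applying it would need a change inside ballgown rather than in user code, which is why a workaround on the data side is still needed.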
Did you guys resolve this issue?
I have an idea for a fix, but unfortunately ballgown does not currently support prokaryotic analysis.
I currently work on ballgown only in my spare time, but I'll try to see if I can put a fix out in the next couple of weeks (I should have some time to breathe over the holiday break).
Sorry for the inconvenience, but we're working on it!
I also have the same row.names error and would love to use Ballgown for my bacterial RNA-seq project. Would it be reasonable to add a single (non-existent) gene with introns and no mapped reads so that the program can read in the rest of the dataset, or could I format the empty intron file so that the rest of the data can be read in? Thanks!
I used ballgown for another project and it worked wonderfully and was easy to use!
Hey all, sorry for the trouble here! There isn't a timeline that I know of for adding support for prokaryotic analysis (ballgown is currently in "maintenance mode" rather than "add new functionality mode," though I think this falls somewhere in between those two things). While we figure out what to do, one hack around this is to add "dummy" intron data (e.g., copy the first couple of rows of the example i_data.ctab files provided with the package) so ballgown will read it in. You won't actually use the dummy data in your analysis, and some of the plotting functions might not work correctly since they assume introns, but you'll at least be able to use some of the functionality.
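One way to apply this tip without hand-editing files is sketched below in base R. It departs slightly from the suggestion: rather than copying rows from the package's example i_data.ctab, it writes a single zero-count intron anchored to the sample's own first transcript, so the chromosome name and t_id refer to data that exist in that sample. add_dummy_intron() is a hypothetical helper, not a ballgown function; the column layout is the one StringTie writes for Ballgown, and the code assumes t_ids are consistent across samples (as expected when StringTie is run with -e against one merged GTF). It has not been tested against ballgown itself.

```r
# Hypothetical helper (not part of ballgown): give a sample whose intron tables
# are empty a single zero-count dummy intron so ballgown() can read it.
# Assumes StringTie's .ctab layout and t_ids shared across samples.
add_dummy_intron <- function(sample_dir) {
  i_data_path <- file.path(sample_dir, "i_data.ctab")
  i2t_path <- file.path(sample_dir, "i2t.ctab")

  # Leave samples that already contain real intron rows (header + data) alone.
  i_lines <- readLines(i_data_path, warn = FALSE)
  if (sum(nzchar(i_lines)) > 1L) return(invisible(FALSE))

  t_data <- read.table(file.path(sample_dir, "t_data.ctab"), header = TRUE,
                       sep = "\t", stringsAsFactors = FALSE)
  tx <- t_data[1, ]  # first transcript, used only to anchor the dummy intron

  # The error in the first post says strand must be '+', '-' or '*'.
  strand <- as.character(tx$strand)
  if (!strand %in% c("+", "-")) strand <- "*"

  dummy_i_data <- data.frame(i_id = 1L, chr = as.character(tx$chr),
                             strand = strand, start = tx$start,
                             end = tx$start + 1L, rcount = 0L,
                             ucount = 0L, mrcount = 0)
  dummy_i2t <- data.frame(i_id = 1L, t_id = tx$t_id)

  # write.table() ends every line with a newline, so read.table() will not
  # warn about an incomplete final line.
  write.table(dummy_i_data, i_data_path, sep = "\t", quote = FALSE, row.names = FALSE)
  write.table(dummy_i2t, i2t_path, sep = "\t", quote = FALSE, row.names = FALSE)
  invisible(TRUE)
}

# Demo on a throwaway sample directory (all values are made up).
demo_dir <- file.path(tempdir(), "dummy_demo", "sample01")
dir.create(demo_dir, showWarnings = FALSE, recursive = TRUE)
writeLines(c(paste("t_id", "chr", "strand", "start", "end", "t_name", "num_exons",
                   "length", "gene_id", "gene_name", "cov", "FPKM", sep = "\t"),
             paste(1, "chr_demo", "+", 100, 900, "TX1", 1, 801, "G1", "geneA",
                   5, 10, sep = "\t")),
           file.path(demo_dir, "t_data.ctab"))
writeLines(paste("i_id", "chr", "strand", "start", "end", "rcount", "ucount",
                 "mrcount", sep = "\t"), file.path(demo_dir, "i_data.ctab"))
writeLines(paste("i_id", "t_id", sep = "\t"), file.path(demo_dir, "i2t.ctab"))

add_dummy_intron(demo_dir)
print(read.table(file.path(demo_dir, "i_data.ctab"), header = TRUE, sep = "\t"))
```

On real data you would call add_dummy_intron() on each sample directory (for example those matched by samplePattern = "CRR" under dataDir = "ballgown") before ballgown(). Because it overwrites i_data.ctab and i2t.ctab in place, working on a copy of the directory is safer, and, as noted in the tip, intron-based plots will not be meaningful.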
thanks for the update & tip
/SB
Salim Bougouffa (PhD), Postdoctoral Fellow, KAUST, CBRC
(Replying to Alyssa Frazee's comment of 21 February 2017, 09:38.)
Hi all. Has anyone tried alyssafrazee's tip? I am also running into errors with my analysis.
Thank you.