bigmemory
reading large file using bigmemory
Hi,
I am trying to read a large file using bigmemory and I am getting errors because the file's first two columns are non-numeric. I have deleted the second column, but I want to use the first column as row names.
Is there an option in bigmemory to use the first column as row names, and how can I avoid the warning messages below?
> library("bigmemory")
> library("biganalytics")
> data.matrix - read.big.matrix("methylation.txt",header=T,sep='\t')
Error in data.matrix - read.big.matrix("methylation.txt", header = T, :
non-numeric argument to binary operator
In addition: Warning messages:
1: In na.omit(as.integer(firstLineVals)) : NAs introduced by coercion
2: In na.omit(as.double(firstLineVals)) : NAs introduced by coercion
3: In read.big.matrix("methylation.txt", header = T, sep = "\t") :
Many thanks,
I don't think you can have rownames for a big.matrix. You should probably just store these somewhere else.
Hi @privefl,
I need to correlate this matrix data with phenotypic data, which is why I want to use SampleID as row names. Can I assign row names after reading the matrix, using a separate list?
Someone suggested some solutions here, but I am not sure what they mean.
https://stackoverflow.com/questions/12576735/bigmemory-and-rownames-dimnames-of-matrix
Many thanks
Just use match() to get the row indices that correspond to the external SampleID (or the opposite, i.e. reorder the phenotypic data instead).
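As a rough sketch of the match() idea, assuming the sample IDs are kept in a plain character vector alongside the matrix: the names `sample_ids`, `meth` and `pheno`, and all the values, are made up for illustration, and an ordinary matrix stands in for the big.matrix (row indexing with `[` should work the same way on a big.matrix).

```r
# Hypothetical sample IDs stored outside the matrix:
# row i of the matrix belongs to sample_ids[i].
sample_ids <- c("S3", "S1", "S2")
meth <- matrix(c(0.97, 0.96, 0.94, 0.72, 0.77, 0.85), nrow = 3,
               dimnames = list(NULL, c("cg02115394", "cg12480843")))

# Hypothetical phenotype table, in a different sample order.
pheno <- data.frame(SampleID = c("S1", "S2", "S3"), age = c(41, 58, 35))

# For each phenotype row, find the matrix row holding the same sample.
idx <- match(pheno$SampleID, sample_ids)

# Reorder the matrix rows so they line up with the phenotype table.
meth_aligned <- meth[idx, , drop = FALSE]

# The two objects are now in the same sample order and can be correlated.
cor(meth_aligned[, "cg02115394"], pheno$age)
```

If some phenotype samples are missing from the matrix, match() returns NA for them, so it is worth checking `anyNA(idx)` before indexing.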
Hi, Thanks @privefl
We just have 2000 rows so we need these for further analysis. Many thanks,
I have removed the first two non-numeric columns, but it still shows the same error:
> data.matrix - read.big.matrix("phylo.txt",header=T,sep='\t')
Error in data.matrix - read.big.matrix("phylo.txt", header = T, sep = "\t") :
non-numeric argument to binary operator
In addition: Warning messages:
1: In na.omit(as.integer(firstLineVals)) : NAs introduced by coercion
2: In na.omit(as.double(firstLineVals)) : NAs introduced by coercion
3: In read.big.matrix("phylo.txt", header = T, sep = "\t") :
Because type was not specified, we chose double based on the first line of data.
The file looks like this now, after removing the first two character columns; it has around 80000 columns and 2000 rows:
cg02115394 cg12480843
0.974035 0.718462
0.967383 0.765799
0.961012 0.84822
0.960447 0.722946
0.963181 0.939808
0.940292 0.878546
It would be a good idea to read only the first e.g. 5 rows with data.table::fread() to get an idea of the number and types of columns.
I have removed all non-numeric columns and I am able to read 5 rows using fread, but bigmemory doesn't work here.
mydt10 <- fread("phylo.num.txt", nrows = 5)
> dim(mydt10)
[1] 5 844488
> str(mydt10)
Classes ‘data.table’ and 'data.frame': 5 obs. of 844488 variables:
$ cg14361672 : num 0.974 0.967 0.961 0.96 0.963
$ cg12950382 : num 0.718 0.766 0.848 0.723 0.94
$ cg02115394 : num 0.0337 0.0258 0.025 0.0317 0.0357
$ cg12480843 : num 0.0182 0.0189 0.0137 0.0167 0.0151
$ cg26724186 : num 0.98 0.977 0.982 0.982 0.978
$ cg00617867 : num 0.96 0.979 0.98 0.977 0.977
$ cg13773083 : num 0.313 0.246 0.253 0.234 0.372
$ cg17236668 : num 0.974 0.975 0.975 0.979 0.978
$ cg19607165 : num 0.0866 0.0966 0.0804 0.1162 0.0792
$ cg08770523 : num 0.0243 0.0213 0.0203 0.0194 0.0197
table(sapply(mydt10, typeof)) ?
> table(sapply(mydt10, typeof))
double
844488
Hum..
Maybe worth trying bigstatsr::big_read() (https://privefl.github.io/bigstatsr/articles/read-FBM-from-file.html).
I am still getting errors, even with bigreadr:
> data2 <- big_fread2("phylo.num.txt", nb_parts = NULL, .transform = identity,.combine = cbind_df, skip = 0, select = NULL, progress = FALSE, part_size = 500 * 1024^2)
*** caught segfault ***
address 0x7f5e51c63df7, cause 'memory not mapped'
Traceback:
1: data.table::fread(input, ..., data.table = data.table, nThread = nThread)
2: fread2(file, skip = skip, select = cols, ..., showProgress = FALSE)
3: .transform(fread2(file, skip = skip, select = cols, ..., showProgress = FALSE))
4: FUN(X[[i]], ...)
5: lapply(split_cols, function(cols) { part <- .transform(fread2(file, skip = skip, select = cols, ..., showProgress = FALSE)) already_read <<- already_read + length(cols) if (progress) utils::setTxtProgressBar(pb, already_read) part})
6: big_fread2("phylo.num.txt", nb_parts = NULL, .transform = identity, .combine = cbind_df, skip = 0, select = NULL, progress = FALSE, part_size = 500 * 1024^2)
Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection:
Many thanks,
Is it possible to run a loop to read this file using fread in R?
Many thanks,
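One possible way to do such a loop, sketched under assumptions: read the file in blocks of rows with the `skip` and `nrows` arguments of fread() and copy each block into a preallocated matrix. The temporary file, `chunk_size` and `n_rows` are placeholders for illustration, and there is no guarantee this avoids the segfault above on the real file.

```r
library(data.table)

# Hypothetical small stand-in for the real file: 6 rows, 4 numeric columns.
path <- tempfile(fileext = ".txt")
demo <- data.frame(cg1 = 1:6 / 10, cg2 = 6:1 / 10,
                   cg3 = rep(0.5, 6), cg4 = (1:6)^2 / 100)
write.table(demo, path, sep = "\t", quote = FALSE, row.names = FALSE)

# Column names come from the header line only (nrows = 0 reads no data rows).
col_names <- names(fread(path, nrows = 0, sep = "\t"))

n_rows <- 6L      # number of data rows, assumed known (about 2000 in this thread)
chunk_size <- 4L  # rows per fread() call; tune to the available memory

# Preallocate the result; for the real data this could be a big.matrix instead.
result <- matrix(NA_real_, nrow = n_rows, ncol = length(col_names),
                 dimnames = list(NULL, col_names))

for (start in seq(1L, n_rows, by = chunk_size)) {
  n <- min(chunk_size, n_rows - start + 1L)
  # skip = start skips the header line plus the (start - 1) rows already read.
  part <- fread(path, skip = start, nrows = n, header = FALSE, sep = "\t")
  result[start:(start + n - 1L), ] <- as.matrix(part)
}

dim(result)
```

Reading by rows rather than by columns keeps each fread() call small, at the cost of rescanning the skipped lines on every iteration.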
@bioinfonext:
> data.matrix - read.big.matrix("phylo.txt",header=T,sep='\t')
Error in data.matrix - read.big.matrix("phylo.txt", header = T, sep = "\t") :
non-numeric argument to binary operator
Is this meant to be:
data.matrix <- read.big.matrix("phylo.txt", header = TRUE, sep = "\t")
#           ^^
It looks like you had a typo, given your original error: the assignment operator was missing the <.
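For reference, a minimal sketch of the corrected call on a tiny stand-in file built from the rows posted above. Passing `type = "double"` explicitly should also avoid the "Because type was not specified" warning; the temporary file and the variable name `meth` are assumptions for illustration (a name other than `data.matrix` avoids masking the base R function of the same name).

```r
library(bigmemory)

# Tiny stand-in file using the first rows shown earlier in the thread.
path <- tempfile(fileext = ".txt")
writeLines(c("cg02115394\tcg12480843",
             "0.974035\t0.718462",
             "0.967383\t0.765799",
             "0.961012\t0.84822"), path)

# Note the complete assignment operator `<-`, and the explicit type.
meth <- read.big.matrix(path, header = TRUE, sep = "\t", type = "double")

dim(meth)
meth[1:2, ]
```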