Reading a large file using bigmemory
I am trying to read a large file with bigmemory, but I get errors because the file's first two columns are non-numeric. I have deleted the second column, but I want to use the first column as row names.
Is there an option in bigmemory to use the first column as row names, and how can I avoid the warning messages below?
> library("biganalytics")
> data.matrix - read.big.matrix("methylation.txt",header=T,sep='\t')
Error in data.matrix - read.big.matrix("methylation.txt", header = T, :
non-numeric argument to binary operator
In addition: Warning messages:
1: In na.omit(as.integer(firstLineVals)) : NAs introduced by coercion
2: In na.omit(as.double(firstLineVals)) : NAs introduced by coercion
3: In read.big.matrix("methylation.txt", header = T, sep = "\t") :
Many thanks,
I don't think you can have rownames for a big.matrix. You should probably just store these somewhere else.
Hi @privefl,
I need to correlate this matrix with phenotypic data, which is why I want to use SampleID as row names. Can I assign row names after reading the matrix, using a separate list?
Someone suggested some solutions here, but I am not sure what they mean.
Many thanks
Just use match() to get the row indices that correspond to the external SampleID (or the opposite, i.e. reorder the phenotypic data instead).
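A minimal sketch of the match() approach, with made-up sample IDs and a small ordinary matrix standing in for the big.matrix (matrix_ids, X and pheno are hypothetical names). The IDs live in a separate character vector, in the same order as the matrix rows:

```r
# Hypothetical data: IDs kept outside the matrix, in the order of its rows.
matrix_ids <- c("S3", "S1", "S2", "S4")
X <- matrix(c(0.97, 0.96, 0.95, 0.94,   # stand-in for the big.matrix;
              0.71, 0.76, 0.84, 0.72),  # row i belongs to matrix_ids[i]
            nrow = 4)
pheno <- data.frame(SampleID = c("S1", "S2", "S3", "S4"),
                    age      = c(40, 52, 37, 61))

# Option 1: for each phenotype row, the index of the matching matrix row.
row_idx <- match(pheno$SampleID, matrix_ids)
stopifnot(!anyNA(row_idx))  # every phenotype sample must exist in the matrix
cor(X[row_idx, 1], pheno$age)

# Option 2: reorder the phenotype data to follow the matrix rows instead.
pheno_aligned <- pheno[match(matrix_ids, pheno$SampleID), ]
cor(X[, 1], pheno_aligned$age)
```

Both options pair each matrix row with the right phenotype, so the two correlations are identical. With a real big.matrix the same X[row_idx, j] indexing applies, so the matrix itself never needs row names.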
Hi, thanks @privefl.
We only have 2000 rows, so we need these for further analysis. Many thanks,
I have removed the first two non-numeric columns, but it still shows the same error:
> data.matrix - read.big.matrix("phylo.txt",header=T,sep='\t')
Error in data.matrix - read.big.matrix("phylo.txt", header = T, sep = "\t") :
non-numeric argument to binary operator
In addition: Warning messages:
1: In na.omit(as.integer(firstLineVals)) : NAs introduced by coercion
2: In na.omit(as.double(firstLineVals)) : NAs introduced by coercion
3: In read.big.matrix("phylo.txt", header = T, sep = "\t") :
Because type was not specified, we chose double based on the first line of data.
After removing the first two character columns, the file looks like this; it has around 80000 columns and 2000 rows:
cg02115394 cg12480843
0.974035 0.718462
0.967383 0.765799
0.961012 0.84822
0.960447 0.722946
0.963181 0.939808
0.940292 0.878546
It would be a good idea to read only the first few rows (e.g. 5) with data.table::fread() to get an idea of the number and types of columns.
I have removed all non-numeric columns and I am able to read 5 rows using fread, but bigmemory doesn't work here.
mydt10 <- fread("phylo.num.txt", nrows = 5)
> dim(mydt10)
[1] 5 844488
> str(mydt10)
Classes ‘data.table’ and 'data.frame': 5 obs. of 844488 variables:
$ cg14361672 : num 0.974 0.967 0.961 0.96 0.963
$ cg12950382 : num 0.718 0.766 0.848 0.723 0.94
$ cg02115394 : num 0.0337 0.0258 0.025 0.0317 0.0357
$ cg12480843 : num 0.0182 0.0189 0.0137 0.0167 0.0151
$ cg26724186 : num 0.98 0.977 0.982 0.982 0.978
$ cg00617867 : num 0.96 0.979 0.98 0.977 0.977
$ cg13773083 : num 0.313 0.246 0.253 0.234 0.372
$ cg17236668 : num 0.974 0.975 0.975 0.979 0.978
$ cg19607165 : num 0.0866 0.0966 0.0804 0.1162 0.0792
$ cg08770523 : num 0.0243 0.0213 0.0203 0.0194 0.0197
table(sapply(mydt10, typeof))
> table(sapply(mydt10, typeof))
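table(sapply(mydt10, typeof)) only counts the column types. To list which columns are not numeric (and so would not fit a numeric big.matrix), something like the following could be used; non_numeric_cols and the demo data are made up for illustration:

```r
# Hypothetical helper: names of the columns that are not numeric.
# Works on the data.table returned by fread(..., nrows = 5), since a
# data.table is also a data.frame.
non_numeric_cols <- function(df) {
  names(df)[!vapply(df, is.numeric, logical(1))]
}

# Small made-up stand-in for the first rows of the file.
demo <- data.frame(SampleID   = c("S1", "S2"),
                   cg14361672 = c(0.974, 0.967),
                   cg12950382 = c(0.718, 0.766),
                   stringsAsFactors = FALSE)

table(vapply(demo, typeof, character(1)))  # counts of column storage types
non_numeric_cols(demo)                     # "SampleID"
```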
It may be worth trying bigstatsr::big_read().
I am still getting errors, even with bigreadr:
> data2 <- big_fread2("phylo.num.txt", nb_parts = NULL, .transform = identity,.combine = cbind_df, skip = 0, select = NULL, progress = FALSE, part_size = 500 * 1024^2)
*** caught segfault ***
address 0x7f5e51c63df7, cause 'memory not mapped'
1: data.table::fread(input, ..., data.table = data.table, nThread = nThread)
2: fread2(file, skip = skip, select = cols, ..., showProgress = FALSE)
3: .transform(fread2(file, skip = skip, select = cols, ..., showProgress = FALSE))
4: FUN(X[[i]], ...)
5: lapply(split_cols, function(cols) { part <- .transform(fread2(file, skip = skip, select = cols, ..., showProgress = FALSE)) already_read <<- already_read + length(cols) if (progress) utils::setTxtProgressBar(pb, already_read) part})
6: big_fread2("phylo.num.txt", nb_parts = NULL, .transform = identity, .combine = cbind_df, skip = 0, select = NULL, progress = FALSE, part_size = 500 * 1024^2)
Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Many thanks,
Is it possible to run a loop to read this file using fread in R?
Many thanks,
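A rough sketch of such a loop, assuming the file is tab-separated with one header line and only numeric columns. The function name fread_to_big_matrix and the block_size default are made up. Columns are read block by block with fread(select = ...) and written into a file-backed big.matrix, so the full matrix does not have to fit in RAM. Note that big_fread2 already reads by column blocks internally (the traceback shows fread2(file, select = cols, ...)), so this may hit the same crash, and each fread call rescans the whole file, which is slow with ~850k columns.

```r
library(data.table)
library(bigmemory)

# Hypothetical helper (not part of either package): read a wide, all-numeric,
# tab-separated file with a header line, one block of columns at a time,
# into a file-backed big.matrix.
fread_to_big_matrix <- function(file, backingfile, backingpath = ".",
                                block_size = 10000L) {
  # nrows = 0 parses only the header, which gives the column names.
  col_names <- names(fread(file, nrows = 0, sep = "\t", header = TRUE))
  n_col <- length(col_names)
  # Reading a single column gives the number of data rows.
  n_row <- nrow(fread(file, select = 1L, sep = "\t", header = TRUE))

  X <- filebacked.big.matrix(n_row, n_col, type = "double",
                             backingfile = backingfile,
                             backingpath = backingpath,
                             descriptorfile = paste0(backingfile, ".desc"))

  for (s in seq(1L, n_col, by = block_size)) {
    cols <- s:min(s + block_size - 1L, n_col)
    part <- fread(file, select = cols, sep = "\t", header = TRUE)
    X[, cols] <- as.matrix(part)  # copy this block of columns to disk
  }
  # Column names are returned alongside the matrix, not stored inside it.
  list(matrix = X, colnames = col_names)
}
```

For example, res <- fread_to_big_matrix("phylo.num.txt", backingfile = "phylo.bin") would leave the data in phylo.bin and the names in res$colnames.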
> data.matrix - read.big.matrix("phylo.txt",header=T,sep='\t')
Error in data.matrix - read.big.matrix("phylo.txt", header = T, sep = "\t") :
non-numeric argument to binary operator
Is this meant to be:
data.matrix <- read.big.matrix("phylo.txt", header = TRUE, sep = "\t")
#           ^^
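Given your original error, it looks like you had a typo: the assignment operator was missing the <. Without it, R reads the line as data.matrix minus the result of read.big.matrix(). The file is still read (hence the warnings), but data.matrix is a base R function, and subtracting from a function fails with "non-numeric argument to binary operator".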
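Once the <- is fixed, it may also help to pass type = "double" explicitly, which should skip the first-line type guess behind the "Because type was not specified..." warning, and to make the matrix file-backed, since 2000 × 844488 doubles is roughly 13.5 GB. A sketch under those assumptions (the wrapper name and file names are illustrative):

```r
library(bigmemory)

# Hypothetical wrapper: read an all-numeric, tab-separated file with a
# header line into a file-backed big.matrix.
read_numeric_big <- function(file, backingfile, backingpath = ".") {
  read.big.matrix(file, header = TRUE, sep = "\t",
                  type = "double",  # given explicitly, so no type guessing
                  backingfile = backingfile,   # data is kept on disk
                  backingpath = backingpath,
                  descriptorfile = paste0(backingfile, ".desc"))
}

# Usage, with the assignment written as <- rather than -:
# X <- read_numeric_big("phylo.txt", backingfile = "phylo.bin")
# dim(X)
```

The sample IDs can then be kept in a separate vector and lined up with match(), as suggested earlier in the thread.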