SPARK
Error initializing model0. log(lib_size) produces inf values
This occurred while analyzing 10x Genomics Visium data.
The code block below prints the following message repeatedly and then errors out:
# fitting count-based spatial model under the null hypothesis ...
Error in model.frame.default(formula = as.numeric(object@counts[ig, ]) ~ : variable lengths differ (found for 'offset(log(lib_size))')
Error in model.frame.default(formula = as.numeric(object@counts[ig, ]) ~ : variable lengths differ (found for 'offset(log(lib_size))')
Error in model.frame.default(formula = as.numeric(object@counts[ig, ]) ~ : variable lengths differ (found for 'offset(log(lib_size))')
Error in model.frame.default(formula = as.numeric(object@counts[ig, ]) ~ : variable lengths differ (found for 'offset(log(lib_size))')
Error in model.frame.default(formula = as.numeric(object@counts[ig, ]) ~ : variable lengths differ (found for 'offset(log(lib_size))')
Error in model.frame.default(formula = as.numeric(object@counts[ig, ]) ~ : variable lengths differ (found for 'offset(log(lib_size))')
Error in model.frame.default(formula = as.numeric(object@counts[ig, ]) ~ : variable lengths differ (found for 'offset(log(lib_size))')
...
Error in { : task 1 failed - "variable lengths differ (found for 'offset(log(lib_size))')"
Calls: spark.vc -> %dopar% -> <Anonymous>
Execution halted
library(SPARK)
library(Seurat)
library(readr)
# read in count data
count = Read10X_h5('HBC_FFPE_DCIS/HBC_JC_sampleA/raw_feature_bc_matrix.h5')
dim(count) # [1] 36601 4881
# read in and format location information
loc = read.csv('HBC_JC_sampleA/tissue_positions.csv')
loc = loc[match(colnames(count), loc$barcode),] # match cell IDs
rownames(loc) = loc$barcode # set cell IDs to rownames
loc = loc[, c('array_row', 'array_col')] # isolate coordinates
colnames(loc) = c('x','y') # change coordinate col names
num_cores = 18 # using 18 out of 36 cores
# filter out mitochondrial genes
mt_idx = grep("MT-", rownames(count))
if (length(mt_idx) != 0) {
  count = count[-mt_idx, ]
}
# fit and test SPARK model
spark_p = CreateSPARKObject(counts=count, location=loc, percentage = 0, min_total_counts = 10)
spark_p@lib_size = apply(count, 2, sum)
spark_p = spark.vc(spark_p,
                   covariates = NULL,
                   lib_size = spark_p@lib_size,
                   num_core = num_cores,
                   verbose = FALSE,
                   fit.model = "poisson")
spark_p = spark.test(spark_p,
                     check_positive = TRUE,
                     verbose = FALSE)
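The `variable lengths differ` message means the response vector and the `offset(log(lib_size))` term have different lengths when the model frame is built. One possible cause, stated here as an assumption rather than a confirmed diagnosis, is that `CreateSPARKObject` drops low-count spots (via `min_total_counts`), while `lib_size` is computed from the unfiltered `count` matrix. The toy sketch below does not use SPARK. The matrix `raw` and the filter are hypothetical stand-ins that reproduce the same kind of length mismatch with base R only.

```r
# Toy raw matrix (hypothetical): 2 genes x 4 spots; spot "s3" is nearly empty.
raw <- matrix(c(8, 4,  6, 9,  1, 0,  7, 5), nrow = 2,
              dimnames = list(c("g1", "g2"), paste0("s", 1:4)))

# Stand-in for a spot filter (assumption: spots with < 10 total counts are dropped).
filtered <- raw[, colSums(raw) >= 10, drop = FALSE]

lib_raw <- colSums(raw)       # length 4: computed on the unfiltered matrix
lib_ok  <- colSums(filtered)  # length 3: computed on the filtered matrix

# Response has 3 values but the offset has 4, so model.frame() stops with an error.
fit_bad <- try(glm(as.numeric(filtered[1, ]) ~ 1 + offset(log(lib_raw)),
                   family = poisson(link = "log")), silent = TRUE)
inherits(fit_bad, "try-error")

# With the library size taken from the filtered matrix, the lengths agree.
fit_ok <- glm(as.numeric(filtered[1, ]) ~ 1 + offset(log(lib_ok)),
              family = poisson(link = "log"))
```

If the same mismatch applies here, comparing `ncol(spark_p@counts)` with `length(spark_p@lib_size)` would show it, and computing the library size from `spark_p@counts` instead of `count` would be the corresponding change.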
When I try to initialize model0 manually, as in the source code, I get an error:
lib_size = as.numeric(apply(count, 2, sum))
model0 = try(glm(formula = as.numeric(count[1,]) ~ 1 + offset(log(lib_size)), family = poisson(link="log")))
Error in glm.fit(x = structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, : NA/NaN/Inf in 'y'
On further inspection, log(lib_size) produces infinite values:
any(is.infinite(log(lib_size)))
[1] TRUE
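In R, `log(0)` is `-Inf`, so infinite offsets correspond to spots whose total count is zero. The sketch below uses a small hypothetical matrix (not the Visium data) to show how such spots can be located. As one alternative to imputation, it drops them from the counts and the library size together. Whether that suits a SPARK analysis is an assumption that has not been checked against the package.

```r
# Toy count matrix (hypothetical): 3 genes x 4 spots; spot "s3" has no counts,
# as can happen for off-tissue barcodes in a raw (unfiltered) matrix.
count <- matrix(c(5, 0, 2,
                  1, 3, 0,
                  0, 0, 0,
                  4, 1, 6),
                nrow = 3,
                dimnames = list(paste0("g", 1:3), paste0("s", 1:4)))

lib_size <- colSums(count)
log(lib_size)            # the entry for s3 is -Inf because log(0) = -Inf

empty <- lib_size == 0   # spots with zero total counts
sum(empty)               # number of empty spots

# Drop empty spots from counts and lib_size together so their lengths stay equal.
count_kept <- count[, !empty, drop = FALSE]
lib_kept   <- lib_size[!empty]

# The same null model as above now fits, because every offset is finite.
model0 <- glm(as.numeric(count_kept[1, ]) ~ 1 + offset(log(lib_kept)),
              family = poisson(link = "log"))
```

Filtering keeps the offset a true library size, whereas replacing `-Inf` with an imputed value would assign a nonzero sequencing depth to spots that have no reads.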
Would it be appropriate to impute the infinite values produced by the log function?