
Error initializing model0. log(lib_size) produces inf values

Open davidaguilaratx opened this issue 1 year ago • 0 comments

This occurred while analyzing 10x Genomics Visium data.

The code block below produces the following message repeatedly and then errors out:

# fitting count-based spatial model under the null hypothesis ...
Error in model.frame.default(formula = as.numeric(object@counts[ig, ]) ~ : variable lengths differ (found for 'offset(log(lib_size))')
Error in model.frame.default(formula = as.numeric(object@counts[ig, ]) ~ : variable lengths differ (found for 'offset(log(lib_size))')
Error in model.frame.default(formula = as.numeric(object@counts[ig, ]) ~ : variable lengths differ (found for 'offset(log(lib_size))')
Error in model.frame.default(formula = as.numeric(object@counts[ig, ]) ~ : variable lengths differ (found for 'offset(log(lib_size))')
Error in model.frame.default(formula = as.numeric(object@counts[ig, ]) ~ : variable lengths differ (found for 'offset(log(lib_size))')
Error in model.frame.default(formula = as.numeric(object@counts[ig, ]) ~ : variable lengths differ (found for 'offset(log(lib_size))')
Error in model.frame.default(formula = as.numeric(object@counts[ig, ]) ~ : variable lengths differ (found for 'offset(log(lib_size))')
...
Error in { : task 1 failed - "variable lengths differ (found for 'offset(log(lib_size))')"
Calls: spark.vc -> %dopar% -> <Anonymous>
Execution halted

library(SPARK)
library(Seurat)
library(readr)

# read in count data
count = Read10X_h5('HBC_FFPE_DCIS/HBC_JC_sampleA/raw_feature_bc_matrix.h5')

dim(count) # [1] 36601  4881

# read in and format location information
loc = read.csv('HBC_JC_sampleA/tissue_positions.csv')
loc = loc[match(colnames(count), loc$barcode),] # match cell IDs
rownames(loc) = loc$barcode # set cell IDs to rownames
loc= loc[, c('array_row','array_col')] # isolate coordinates
colnames(loc) = c('x','y') # change coordinate col names

num_cores = 18 # using 18 out of 36 cores

# filter out mitochondrial genes (anchored so only names starting with "MT-" match)
mt_idx = grep("^MT-", rownames(count))
if (length(mt_idx) != 0) {
  count = count[-mt_idx, ]
}

# fit and test SPARK model
spark_p = CreateSPARKObject(counts = count, location = loc, percentage = 0, min_total_counts = 10)
spark_p@lib_size = apply(count, 2, sum)
spark_p = spark.vc(spark_p,
                   covariates = NULL,
                   lib_size = spark_p@lib_size,
                   num_core = num_cores,
                   verbose = FALSE,
                   fit.model = "poisson")
spark_p = spark.test(spark_p,
                     check_positive = TRUE,
                     verbose = FALSE)

Attempting to initialize model0 by hand, as in the source code, also produces an error:

lib_size = as.numeric(apply(count, 2, sum))
model0 = try(glm(formula = as.numeric(count[1,]) ~ 1 + offset(log(lib_size)), family = poisson(link="log")))

Error in glm.fit(x = structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, : NA/NaN/Inf in 'y'

Upon further inspection, it turns out log(lib_size) is producing infinite values:

any(is.infinite(log(lib_size)))

[1] TRUE
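
One likely source of the infinite values, offered as an assumption rather than a confirmed diagnosis: in R, `log(0)` is `-Inf`, so any spot whose total count is zero yields an infinite offset. A raw feature-barcode matrix may well include such spots. The toy matrix below (made-up values and names, not the Visium data) shows how to locate them:

```r
# Toy count matrix (genes x spots); values are made up for illustration.
toy_count <- matrix(c(3, 0, 1,
                      2, 0, 0,
                      5, 0, 4),
                    nrow = 3, byrow = TRUE,
                    dimnames = list(paste0("gene", 1:3), paste0("spot", 1:3)))

toy_lib_size <- colSums(toy_count)   # total counts per spot
zero_spots <- names(toy_lib_size)[toy_lib_size == 0]

print(toy_lib_size)
print(log(toy_lib_size))             # -Inf wherever the total is zero
print(zero_spots)                    # spots responsible for the -Inf offsets
```

Running the same check on the real `lib_size` (for example `sum(lib_size == 0)`) would show how many spots are affected.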

Would it be appropriate to implement an imputation for the infinite values produced by the log function?
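
Rather than imputing, one option is to drop spots with too few counts and then compute the library size from the retained columns only, so the offset and the counts always have the same length. The sketch below uses a toy matrix and a hypothetical threshold `min_total`. It assumes, without having verified this against the SPARK source, that the "variable lengths differ" error comes from `lib_size` being computed on the unfiltered matrix while the object holds a filtered one.

```r
# Toy genes x spots matrix; values and the threshold are hypothetical.
toy_count <- matrix(c(6, 0, 1, 7,
                      4, 0, 0, 9,
                      8, 0, 2, 3),
                    nrow = 3, byrow = TRUE,
                    dimnames = list(paste0("gene", 1:3), paste0("spot", 1:4)))
min_total <- 10

# Keep only spots whose total count reaches the threshold.
keep_spot <- colSums(toy_count) >= min_total
filtered <- toy_count[, keep_spot, drop = FALSE]

# Library size comes from the filtered matrix, so lengths agree
# and no spot contributes log(0).
lib_size <- colSums(filtered)
stopifnot(length(lib_size) == ncol(filtered),
          all(is.finite(log(lib_size))))

# The null model from the issue now fits on the toy data.
model0 <- glm(as.numeric(filtered[1, ]) ~ 1 + offset(log(lib_size)),
              family = poisson(link = "log"))
print(coef(model0))
```

In the script above, the analogous change would be to compute the library size from `spark_p@counts` instead of `count`. Whether that fully resolves the error on this dataset is untested here.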

davidaguilaratx, Sep 02 '23 23:09