matrix with no cells

Open • brgew opened this issue 3 months ago • 1 comment

Hi Ben,

I think that this is a marginal problem.

Anyway, we can have matrices in which all cells are filtered out; for example, when no cell has more than 100 UMIs.

I use BPCells import_matrix_market() to read the Matrix Market file, then calculate the column sums and keep only the columns with more than 100 counts. Later I use write_matrix_dir() to store the matrix on disk. When I restore the matrix in R using open_matrix_dir(), R reports that the second dimension is 1 rather than 0, and the dimnames are NULL.

library(BPCells)
mat <- import_matrix_market('Keyhole.umi_counts.mtx')

> str(mat)
Formal class 'MatrixDir' [package "BPCells"] with 7 slots
  ..@ dir        : chr "/tmp/xxx/RtmpqwNpT9/matrix_market23d73452b52716"
  ..@ compressed : logi TRUE
  ..@ buffer_size: int 8192
  ..@ type       : chr "uint32_t"
  ..@ dim        : int [1:2] 70038 1178
  ..@ transpose  : logi FALSE
  ..@ dimnames   :List of 2
  .. ..$ : NULL
  .. ..$ : NULL

> csums <- colSums(mat)
> mat2 <- mat[,csums>100]

> str(mat2)
Formal class 'MatrixSubset' [package "BPCells"] with 7 slots
  ..@ matrix       :Formal class 'MatrixDir' [package "BPCells"] with 7 slots
  .. .. ..@ dir        : chr "/tmp/xxx/RtmpqwNpT9/matrix_market23d73452b52716"
  .. .. ..@ compressed : logi TRUE
  .. .. ..@ buffer_size: int 8192
  .. .. ..@ type       : chr "uint32_t"
  .. .. ..@ dim        : int [1:2] 70038 1178
  .. .. ..@ transpose  : logi FALSE
  .. .. ..@ dimnames   :List of 2
  .. .. .. ..$ : NULL
  .. .. .. ..$ : NULL
  ..@ row_selection: int(0) 
  ..@ col_selection: int(0) 
  ..@ zero_dims    : logi [1:2] FALSE TRUE
  ..@ dim          : int [1:2] 70038 0
  ..@ transpose    : logi FALSE
  ..@ dimnames     :List of 2
  .. ..$ : NULL
  .. ..$ : NULL

> write_matrix_dir(mat2, 'foo_dir',overwrite=TRUE)
70038 x 1 IterableMatrix object with class MatrixDir

Row names: unknown names
Col names: unknown names

Data type: uint32_t
Storage order: column major

> mat3 <- open_matrix_dir('foo_dir')
> str(mat3)
Formal class 'MatrixDir' [package "BPCells"] with 7 slots
  ..@ dir        : chr "/net/xxx/79/5dc9"| __truncated__
  ..@ compressed : logi TRUE
  ..@ buffer_size: int 8192
  ..@ type       : chr "uint32_t"
  ..@ dim        : int [1:2] 70038 1
  ..@ transpose  : logi FALSE
  ..@ dimnames   :List of 2
  .. ..$ : NULL
  .. ..$ : NULL

> dim(mat3)
[1] 70038     1

The value of 1 creates a problem with a Bioconductor package, which expects a value of 0.

I will try working around this by checking for this condition after the open_matrix_dir() call.
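A minimal sketch of one such check, in base R only. The helper names has_no_columns() and dims_survived_roundtrip() are hypothetical and not part of BPCells. The sketch assumes that ncol() and dim() report the filtered dimensions correctly before writing, as str(mat2) shows above. Checking before the write avoids having to tell a collapsed empty matrix apart from a genuine one-column matrix afterwards.

```r
# Hypothetical helper (not a BPCells function): TRUE when filtering
# removed every column, so the matrix should not be written as-is.
has_no_columns <- function(mat) {
  ncol(mat) == 0L
}

# Hypothetical helper (not a BPCells function): TRUE when the dimensions
# recorded before writing match those reported after re-opening.
dims_survived_roundtrip <- function(dim_before, dim_after) {
  identical(as.integer(dim_before), as.integer(dim_after))
}

# Intended use with the objects from the session above (commented out so
# that the sketch runs without BPCells or the input file):
#
# dim_before <- dim(mat2)
# if (has_no_columns(mat2)) {
#   message("No cells passed the filter; skipping write_matrix_dir()")
# } else {
#   write_matrix_dir(mat2, 'foo_dir', overwrite = TRUE)
#   mat3 <- open_matrix_dir('foo_dir')
#   stopifnot(dims_survived_roundtrip(dim_before, dim(mat3)))
# }
```

With the dimensions reported above, dims_survived_roundtrip(c(70038, 0), c(70038, 1)) returns FALSE, which is the condition to catch.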

I appreciate your consideration and thoughts.

Ever grateful, Brent

brgew, Sep 18 '25 19:09

I'll look into this! Busy until next Tuesday but I'll update you around then. Thanks Brent @brgew

immanuelazn, Sep 23 '25 19:09