
error importing mtx file

Open cvalenci opened this issue 1 year ago • 3 comments

Hi, I wanted to use BPCells on my single-cell RNA data (>1M cells). The original file is in h5ad format, so I wrote the raw counts to an mtx file using mmwrite (Python) and the meta.data to a csv file (pandas). I can work with the meta.data with no problem, but I am getting an error with the mtx matrix (see below). I was wondering if there is something that I am missing, or if there is a better way to import the data into BPCells. Thank you

Cristian

Load libraries

library(Seurat)
library(Signac)
library(plyr) # always first and then dplyr
library(dplyr)
library(tidyr)
library(data.table)
library(singlecellmethods)
library(Matrix)
library(presto)
library(DESeq2)
library(edgeR)
library(MASS)
#library(ggplot2)
options(stringsAsFactors=FALSE)
library('BPCells')

GenexCell matrix

ge_counts <- import_matrix_market(mtx_path = "/domino/Final/raw_genecounts.mtx")

Error: basic_ios::clear: iostream error

─ Session info ─────────────────────────────────────────────────────────────────────────────────────
setting  value
version  R version 4.3.2 (2023-10-31)
os       Ubuntu 20.04.6 LTS
system   x86_64, linux-gnu
ui       X11
language (EN)
collate  en_US.UTF-8
ctype    en_US.UTF-8
tz       Etc/UTC
date     2024-12-17
pandoc   2.5 @ /usr/bin/pandoc

cvalenci, Dec 17 '24 19:12

Hi Cristian, thanks for your bug report! The easiest way to access h5ad data from BPCells is to directly read it with open_matrix_anndata_hdf5(). You can optionally write that matrix into a native BPCells format for somewhat better performance using e.g., write_matrix_dir(), but you can also just use the h5ad file directly.

I'm actually not quite sure what is causing the error message you are getting with the MatrixMarket import, but I would love to better understand what is happening to see if there's a fixable issue here. If you're on Linux, could you follow this guide to find + share the location of your crash? Alternatively, if this is a public dataset, could you share where I can download it to try to reproduce the issue on my end? I wasn't able to reproduce this with one of my own demo datasets.

Practically speaking, just directly reading the h5ad file will be your easiest step, but I would appreciate it if you have time to help troubleshoot the MatrixMarket import error

-Ben

bnprks, Dec 17 '24 23:12

Hi Ben, thanks for your quick response. As you recommended, working with the h5ad file directly solved the problem! Sure, I will check the guide that you pointed out and try to get the crash log.

In the meantime, and totally unrelated: I was following the tutorial https://bnprks.github.io/BPCells/articles/pbmc3k.html and I wonder if there is an option in BPCells to select variable genes using 'Variance-stabilizing transformation'?

Thank you Cristian

cvalenci, Dec 18 '24 17:12

Hi Cristian, I think Seurat already has compatibility with BPCells objects in their FindVariableFeatures() method when using selection.method = "vst". Is that what you're looking for? (i.e. make a Seurat object while passing in a BPCells matrix for the counts, then run FindVariableFeatures() and optionally extract the list of features using the VariableFeatures() function)

We don't have a helper function for variable gene selection built into BPCells yet, but the Seurat method should work efficiently by utilizing the existing BPCells disk-backed operations. (Let me know if there's a bug on the Seurat side; sometimes bits of BPCells compatibility slip through the cracks, but it's often easy to fix.)

I'd appreciate it if you do have time to collect a crash log. It might just be some one-off problem like a file getting truncated, but it is always good to investigate these in case it is something BPCells needs to fix.

-Ben

bnprks, Dec 19 '24 00:12
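
A truncated file is raised above as one possible cause of the import error. The following is a minimal sketch, in Python since the export was done with mmwrite, of one way to test that hypothesis: it compares the entry count declared in the Matrix Market size line with the number of entry lines actually present. The helper name `check_mtx_entry_count` is hypothetical and not part of BPCells or SciPy, and the sketch assumes an uncompressed, sparse ("coordinate") mtx file. It only detects a mismatch in entry count; it does not confirm that this is what caused the `basic_ios::clear` error.

```python
def check_mtx_entry_count(path):
    """Compare declared vs. actual entry counts in a Matrix Market file.

    Hypothetical helper for illustration only (not a BPCells or SciPy API).
    Assumes an uncompressed file in the sparse 'coordinate' layout.
    Returns a tuple (declared_nnz, found_nnz).
    """
    with open(path, "r", encoding="ascii", errors="replace") as handle:
        # The first line must be the banner, e.g.
        # "%%MatrixMarket matrix coordinate integer general".
        banner = handle.readline()
        if not banner.startswith("%%MatrixMarket"):
            raise ValueError("missing %%MatrixMarket banner line")
        if "coordinate" not in banner.lower().split():
            raise ValueError("only the 'coordinate' layout is handled here")

        # Skip comment lines (starting with '%') and blank lines
        # until the size line "rows cols nnz" is reached.
        size_line = ""
        for line in handle:
            stripped = line.strip()
            if stripped and not stripped.startswith("%"):
                size_line = stripped
                break
        if not size_line:
            raise ValueError("file ends before the size line")
        n_rows, n_cols, declared_nnz = (int(x) for x in size_line.split())

        # Stream the remaining lines so a very large matrix
        # never has to fit in memory; count non-blank entry lines.
        found_nnz = sum(1 for line in handle if line.strip())

    return declared_nnz, found_nnz
```

Calling `check_mtx_entry_count("/domino/Final/raw_genecounts.mtx")` and getting a `found_nnz` smaller than `declared_nnz` would suggest the export was cut short (for example by a full disk or an interrupted write); equal counts would point away from simple truncation, making a crash log more useful.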