BayesPrism Using known cell type fractions

Dear Tinyi,

Thanks so much for developing (and maintaining) such great software. This is more of a question than an issue. We were wondering whether it would be possible to start from known fractions and use BayesPrism to do in silico gene expression purification of the different cell types.

Thanks so much in advance.

Best wishes,

Oriol

Jun 19 '24 09:06 oriolpich

Hi Oriol,

Apologies for the delay.

Thank you for your interest in our method. To answer your question, using known fractions directly is generally not recommended, because the cell type fraction inferred by BayesPrism represents reads% from each cell type rather than cell count%. I assume that the known fractions you mention represent cell count%, so if you would like to rely on this information to impute gene expression, you may first need to convert them to reads% by multiplying by a cell size factor, e.g. one inferred from scRNA-seq. You can then compute the expected mean of cell type-specific gene expression, conditional on the bulk data and the reads% of each cell type, using the following function, which is essentially one part of the Gibbs sampling in BayesPrism.

#' Function to compute E[Z | X, theta, phi], the expected cell type-specific gene expression Z
#' conditional on observed mixture X, cell type fraction theta and scRNA-seq reference phi
#'
#' @param X, observed mixture, a N-by-G matrix
#' @param theta, MAP estimator of cell type fraction, a N-by-K matrix
#' @param phi, scRNA-seq reference, a K-by-G matrix
E.Z <- function(X, theta, phi){

  N <- nrow(X)
  G <- ncol(phi)
  K <- nrow(phi)

  Z <- array(NA, c(N, G, K), dimnames=list(rownames(X), colnames(X), rownames(phi)))

  # theta %*% phi is a matrix product: the expected mixture, N-by-G
  X_over_theta_phi <- X / (theta %*% phi)

  for(n in 1:N) {
    # G-by-K slice: each cell type's share of the observed counts in sample n
    Z[n,,] <- t(phi * theta[n,]) * X_over_theta_phi[n,]
  }

  return(Z)
}
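As a minimal sketch of how the steps above might fit together, the toy example below converts known cell-count fractions to reads fractions and then calls a copy of the function. Everything in it is made up for illustration: the matrices, the names `cell.frac` and `size.factor`, and the cell type and gene labels are hypothetical, and it assumes each row of `phi` is normalized to sum to one.

```r
# Copy of E.Z from the reply above, repeated so this sketch runs on its own.
E.Z <- function(X, theta, phi) {
  N <- nrow(X); G <- ncol(phi); K <- nrow(phi)
  Z <- array(NA, c(N, G, K),
             dimnames = list(rownames(X), colnames(X), rownames(phi)))
  X_over_theta_phi <- X / (theta %*% phi)  # N-by-G
  for (n in 1:N) {
    Z[n, , ] <- t(phi * theta[n, ]) * X_over_theta_phi[n, ]
  }
  Z
}

# Toy reference: K = 2 cell types, G = 3 genes, rows sum to one (assumed).
phi <- matrix(c(0.5, 0.3, 0.2,
                0.1, 0.1, 0.8),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("tumor", "immune"), c("g1", "g2", "g3")))

# Toy bulk counts: N = 2 samples.
X <- matrix(c(100, 50, 200,
               80, 40, 300),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("s1", "s2"), colnames(phi)))

# Known cell-count fractions (hypothetical values), N-by-K.
cell.frac <- matrix(c(0.6, 0.4,
                      0.3, 0.7),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(rownames(X), rownames(phi)))

# Hypothetical per-cell-type size factors, e.g. derived from scRNA-seq.
size.factor <- c(tumor = 2, immune = 1)

# cell count% -> reads%: scale each column by its size factor,
# then renormalize each row to sum to one.
theta <- sweep(cell.frac, 2, size.factor[colnames(cell.frac)], "*")
theta <- theta / rowSums(theta)

# Expected cell type-specific expression, an N-by-G-by-K array.
Z <- E.Z(X, theta, phi)

# Sanity check: summing Z over cell types recovers the bulk counts.
stopifnot(isTRUE(all.equal(apply(Z, c(1, 2), sum), X,
                           check.attributes = FALSE)))
```

Note that scaling a row of `theta` by a constant cancels in `theta * phi / (theta %*% phi)`, so the row-wise renormalization does not change the output of `E.Z`; it only keeps `theta` interpretable as fractions.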

Best,

Tinyi

Jun 26 '24 02:06 tinyi

Fantastic, thanks Tinyi!

Jul 04 '24 06:07 oriolpich

Hi Tinyi, may I ask a few follow-up questions?

Any advice on how to get the cell size factor from single-cell data? So far we have tried:

sce_sumf <- computeSumFactors(single_cell_counts, clusters=metadata$clusters)

and then taking mean(sizeFactors(sce_sumf[, index_cells_incluster])), but we are unsure whether this is the best way to get them.

I understand we need to multiply our fractions by the cell type-specific cell size factor, and this would be our theta. Shall I renormalise to 1 before feeding it into the function?
Would you run the function as it is, or shall I try to add it within BayesPrism (e.g. including it at some step in the overall BayesPrism run)? Is there any way we could make it more robust?

Thanks a lot!

Oriol

Oct 14 '24 21:10 oriolpich

Hi Oriol,

Thank you for your question. To get the size factor for each cell type, simply compute the mean of log(total library size) over the cells of that type.

To convert the reads fraction, which is what BayesPrism outputs, to a cell count fraction, you may simply divide theta by the library size of the corresponding cell type, and then renormalize it to sum to one.
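The conversion described above could be sketched as follows. The values of `theta` and `lib.size` are toy numbers chosen for illustration, and both variable names are assumptions rather than BayesPrism objects.

```r
# Reads fractions, N-by-K, in the form BayesPrism outputs (toy values).
theta <- matrix(c(0.75, 0.25,
                  0.50, 0.50),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("s1", "s2"), c("tumor", "immune")))

# Hypothetical library size (size factor) per cell type from scRNA-seq.
lib.size <- c(tumor = 2, immune = 1)

# reads% -> cell count%: divide each column by the library size of its
# cell type, then renormalize each row to sum to one.
cell.count.frac <- sweep(theta, 2, lib.size[colnames(theta)], "/")
cell.count.frac <- cell.count.frac / rowSums(cell.count.frac)
```

Indexing `lib.size` by `colnames(theta)` guards against the two objects listing cell types in a different order.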

Best,

Tinyi

Oct 23 '24 03:10 tinyi

This will be integrated into the next update of BayesPrism.

Oct 23 '24 04:10 tinyi

To add to my previous reply: the size factor for each cell type should be computed on the log scale and then exponentiated, which essentially gives the geometric mean. The geometric mean is more robust to outliers.
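One way this might look in practice is sketched below, assuming a raw count matrix with cells in rows and a vector of cell type labels. The names `sc.counts` and `cell.type` and all values are hypothetical.

```r
# Toy scRNA-seq raw counts: 4 cells in rows, 3 genes in columns.
sc.counts <- matrix(c(10, 20, 70,
                      40, 10, 50,
                       5,  5, 15,
                       2,  3,  5),
                    nrow = 4, byrow = TRUE,
                    dimnames = list(paste0("cell", 1:4), c("g1", "g2", "g3")))

# Cell type label of each cell (hypothetical annotation).
cell.type <- c("tumor", "tumor", "immune", "immune")

# Total library size per cell.
lib.size <- rowSums(sc.counts)

# Mean of log library size within each cell type, then exponentiate:
# this is the geometric mean of the library sizes per cell type.
size.factor <- exp(tapply(log(lib.size), cell.type, mean))
```

Cells with a total count of zero would give `log(0) = -Inf`, so they would need to be filtered out before this step.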

Oct 28 '24 17:10 tinyi

Dear Tinyi,

Do you mean that support for starting from known cell percentages will be incorporated? I have some follow-up questions regarding your E.Z function:

Is phi the SC matrix filtered by the cell type I am trying to deconvolve, or the whole SC dataset?
How can I get a measure of confidence similar to what you currently provide for the deconvolution?
Would the function be robust enough to give reliable tumor-specific expression (e.g. would it be equivalent to what BayesPrism would infer had it identified the same fractions I am providing, after the cell size correction we just discussed)?

Thanks so much for your help and for maintaining the software.

Best wishes,

Oriol

Jan 26 '25 09:01 oriolpich