BayesPrism, issue #30: https://github.com/Danko-Lab/BayesPrism/issues/30

Minimum sample size requirement

Open Sophon-0 opened this issue 2 years ago • 9 comments

Hello, I would like to know the minimum bulk RNA-seq sample size needed to trust the inferred cell fractions. Do we need at least 10 or 20 samples? Thanks!

Sophon-0, Mar 13 '23

Thank you for your interest in our work.

In theory, users can use as few as one bulk sample. Results from the first (initial) Gibbs sampling remain the same regardless of the number of bulk RNA-seq samples, while the accuracy of the final (updated) Gibbs sampling increases with the number of samples, although its results may differ only slightly across different numbers of bulk samples. For fewer than 10 bulk samples, we recommend using the results of the first Gibbs sampling (you may also inspect the results from the updated Gibbs sampling to see the difference).
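A minimal Python sketch of the recommendation above, assuming the first (initial) and final (updated) cell-type fractions have already been exported into plain dictionaries keyed by sample and cell type. The helpers `recommended_run` and `max_abs_difference` are hypothetical and not part of BayesPrism (an R package). The threshold of 10 comes from the reply; defaulting to the updated run at 10 or more samples is an assumption.

```python
from typing import Dict

# sample -> cell type -> inferred fraction (hypothetical export format)
Fractions = Dict[str, Dict[str, float]]

# Threshold taken from the recommendation above; not a BayesPrism constant.
MIN_SAMPLES_FOR_UPDATED = 10


def recommended_run(n_bulk_samples: int) -> str:
    """Suggest 'first' (initial Gibbs) below 10 bulk samples, else 'final' (updated)."""
    if n_bulk_samples < 1:
        raise ValueError("at least one bulk sample is required")
    return "first" if n_bulk_samples < MIN_SAMPLES_FOR_UPDATED else "final"


def max_abs_difference(first: Fractions, final: Fractions) -> Dict[str, float]:
    """For each sample, the largest absolute change in any cell-type fraction
    between the initial and the updated Gibbs estimates."""
    diffs: Dict[str, float] = {}
    for sample, first_fracs in first.items():
        final_fracs = final[sample]
        cell_types = set(first_fracs) | set(final_fracs)
        # A cell type missing from one table is treated as fraction 0.
        diffs[sample] = max(
            abs(first_fracs.get(ct, 0.0) - final_fracs.get(ct, 0.0))
            for ct in cell_types
        )
    return diffs
```

For example, with six bulk samples `recommended_run(6)` suggests the initial estimates, and `max_abs_difference` shows how much the updated run would have changed each sample.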

Best,

Tinyi

tinyi, Mar 14 '23

Thank you so much for your quick reply! In general, what do you think about running BayesPrism a few times with different seeds and then averaging the results? Do you see merit in computing a p-value? For example, permute the gene labels for each sample, run it 100 times to build an empirical null distribution, and then assess the p-value of the result.

Sophon-0, Mar 21 '23

Hi Mi,

To answer your question:

In general, what do you think about running BayesPrism a few times with different seeds and then averaging the results?

It is not needed. I don't think the random seed has any significant effect on the result, because BayesPrism already takes the posterior mean of the MCMC samples.
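To illustrate why averaging over seeds adds little, here is a toy Python sketch. It assumes a Beta(30, 70) distribution as a stand-in posterior for a single cell-type fraction (this is not BayesPrism's model) and draws independent samples from it. Real MCMC draws are autocorrelated, so the seed-to-seed variation would be somewhat larger than shown, but it still shrinks as more samples are drawn.

```python
import random
import statistics


def posterior_mean(seed: int, a: float = 30.0, b: float = 70.0,
                   n_draws: int = 5000) -> float:
    """Monte Carlo estimate of the mean of a Beta(a, b) stand-in posterior."""
    rng = random.Random(seed)  # independent generator per seed
    return statistics.fmean(rng.betavariate(a, b) for _ in range(n_draws))


# One estimate per seed; each is already an average over n_draws samples.
means = [posterior_mean(seed) for seed in range(5)]
spread = max(means) - min(means)
```

The true mean here is 0.3 and the stand-in posterior's standard deviation is about 0.046, so each estimate carries a Monte Carlo error of roughly 0.046 / sqrt(5000), or about 0.0006: much smaller than the posterior uncertainty itself.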

Do you see merit in computing a p-value? For example, permute the gene labels for each sample, run it 100 times to build an empirical null distribution, and then assess the p-value of the result.

I am not sure exactly how you would set up the null distribution, or how that null distribution would represent the null hypothesis you want to test. Also, a p-value is a frequentist concept. In Bayesian analysis, you can instead derive credible intervals, which are related to the coefficient of variation (CV) provided in the BayesPrism output.
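As a hedged illustration of the credible-interval idea, the sketch below computes an equal-tailed 95% interval from a set of posterior draws using empirical quantiles. The draws come from a toy Beta(30, 70) stand-in posterior; BayesPrism reports a posterior mean and CV rather than the draws themselves, so the function name and inputs are illustrative only.

```python
import math
import random
from typing import List, Tuple


def equal_tailed_interval(draws: List[float], level: float = 0.95) -> Tuple[float, float]:
    """Equal-tailed credible interval: drop (1 - level) / 2 of the draws from each tail."""
    if not draws:
        raise ValueError("need at least one draw")
    ordered = sorted(draws)
    last = len(ordered) - 1
    alpha = (1.0 - level) / 2.0
    # Round the quantile indices outward, giving a slightly conservative interval.
    lo = ordered[math.floor(alpha * last)]
    hi = ordered[math.ceil((1.0 - alpha) * last)]
    return lo, hi


rng = random.Random(0)
toy_draws = [rng.betavariate(30.0, 70.0) for _ in range(10_000)]
lo, hi = equal_tailed_interval(toy_draws)
```

Unlike a symmetric mean ± 1.96 sd interval, empirical quantiles follow the skew of a bounded fraction near 0 or 1.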

Best,

Tinyi

tinyi, Mar 23 '23

Hi Tinyi, I am curious about how the CV for the estimated theta (the cell fractions of the mixture) is calculated. Is it computed from a set of theta values estimated across multiple Gibbs sampling iterations? If so, would it be possible to output that intermediate set of theta values? That might make it possible to obtain empirical p-values for each initial or final theta estimate.

best, Peng

SoManyPepople, Apr 08 '23

Hi Mi,

As mentioned earlier in this thread, a p-value is a frequentist concept, and the proposed strategy does not correspond to an interpretable null distribution.

The CV quantifies the uncertainty of the posterior distribution; the variance used to compute it is estimated from the MCMC samples.
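A minimal sketch of computing a CV from MCMC draws, assuming the usual definition CV = standard deviation / mean. Whether BayesPrism uses the sample (n - 1) or population (n) standard deviation internally is not stated in this thread; the sketch uses the sample version, and the difference is negligible for thousands of draws. The toy draws again come from a hypothetical Beta(30, 70) stand-in posterior.

```python
import random
import statistics
from typing import Sequence


def coefficient_of_variation(draws: Sequence[float]) -> float:
    """CV of posterior draws: sample standard deviation divided by the mean.
    Needs at least two draws (statistics.stdev raises otherwise)."""
    mean = statistics.fmean(draws)
    if mean == 0:
        raise ValueError("CV is undefined when the posterior mean is zero")
    return statistics.stdev(draws) / mean


rng = random.Random(0)
toy_draws = [rng.betavariate(30.0, 70.0) for _ in range(10_000)]
toy_cv = coefficient_of_variation(toy_draws)  # close to 0.046 / 0.3, about 0.15
```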

Hope it helps.

Best,

Tinyi

tinyi, Apr 08 '23

Hi Tinyi,

Thanks for your kind and quick reply. In fact, I am not too concerned about estimating p-values for the predicted cell fractions. I want to use a set of estimated theta values for other downstream analyses. Would it be possible to extract these intermediate theta values from the MCMC samples?

Best, Peng.

SoManyPepople, Apr 09 '23

Hi Peng,

Sorry, BayesPrism does not output all posterior samples for now, as this would be very memory-intensive. If you would like to estimate the posterior variance, you can convert the CV to a standard deviation by multiplying it by the posterior mean (since CV = sd / mean), then square the result to obtain the variance.
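A small sketch of that conversion, assuming `mean` is the posterior-mean fraction and `cv` is its coefficient of variation for the same sample and cell type (extracting them from the BayesPrism result object is not shown). The normal-approximation interval is an extra assumption that ignores the skew of bounded fractions, so it is clipped to [0, 1].

```python
from typing import Tuple


def cv_to_sd(mean: float, cv: float) -> float:
    """Posterior standard deviation recovered from CV = sd / mean."""
    return cv * mean


def cv_to_variance(mean: float, cv: float) -> float:
    """Posterior variance is the squared standard deviation."""
    return cv_to_sd(mean, cv) ** 2


def approx_interval(mean: float, cv: float, z: float = 1.96) -> Tuple[float, float]:
    """Rough normal-approximation interval (mean +/- z * sd), clipped to [0, 1]."""
    sd = cv_to_sd(mean, cv)
    return max(0.0, mean - z * sd), min(1.0, mean + z * sd)
```

For instance, a fraction of 0.3 with a CV of 0.2 has a standard deviation of 0.06 and a variance of 0.0036. Such summaries only approximate the posterior; they are not a substitute for the full set of draws.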

Best,

Tinyi

tinyi, Apr 10 '23

Hi Tinyi, I was wondering whether BayesPrism can be used for cell typing in single-cell data. For each single cell, we would get inferred fractions (as we do with bulk) and then assign the cell type with the highest value. What do you think? Thanks! Mi

Sophon-0, Jul 20 '23

Hi Mi,

This is an interesting idea. Here are my thoughts:

  1. There are already many mature cell-typing methods developed specifically for scRNA-seq, either deep learning-based methods (for label transfer) or logistic regression-based methods. It is unclear to me whether BayesPrism would perform better than these methods.

  2. BayesPrism (a deconvolution method) was specifically developed to model an admixture distribution (each cell is a sum of multiple cell types), rather than a mixture distribution (each cell belongs to only one cluster). In cases where cell types/states lie on a continuous manifold, such as a developmental trajectory or tumor heterogeneity, you could in theory use BayesPrism to decompose each cell into a combination of multiple cell states.
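Mi's argmax proposal is simple to express. The hedged sketch below also flags cells whose top fraction is low, which, following point 2, may indicate a cell lying between states rather than a clean single-type assignment. The input format, the function name, and the 0.5 cutoff are hypothetical. This is not a BayesPrism feature, and it says nothing about whether the approach would match dedicated scRNA-seq classifiers (point 1).

```python
from typing import Dict, Tuple

# Hypothetical cutoff: below this top fraction, the cell looks admixed.
MIN_TOP_FRACTION = 0.5


def assign_cell_type(fractions: Dict[str, float],
                     min_top: float = MIN_TOP_FRACTION) -> Tuple[str, bool]:
    """Return the cell type with the highest inferred fraction, plus a flag that is
    False when that fraction is below `min_top` (a possibly admixed cell)."""
    if not fractions:
        raise ValueError("no cell-type fractions given")
    top_type = max(fractions, key=fractions.__getitem__)
    return top_type, fractions[top_type] >= min_top
```

A cell with fractions T 0.8, B 0.15, NK 0.05 gets a confident "T" label. A cell with T 0.4, NK 0.35, B 0.25 is still labeled "T" but flagged, which is exactly the case where a per-cell decomposition may be more informative than a single label.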

Best,

Tinyi

tinyi, Jul 20 '23