Use the numerical generative process to calibrate the model

Open CastielZhao opened this issue 3 years ago • 11 comments

Does the false positive rate we claim (e.g. 0.05) correspond to 5% of false positives given our no-association, no-outlier simulated data?

Calibration:

  • inference of associations. (read https://doi.org/10.1093/nargab/lqab005)
  • inference of outliers

CastielZhao avatar Jun 04 '21 06:06 CastielZhao

Calibrate inference of associations

  • Generate 100 datasets, using the same total counts per subject for each dataset (a vector of size M, where M is the number of subjects)
  • Number of subjects 30, number of categories 20
  • Design matrix would have an intercept column and a factor of interest between -1 and 1
  • Setup coefficient to have same intercept (for simplicity), and zero slope
  • Generate the data
  • Run sccomp (see the homepage of this repository)
    • To install: devtools::install_github("stemangiola/sccomp")
    • library(sccomp)
    • Follow the readme
  • Count how many categories were labelled as significantly changing (by default we use the 95% credible interval, which means that we expect 5% of calls to be false)

stemangiola avatar Jun 04 '21 07:06 stemangiola
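The simulation steps listed above could be sketched in base R roughly as follows. This is only a sketch under stated assumptions, not the sccomp generative model: counts are drawn from a plain multinomial after a softmax of the linear predictor, the total of 1000 counts per subject is arbitrary, the factor of interest is redrawn for every dataset, and the function name and the column names (sample, cell_group, type, count) are chosen for illustration.

```r
# Simulate "null" datasets: same intercept for every category, zero slope.
# simulate_null_dataset and total_counts are illustrative names.
set.seed(42)

n_subjects   <- 30
n_categories <- 20
n_datasets   <- 100

# One vector of total counts per subject, reused for every dataset.
# The value 1000 is an arbitrary assumption.
total_counts <- rep(1000L, n_subjects)

simulate_null_dataset <- function(total_counts, n_categories) {
  n_subjects <- length(total_counts)

  # Design matrix: intercept column plus a factor of interest in [-1, 1]
  design <- cbind(intercept = 1, type = runif(n_subjects, -1, 1))

  # Coefficients: row 1 = intercepts (all equal), row 2 = slopes (all zero)
  coefficients <- rbind(rep(0, n_categories), rep(0, n_categories))

  # Linear predictor, then a softmax per subject to get proportions
  eta <- design %*% coefficients
  proportions <- exp(eta) / rowSums(exp(eta))

  # Draw the counts of each subject from a multinomial (an assumption)
  counts <- t(sapply(seq_len(n_subjects), function(i) {
    rmultinom(1, size = total_counts[i], prob = proportions[i, ])
  }))

  # Long format: one row per subject and category
  data.frame(
    sample     = rep(paste0("subject_", seq_len(n_subjects)), times = n_categories),
    cell_group = rep(paste0("category_", seq_len(n_categories)), each = n_subjects),
    type       = rep(design[, "type"], times = n_categories),
    count      = as.vector(counts)
  )
}

datasets <- lapply(seq_len(n_datasets), function(i) {
  simulate_null_dataset(total_counts, n_categories)
})
```

Each element of `datasets` is then one long-format data frame with 30 x 20 = 600 rows, which can be analysed on its own.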

"Setup coefficient to have same intercept (for simplicity), and zero slope" Are there any other constraints on coefficient? i.e. integer ? Range ? Also, I assume that "zero slope" means coeff=(beta0,beta0,...,beta0; beta1,beta1,...,beta1); that the first column repeats 20 times.

CastielZhao avatar Jun 07 '21 09:06 CastielZhao

"Setup coefficient to have same intercept (for simplicity), and zero slope" Are there any other constraints on coefficient? i.e. integer ? Range ?

Run the code on the homepage of this repository and you will see what coefficients you get for a real dataset. You can take the range from those (except the intercept, which should be zero for this test).

stemangiola avatar Jun 07 '21 09:06 stemangiola

Whether the coefficients are integers or not makes no difference. The matrix multiplication between the design matrix and the coefficients works the same way.

stemangiola avatar Jun 07 '21 09:06 stemangiola
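A small illustration of the point about integer versus non-integer coefficients, assuming a softmax link from the linear predictor to proportions (an assumption made here for illustration; the variable names are hypothetical). With the same intercept for all categories and zero slope, both coefficient matrices give identical, uniform proportions.

```r
# Design matrix for 3 subjects: intercept column and a factor in [-1, 1]
design <- cbind(intercept = 1, type = c(-1, 0, 1))

# Two coefficient matrices (rows: intercept, slope; columns: 4 categories),
# one with an integer intercept and one without. Both have zero slope.
coef_integer <- rbind(rep(2,   4), rep(0, 4))
coef_real    <- rbind(rep(0.7, 4), rep(0, 4))

# The matrix multiplication is the same operation in both cases
eta_integer <- design %*% coef_integer
eta_real    <- design %*% coef_real

# Softmax per row (per subject) turns the linear predictor into proportions
softmax_rows <- function(eta) exp(eta) / rowSums(exp(eta))

prop_integer <- softmax_rows(eta_integer)
prop_real    <- softmax_rows(eta_real)
```

Under these assumptions the value of a shared intercept cancels out in the softmax, so only differences between categories or a non-zero slope would change the proportions.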

Hi Stefano,

I have successfully created 100 data frames with my function. To detect the change, do I need to use the sccomp library, or should I find another way to do that?

CastielZhao avatar Jun 08 '21 12:06 CastielZhao

Yes, run sccomp on your datasets. See the example dataset in the GitHub README. Start with a few datasets and try to compute descriptive statistics.

stemangiola avatar Jun 08 '21 12:06 stemangiola
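One descriptive statistic that fits the calibration question is the observed fraction of significant calls across all datasets and categories, compared with the nominal 0.05. The sketch below assumes the calls have already been collected into a logical matrix; the function name and the toy input are hypothetical and are not sccomp output. The binomial interval treats calls as independent, which is only an approximation because categories within one dataset are compositional.

```r
# `significant` is a hypothetical logical matrix: one row per simulated dataset,
# one column per category, TRUE where a category was called significant.
summarise_false_positives <- function(significant, nominal_rate = 0.05) {
  n_calls <- length(significant)
  n_false <- sum(significant)

  # Exact binomial test of the observed rate against the nominal rate
  test <- binom.test(n_false, n_calls, p = nominal_rate)

  list(
    observed_rate = n_false / n_calls,
    conf_int      = as.numeric(test$conf.int),
    per_dataset   = rowSums(significant)
  )
}

# Toy input only, to show the expected shape; these are not sccomp results.
toy <- matrix(FALSE, nrow = 5, ncol = 20)
toy[1, 3] <- TRUE
toy[4, c(7, 12)] <- TRUE

summary_toy <- summarise_false_positives(toy)
```

If the model is well calibrated on the null simulations, the nominal rate should fall inside the interval in `conf_int` once enough datasets have been analysed.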

Which function in sccomp is used for detecting variation?

CastielZhao avatar Jun 18 '21 06:06 CastielZhao

I noticed the function call: res = counts_obj %>% sccomp_glm( ~ type, sample, cell_group, count, approximate_posterior_inference = FALSE ). When analyzing multiple data frames, do I need to merge the data frames, or do I specify the different data frames through "cell_group" above? Also, type=category, count=count, sample=subject in our dictionary, right?

CastielZhao avatar Jun 23 '21 14:06 CastielZhao

If you are analysing different studies, no: analyse them independently. I don't know what you mean by data frames; a data frame can be anything. Please be more precise.

Also, type=category, count=count, sample=subject in our dictionary, right?

yes

stemangiola avatar Jun 23 '21 14:06 stemangiola
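Analysing each simulated dataset independently, rather than merging them, could look like the sketch below. `analyse_independently` and the stand-in fitter are hypothetical; the commented sccomp_glm call is copied from the question above and is not run here, so its arguments should be checked against the README of the installed version.

```r
# Apply the same fitting function to every dataset separately; nothing is merged.
analyse_independently <- function(datasets, fit_fun) {
  lapply(datasets, fit_fun)
}

# With sccomp installed, fit_fun could wrap the call quoted in this thread, e.g.
# fit_fun <- function(d) {
#   sccomp::sccomp_glm(d, ~ type, sample, cell_group, count,
#                      approximate_posterior_inference = FALSE)
# }

# Stand-in fitter so the sketch runs without sccomp: total count per dataset.
stand_in_fit <- function(d) sum(d$count)

# Two tiny placeholder datasets, only to show the shape of the input
toy_datasets <- list(
  data.frame(sample = c("s1", "s2"), cell_group = "c1", type = c(-1, 1), count = c(10L, 20L)),
  data.frame(sample = c("s1", "s2"), cell_group = "c1", type = c(0.5, -0.5), count = c(5L, 7L))
)

results <- analyse_independently(toy_datasets, stand_in_fit)
```

The result is a list with one element per dataset, which keeps the 100 simulated datasets as 100 separate fits.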

By data frames, I mean the simulated data frames produced by my numerical generative process.

CastielZhao avatar Jun 25 '21 03:06 CastielZhao

One data frame includes M categories and N subjects.

Another data frame includes M categories and N subjects.

One subject alone constitutes a very small dataset (size = 1) that cannot be used for regression.

stemangiola avatar Jun 25 '21 04:06 stemangiola