MultiK

How to assign the optimal k value to a new variable?

Status: Open • RegnerM2015 opened this issue 3 years ago • 6 comments

Hi @siyao-liu

Can you suggest some commands that automatically return the optimal k value based on the convex hull, without having to look at the diagnostic plot (e.g., for a data analysis pipeline)?

I see that this value is printed to the console by findOptK(), but it would be amazing if I could store it in a new variable without having to edit the source code. Thanks!

RegnerM2015 • Feb 04 '22 20:02

@RegnerM2015 Hi Matt, doesn't running DiagMultiKPlot() print the optimal k values?

siyao-liu • Feb 05 '22 00:02

Yes, DiagMultiKPlot() prints the optimal k values to the console. However, I am trying to use MultiK in a pipeline that runs non-interactively on a remote compute cluster. I will run this pipeline separately on each of 20 scRNA-seq samples, and the optimal k will likely differ between datasets.

Therefore, I was wondering whether there is a way to pass the optimal k value to getClusters() automatically, without copying and pasting the printed value from the console. Sorry for any confusion, and thanks for your help!

RegnerM2015 • Feb 05 '22 19:02

@RegnerM2015 OK. I will take a look and have MultiK return the optimal k values automatically so they can be stored in a variable. However, I would still recommend checking the diagnostic plots manually, because other k values may also be good choices but be left out of MultiK's optimal k solutions due to the hard cutoff set in the algorithm.

siyao-liu • Feb 06 '22 01:02

Hi @siyao-liu

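I figured out how to store the optimal k in a new variable! I reconstruct the tog object that findOptK() takes as input, using the following:

library(MultiK)
# multik is the list returned by MultiK(); keep only k values chosen in more than one run
tog <- as.data.frame(table(multik$k)[table(multik$k) > 1])
colnames(tog)[1] <- "ks"
# relative proportion of ambiguous clustering (rPAC) for each retained k
pacobj <- CalcPAC(x1 = 0.1, x2 = 0.9, xvec = tog$ks, ml = multik$consensus)
tog$rpac <- pacobj$rPAC
tog$one_minus_rpac <- 1 - tog$rpac
# optimal k value(s) based on the convex hull
optK <- findOptK(tog)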

RegnerM2015 • Feb 07 '22 15:02
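A minimal sketch of how the tog reconstruction above could be wrapped for the per-sample, non-interactive runs described earlier in the thread. The helper names get_opt_k() and run_sample(), the .rds input format, and the output file names are all hypothetical. MultiK(), DiagMultiKPlot(), CalcPAC() and findOptK() are called the same way as in this thread; getClusters(seu, k) is assumed to take the Seurat object and a single k value, and if CalcPAC()/findOptK() are not exported by your installed MultiK version they may need the MultiK::: prefix. The diagnostic plot is still written to a PDF, following the advice above to review it by eye.

```r
library(MultiK)

# Hypothetical helper: rebuild `tog` from a MultiK() result (same steps as the
# snippet above) and return the optimal k value(s) reported by findOptK().
get_opt_k <- function(multik) {
  k_counts <- table(multik$k)
  tog <- as.data.frame(k_counts[k_counts > 1])  # k values chosen in >1 run
  colnames(tog)[1] <- "ks"
  pacobj <- CalcPAC(x1 = 0.1, x2 = 0.9, xvec = tog$ks, ml = multik$consensus)
  tog$rpac <- pacobj$rPAC
  tog$one_minus_rpac <- 1 - tog$rpac
  opt_k <- findOptK(tog)
  # Assumes findOptK() returns numbers, strings or factor levels; going through
  # character avoids turning factor levels into their integer codes
  as.numeric(as.character(opt_k))
}

# Hypothetical per-sample driver for a non-interactive (Rscript) run.
# `seu_rds` is assumed to be a saved Seurat object; `out_prefix` names outputs.
run_sample <- function(seu_rds, out_prefix, reps = 100) {
  seu <- readRDS(seu_rds)
  multik <- MultiK(seu, reps = reps)

  # Keep the diagnostic plot so the k choice can still be checked manually
  # (assumes DiagMultiKPlot() returns a printable plot object)
  pdf(paste0(out_prefix, "_MultiK_diagnostics.pdf"), width = 15, height = 5)
  print(DiagMultiKPlot(multik$k, multik$consensus))
  dev.off()

  opt_k <- get_opt_k(multik)
  # Several k values may be reported: record all of them, cluster at the first
  writeLines(as.character(opt_k), paste0(out_prefix, "_optK.txt"))
  clusters <- getClusters(seu, opt_k[1])
  saveRDS(clusters, paste0(out_prefix, "_clusters.rds"))
  invisible(opt_k)
}
```

Each cluster job could then be a short Rscript that calls run_sample() with paths taken from commandArgs(trailingOnly = TRUE). Writing every reported k to a text file, rather than only the first, makes it easy to revisit samples where findOptK() returned more than one candidate.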

@RegnerM2015 Hi Matt, glad that you figured it out. Sorry I haven't had time to get to this yet. I will try to work on it soon and make sure the optimal k is returned in the next release of MultiK.

siyao-liu • Feb 07 '22 15:02

No rush at all! The code above should suffice for my specific case. I appreciate your help. Thanks!

RegnerM2015 • Feb 07 '22 15:02