Questions about optimal q and implementation="AE" vs. "PCA"

Open bw2 opened this issue 3 years ago • 4 comments

I understand using different values of q can significantly affect results, so I wanted to ask whether I'm choosing q and the implementation arg correctly.

My dataset is 54 affected male fibroblast samples.

  1. One thing I'm curious about is why the optimal q's found by FRASER optimHyperParams (psi5_q = 5, psi3_q = 2, psiSite_q = 2) are much smaller than the q found by OUTRIDER findEncodingDim (q=12) for the same set of samples, and also smaller than the q used in the colab notebook example with 50 samples (q=10)?

I'm using FRASER to calculate counts for splice junctions, and STARv2 to get gene counts for OUTRIDER.

When I run

for(i in c("psi5", "psi3", "psiSite")) {
    fds <- optimHyperParams(fds, i, plot=FALSE, implementation="PCA", BPPARAM=bpparam)
    plotEncDimSearch(fds, type=i, plotType="auc")
    plotEncDimSearch(fds, type=i, plotType="loss")
}

I get plots that look like these (the plots for psi5, psi3, and psiSite look nearly identical):

[Figure: plotEncDimSearch, psi3, AUC (fibroblasts_M_without_GTEX__54_samples_0667E4B6A8_plotEncDimSearch_psi3_AUC)]

[Figure: plotEncDimSearch, psi3, loss (fibroblasts_M_without_GTEX__54_samples_0667E4B6A8_plotEncDimSearch_psi3_loss)]

and this is the plot from OUTRIDER plotEncDimSearch(ods) for the same set of samples:

[Figure: OUTRIDER plotEncDimSearch (fibroblasts_M_without_GTEX__plotEncDimSearch)]

  2. From the pre-print, I understood that auto-encoder correction performs much better than PCA, so I run FRASER with implementation="AE". Is it fine to run optimHyperParams(..) with the default implementation="PCA", and use those q's for FRASER with implementation="AE"?

  3. I tried running optimHyperParams(..) with implementation="AE" (and other values taken from the source code) but got validation errors saying these are not recognized implementations:

> fds = optimHyperParams(fds, 'psi5', implementation="AE", plot=FALSE)
Error in needsHyperOpt(implementation) : Method not found: 'AE'!

> fds = optimHyperParams(fds, 'psi5', implementation="FRASER", plot=FALSE)
dPsi filter:FALSE: 1307001	TRUE: 57861
Exclusion matrix: FALSE: 1335180	TRUE: 29682
Thu Jul  9 02:02:00 2020: Injecting outliers: 9001 / 800 (primary/secondary)
Thu Jul  9 02:02:05 2020: Run hyper optimization with 13 options.
1 ;	 2 ;	 0
Thu Jul  9 02:02:06 2020 ; q: 2 ; noise:  0
Error: BiocParallel errors
  element index: 1, 2, 3, 4, 5, 6, ...
  first error: 'arg' should be one of “PCA”, “PCA-BB-Decoder”, “AE”, “AE-weighted”, “PCA-BB-full”, “fullAE”, “PCA-regression”, “PCA-reg-full”, “PCA-BB-Decoder-no-weights”, “BB”

  4. Given that the optimal q values computed by optimHyperParams for psi5, psi3, and psiSite are sometimes not the same, what is the recommended way to choose which q to pass to FRASER? (Currently I'm using q=max(bestQ(fds, type="psi5"), bestQ(fds, type="psi3"), bestQ(fds, type="psiSite")).)

After glancing at the source code, I also tried

q = list(psi5=bestQ(fds, type="psi5"), psi3=bestQ(fds, type="psi3"), psiSite=bestQ(fds, type="psiSite"))
fds <- FRASER(fds, q=q, implementation ="AE",  BPPARAM=MulticoreParam(4))

but I got this error:

Wed Jul  8 18:01:50 2020: Fit step for: 'psi5'.
Wed Jul  8 18:01:50 2020: Running fit with correction method: AE

FALSE  TRUE 
97078 30265 
Error in 1:nPcs : NA/NaN argument

I realize now I can separately run fit and the other methods inside FRASER, and pass in a different q to each. Is this recommended?
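
For reference, the two ways of choosing q described above (a single maximum versus one value per psi type) can be sketched in base R. This is only an illustration: `as_q_vector` is a hypothetical helper, not part of FRASER, and the values 5, 2, 2 are the ones reported by optimHyperParams earlier in this post. Whether FRASER(..) expects a named numeric vector here rather than a list is an assumption to check against the package documentation.

```r
# Hypothetical helper (not a FRASER function): turn per-type q values,
# given as a list or a vector, into a named integer vector and fail
# early if any value is missing.
as_q_vector <- function(q, types = c("psi5", "psi3", "psiSite")) {
    q <- unlist(q)                      # list -> named numeric vector
    stopifnot(is.numeric(q),
              all(types %in% names(q)),
              !anyNA(q[types]))
    q <- q[types]                       # fixed order, drop extra entries
    storage.mode(q) <- "integer"        # keeps the names
    q
}

# Values reported by optimHyperParams in this post; in a real session
# they would come from bestQ(fds, type=...).
q_per_type <- as_q_vector(list(psi5 = 5, psi3 = 2, psiSite = 2))

# Option A: one q for all types (the max, as currently used).
q_single <- max(q_per_type)

# Option B: one q per type, passed as a named vector (assumption, not run):
# fds <- FRASER(fds, q = q_per_type, implementation = "AE")
```

The early check on missing values is there because an NA q would otherwise only surface later, inside the fit.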

bw2, Jul 08 '20 22:07