FRASER
Questions about optimal q and implementation="AE" vs. "PCA"
I understand that using different values of q can significantly affect results, so I wanted to ask whether I'm choosing q and the implementation argument correctly.
My dataset is 54 affected male fibroblast samples.
- One thing I'm curious about is why the optimal q's found by FRASER's optimHyperParams (psi5_q = 5, psi3_q = 2, psiSite_q = 2) are much smaller than the q found by OUTRIDER's findEncodingDim (q=12) for the same set of samples, and also smaller than the q used in the Colab notebook example with 50 samples (q=10).
I'm using FRASER to calculate counts for splice junctions, and STARv2 to get gene counts for OUTRIDER.
When I run
for (i in c("psi5", "psi3", "psiSite")) {
    fds <- optimHyperParams(fds, i, plot=FALSE, implementation="PCA", BPPARAM=bpparam)
    plotEncDimSearch(fds, type=i, plotType="auc")
    plotEncDimSearch(fds, type=i, plotType="loss")
}
I get plots that look like these (the plots for psi5, psi3, and psiSite look nearly identical):
and this is the plot from OUTRIDER's plotEncDimSearch(ods) for the same set of samples:
- From the pre-print, I understood that auto-encoder correction performs much better than PCA, so I run FRASER with implementation="AE". Is it fine to run optimHyperParams(..) with the default implementation="PCA" and use those q's for FRASER with implementation="AE"?
- I tried running optimHyperParams(..) with implementation="AE" (and other values taken from the source code) but got errors saying these are not recognized implementations:
> fds = optimHyperParams(fds, 'psi5', implementation="AE", plot=FALSE)
Error in needsHyperOpt(implementation) : Method not found: 'AE'!
> fds = optimHyperParams(fds, 'psi5', implementation="FRASER", plot=FALSE)
dPsi filter:FALSE: 1307001 TRUE: 57861
Exclusion matrix: FALSE: 1335180 TRUE: 29682
Thu Jul 9 02:02:00 2020: Injecting outliers: 9001 / 800 (primary/secondary)
Thu Jul 9 02:02:05 2020: Run hyper optimization with 13 options.
1 ; 2 ; 0
Thu Jul 9 02:02:06 2020 ; q: 2 ; noise: 0
Error: BiocParallel errors
element index: 1, 2, 3, 4, 5, 6, ...
first error: 'arg' should be one of “PCA”, “PCA-BB-Decoder”, “AE”, “AE-weighted”, “PCA-BB-full”, “fullAE”, “PCA-regression”, “PCA-reg-full”, “PCA-BB-Decoder-no-weights”, “BB”
- Given that the optimal q's computed by optimHyperParams for psi5, psi3, and psiSite are sometimes not the same, what is the recommended way to choose which q to pass to FRASER? (Currently I'm using q = max(bestQ(fds, type="psi5"), bestQ(fds, type="psi3"), bestQ(fds, type="psiSite")).)
After glancing at the source code, I also tried
q = list(psi5=bestQ(fds, type="psi5"), psi3=bestQ(fds, type="psi3"), psiSite=bestQ(fds, type="psiSite"))
fds <- FRASER(fds, q=q, implementation ="AE", BPPARAM=MulticoreParam(4))
but I got this error:
Wed Jul 8 18:01:50 2020: Fit step for: 'psi5'.
Wed Jul 8 18:01:50 2020: Running fit with correction method: AE
FALSE TRUE
97078 30265
Error in 1:nPcs : NA/NaN argument
I realize now that I can separately run fit and the other methods inside FRASER, and pass a different q to each. Is this recommended?
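For reference, here is a minimal base-R sketch of the two ways of choosing q discussed above: one shared q (the maximum) versus one q per psi type. The values are hard-coded from the optimHyperParams results reported earlier; in practice they would come from bestQ(fds, type=...). Holding the per-type values in a named numeric vector instead of a list is only an assumption about what FRASER might accept. I have not verified it against the installed version.

```r
# Best q per psi type, hard-coded from the results reported above
# (in a real run these would be bestQ(fds, type="psi5"), etc.)
q_list <- list(psi5 = 5, psi3 = 2, psiSite = 2)

# Option 1: a single shared q, the maximum over all psi types
q_max <- max(unlist(q_list))

# Option 2: one q per psi type, as a named numeric vector rather than a list
# (assumption: a named vector may be handled differently from a list)
q_vec <- unlist(q_list)

print(q_max)
print(q_vec)
```

Either object could then be tried as the q argument, e.g. FRASER(fds, q=q_max, implementation="AE") or FRASER(fds, q=q_vec, implementation="AE"). Whether the second call works is exactly what I'm unsure about.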