stray
                                
                                
                                
                                    stray copied to clipboard
                            
                            
                            
                        priors for a data with bacterial hierarchy and phylogenetic distance between samples
Dear @jsilve24
My dataset contains a combination of bacterial hierarchical information and phylogenetic relationship between samples.
I went through both "pibble" and "basset" vignettes, I was wondering if it's possible to join the two priors? From what I understand, the relationship between samples is modeled in Theta and Gamma variables in "basset", whereas in "pibble" the independence between bacterial taxa is modeled by Gamma and Omega.
My naive take on the "new priors"- X<-distRoot(tree, tips = "all", method = "patristic") #calculate phylogenetic distance between samples D<-ntaxa(counts_table) #bacterial_taxa as rows in counts_table Gamma <- function(X) SE(X, sigma=5, rho=10) # Create the partial function as shown in "basset" vignette Theta <- function(X) matrix(0, D-1, ncol(X)) # Create the partial function as shown in "basset" vignette upsilon <- D-1+3 omega<-diag(D) #for indepedence of taxa G <- cbind(diag(ntaxa(dat)-1), -1) Xi <- (upsilon-D)G%%Omega%*%t(G) diag(Xi) <- 1 #find the posterior- prior<-stray::basset(NULL, X, upsilon, Theta, Gamma, Xi)
Questions- a) why is upsilon (ntaxa-1+3) in "basette" vignette, where as it is (ntaxa+3) in "pibble" vignette? Is there a difference? b) Would combining two priors make sense? How can my naive approach be improved?
Sorry for the delay.
Both basset and pibble have the same types of priors; its just that in basset the priors are functions rather than fixed matrices. You want to stick with pibble for this as you don't really need the non-linear modeling as specified in basset.
You probably want to house your prior regarding the phylogenetic relationship between samples in Gamma and the phylogenetic relationship between taxa in Xi and upsilon.
It would look something like this (there are many ways of calculating distance between taxa that may be relevant, I am just following your example):
DS <-distRoot(tree.samples, tips = "all", method = "patristic") #calculate phylogenetic distance between samples
DT <- distRoot(tree.taxa tips = "all", method = "patristic") #calculate phylogenetic distance between taxa
D<-ntaxa(counts_table) #bacterial_taxa as rows in counts_table)
Theta <- matrix(0, D-1, Q) # Q is the number of covariates in your linear model (including intercept)
Gamma <- SE(DS, sigma=1, rho=median(DS)) # a good default for rho typically
# Xi is a bit more complex because we have to project into log-ratio coordinates
upsilon <- D+2 # specifies that prior for Xi is weak - D+2 vs. D+3 makes a minimal difference probably (regarding your first question)
Omega <- SE(DT, sigma=1, rho=median(DT)) # a good default for rho typically
G <- cbind(diag(D-1), -1)
Xi <- (upsilon-D)G%%Omega%*%t(G)
diag(Xi) <- 1
Does that make sense?
Hey @jsilve24
Thank you for suggesting the changes to the priors.
I have a follow-up question- The DT variable is absent in my data. i.e. I don't have phylogenetic distance between bacterial taxa.
From what I understand from the "pibble" vignette, tree.taxa is not used in calculating Omega. Omega is this -" Omega <- diag(ntaxa(dat)) " Would it make sense to use the "pibble" omega instead, as I don't have tree.taxa information?
Using the pibble's prior, I am running into the following error-
X <- t(model.matrix(~diet_type_number2 + diet_type_number+batch_effect, data=cellulose_dat)) nrow(X) is equal to 5 #prior- DS <- cophenetic(pruned.trees_cellulases) #calculate phylogenetic distance between samples
D<-ntaxa(cellulases_stray) #bacterial_taxa as rows in counts_table Theta <- matrix(0, ntaxa(cellulases_stray)-1, nrow(X)) # X is the number of covariates in your linear model (including intercept)
Gamma <- SE(DS, sigma=1, rho=median(DS)) # a good default for rho typically Omega <- diag(ntaxa(cellulases_stray)) #from the pibble vignette # Xi is a bit more complex because we have to project into log-ratio coordinates upsilon <- D+2 # specifies that prior for Xi is weak
G <- cbind(diag(D-1), -1) Xi <- (upsilon-D)G%%Omega%*%t(G) diag(Xi) <- 1 priors <- pibble(NULL, X, upsilon, Theta, Gamma, Xi)
"Error in try_set_dims(c(nrow(X), ncol(Theta), nrow(Gamma), ncol(Gamma), : Dimension missmatch in arguments: [5,5,108,108]"
#Using basset's prior, Theta is not recognised- prior<-basset(NULL, X, upsilon, Theta, Gamma, Xi)
"Error in Theta(X) : could not find function "Theta""