stray icon indicating copy to clipboard operation
stray copied to clipboard

priors for a data with bacterial hierarchy and phylogenetic distance between samples

Open Jigyasa3 opened this issue 6 years ago • 3 comments

Dear @jsilve24

My dataset contains a combination of bacterial hierarchical information and phylogenetic relationship between samples.

I went through both "pibble" and "basset" vignettes, I was wondering if it's possible to join the two priors? From what I understand, the relationship between samples is modeled in Theta and Gamma variables in "basset", whereas in "pibble" the independence between bacterial taxa is modeled by Gamma and Omega.

My naive take on the "new priors"- X<-distRoot(tree, tips = "all", method = "patristic") #calculate phylogenetic distance between samples D<-ntaxa(counts_table) #bacterial_taxa as rows in counts_table Gamma <- function(X) SE(X, sigma=5, rho=10) # Create the partial function as shown in "basset" vignette Theta <- function(X) matrix(0, D-1, ncol(X)) # Create the partial function as shown in "basset" vignette upsilon <- D-1+3 omega<-diag(D) #for indepedence of taxa G <- cbind(diag(ntaxa(dat)-1), -1) Xi <- (upsilon-D)G%%Omega%*%t(G) diag(Xi) <- 1 #find the posterior- prior<-stray::basset(NULL, X, upsilon, Theta, Gamma, Xi)

Questions- a) why is upsilon (ntaxa-1+3) in "basette" vignette, where as it is (ntaxa+3) in "pibble" vignette? Is there a difference? b) Would combining two priors make sense? How can my naive approach be improved?

Jigyasa3 avatar Nov 01 '19 03:11 Jigyasa3

Sorry for the delay.

Both basset and pibble have the same types of priors; its just that in basset the priors are functions rather than fixed matrices. You want to stick with pibble for this as you don't really need the non-linear modeling as specified in basset.

You probably want to house your prior regarding the phylogenetic relationship between samples in Gamma and the phylogenetic relationship between taxa in Xi and upsilon.

It would look something like this (there are many ways of calculating distance between taxa that may be relevant, I am just following your example):

DS <-distRoot(tree.samples, tips = "all", method = "patristic") #calculate phylogenetic distance between samples
DT <- distRoot(tree.taxa tips = "all", method = "patristic") #calculate phylogenetic distance between taxa

D<-ntaxa(counts_table) #bacterial_taxa as rows in counts_table)
Theta <- matrix(0, D-1, Q) # Q is the number of covariates in your linear model (including intercept)

Gamma <- SE(DS, sigma=1, rho=median(DS)) # a good default for rho typically

# Xi is a bit more complex because we have to project into log-ratio coordinates
upsilon <- D+2 # specifies that prior for Xi is weak - D+2 vs. D+3 makes a minimal difference probably (regarding your first question)
Omega <- SE(DT, sigma=1, rho=median(DT)) # a good default for rho typically
G <- cbind(diag(D-1), -1)
Xi <- (upsilon-D)G%%Omega%*%t(G)
diag(Xi) <- 1

Does that make sense?

jsilve24 avatar Nov 04 '19 18:11 jsilve24

Hey @jsilve24

Thank you for suggesting the changes to the priors.

I have a follow-up question- The DT variable is absent in my data. i.e. I don't have phylogenetic distance between bacterial taxa.

From what I understand from the "pibble" vignette, tree.taxa is not used in calculating Omega. Omega is this -" Omega <- diag(ntaxa(dat)) " Would it make sense to use the "pibble" omega instead, as I don't have tree.taxa information?

Jigyasa3 avatar Nov 12 '19 02:11 Jigyasa3

Using the pibble's prior, I am running into the following error-

X <- t(model.matrix(~diet_type_number2 + diet_type_number+batch_effect, data=cellulose_dat)) nrow(X) is equal to 5 #prior- DS <- cophenetic(pruned.trees_cellulases) #calculate phylogenetic distance between samples

D<-ntaxa(cellulases_stray) #bacterial_taxa as rows in counts_table Theta <- matrix(0, ntaxa(cellulases_stray)-1, nrow(X)) # X is the number of covariates in your linear model (including intercept)

Gamma <- SE(DS, sigma=1, rho=median(DS)) # a good default for rho typically Omega <- diag(ntaxa(cellulases_stray)) #from the pibble vignette # Xi is a bit more complex because we have to project into log-ratio coordinates upsilon <- D+2 # specifies that prior for Xi is weak

G <- cbind(diag(D-1), -1) Xi <- (upsilon-D)G%%Omega%*%t(G) diag(Xi) <- 1 priors <- pibble(NULL, X, upsilon, Theta, Gamma, Xi)

"Error in try_set_dims(c(nrow(X), ncol(Theta), nrow(Gamma), ncol(Gamma), : Dimension missmatch in arguments: [5,5,108,108]"

#Using basset's prior, Theta is not recognised- prior<-basset(NULL, X, upsilon, Theta, Gamma, Xi)

"Error in Theta(X) : could not find function "Theta""

Jigyasa3 avatar Nov 12 '19 03:11 Jigyasa3