RCIT
RCIT/KCIT for causal discovery with mixed data
Hi,
I've been told that KCIT (and therefore I presume RCIT/RCoT) can be used with mixed continuous and discrete data. However, playing around with the package this doesn't seem to work. Is there something I need to adjust to make the tests work with mixed data?
Also, if using mixed data is possible, how could KCIT or RCIT then be used with an algorithm like FCI for causal discovery? The code showing how this was done for the causal discovery experiments in your paper doesn't seem to be available in this repo. Could KCIT or RCIT be used as a CI test in the FCI function from pcalg, for instance?
Many thanks.
Is there something I need to adjust to make the tests work with mixed data?
You should try binarizing the discrete data. If a discrete variable takes on k values in the dataset, substitute that variable with k-1 binary variables. The reason for doing this is to simplify the functional relationships.
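As a rough illustration of the binarization step, here is a minimal sketch in base R. The toy data frame and the names toy, g_bin and mixed are made up for this example, and it assumes R's default treatment contrasts, under which model.matrix() produces k-1 indicator columns for a factor with k levels.

```r
# Toy data (hypothetical): one continuous column and one discrete column
# with k = 3 levels.
toy <- data.frame(
  x = c(0.5, 1.2, -0.3, 2.1, 0.0, 1.7),
  g = factor(c("a", "b", "c", "a", "b", "c"))
)

# With the default treatment contrasts, model.matrix() builds an intercept
# plus k - 1 indicator columns for the factor; dropping the first column
# (the intercept) leaves only the k - 1 binary variables.
g_bin <- model.matrix(~ g, data = toy)[, -1, drop = FALSE]

# Combine the continuous column and the binary columns into one numeric matrix.
mixed <- cbind(x = toy$x, g_bin)
```

The resulting mixed object is a plain numeric matrix, so its columns can be passed to the test in the same way as continuous variables.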
Also, if using mixed data is possible, how could KCIT or RCIT then be used with an algorithm like FCI for causal discovery? ...Could KCIT or RCIT be used as a CI test in the FCI function from pcalg, for instance?
Yes, but you need to write a wrapper function like the following, where suffStat is a list containing the data:
RCIT_wrap <- function(x_index, y_index, z_index, suffStat) {
  out <- RCIT(suffStat$data[, x_index], suffStat$data[, y_index], suffStat$data[, z_index])
  return(out$p)
}
Thanks! Unfortunately, I'm running into some errors with that. Firstly, this error:
Error in var(if (is.vector(x) || is.factor(x)) x else as.double(x), na.rm = na.rm) : is.atomic(x) is not TRUE
(which, with my actual data, appeared as: Error: (list) object cannot be coerced to type 'double') suggested that "suffStat$data[,x_index]" etc. needed to be "unlisted", so I did that:
library(dplyr)  # provides select_if()
library(RCIT)
testdata <- select_if(mtcars, is.numeric)
RCIT_wrap <- function(x_index, y_index, z_index, suffStat) {
out = RCIT(unlist(suffStat$data[,x_index]), unlist(suffStat$data[,y_index]), unlist(suffStat$data[,z_index]))
return(out$p)
}
suffStat <- list(data = testdata)
res <- pcalg::fci(suffStat, indepTest = RCIT_wrap,
alpha = 0.9999, labels = names(testdata))
But now I'm getting the error:
Error in cbind(y, z) : number of rows of matrices must match (see arg 2)
and I'm not sure how to resolve that.
Also, is there any way of analysing discrete variables without having to binarize them? When I spoke to Kun Zhang about it before he said something about needing to use the delta kernel or a Gaussian kernel with a very small kernel width for mixed data.
Hello @MaxKerney, I am working on the same thing. I am using RCIT as a CI test in the FCI function from pcalg; however, I run into the same errors. Do you know how to solve them?
Thanks, Angela
Hi @Angela446-lgtm,
I'm afraid not, I ended up using a different causal discovery method instead (https://github.com/Biwei-Huang/Generalized-Score-Functions-for-Causal-Discovery)
Max
OK! Thank you for your quick reply to my question.
Angela,
Sorry for these errors. People have gotten this error in the past when they passed a data frame into RCIT instead of a matrix, or when one of their variables had zero variance.
If neither of those fixes it, please send me your data and the code that causes the errors, and I should be able to solve the issue. That will help others experiencing the same problem, since I can update the code accordingly.
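A minimal sketch of those two checks, assuming the data start out as a data frame (mtcars is used here, as earlier in this thread). as_ci_matrix is a hypothetical helper written for this example and is not part of RCIT or pcalg. The wrapper is the one posted above and assumes the RCIT package is installed. The fci() call is left commented out, and alpha = 0.05 there is only a placeholder.

```r
# Hypothetical helper (not part of RCIT or pcalg): convert a data frame to a
# numeric matrix and stop early if any column has zero variance.
as_ci_matrix <- function(df) {
  m <- as.matrix(df)
  if (!is.numeric(m)) {
    stop("non-numeric columns found; binarize or drop them first")
  }
  constant <- which(apply(m, 2, var) == 0)
  if (length(constant) > 0) {
    stop("zero-variance column(s): ", paste(constant, collapse = ", "))
  }
  m
}

# Same wrapper as earlier in the thread; suffStat$data is now a matrix,
# so plain column indexing returns numeric vectors or matrices.
RCIT_wrap <- function(x_index, y_index, z_index, suffStat) {
  out <- RCIT::RCIT(suffStat$data[, x_index],
                    suffStat$data[, y_index],
                    suffStat$data[, z_index])
  out$p
}

suffStat <- list(data = as_ci_matrix(mtcars))

# Requires the pcalg and RCIT packages; alpha here is a placeholder value.
# res <- pcalg::fci(suffStat, indepTest = RCIT_wrap,
#                   alpha = 0.05, labels = colnames(suffStat$data))
```

Doing the conversion once, before building suffStat, avoids the unlist() calls inside the wrapper, which flatten a multi-column conditioning set into one long vector.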
Indeed, now I am passing a matrix and not a data frame, and everything works fine. Thank you.