
RCIT/KCIT for causal discovery with mixed data

Open • MaxKerney opened this issue 6 years ago • 7 comments

Hi,

I've been told that KCIT (and therefore, I presume, RCIT/RCoT) can be used with mixed continuous and discrete data. However, when I try this with the package, it doesn't seem to work. Is there something I need to adjust to make the tests work with mixed data?

Also, if using mixed data is possible, how could KCIT or RCIT then be used with an algorithm like FCI for causal discovery? The code showing how this was done for the causal discovery experiments in your paper doesn't seem to be available in this repo. Could KCIT or RCIT be used as a CI test in the FCI function from pcalg, for instance?

Many thanks.

MaxKerney, Sep 27 '19 15:09

> Is there something I need to adjust to make the tests work with mixed data?

You should try to binarize the discrete data: if a discrete variable takes on k values in the dataset, substitute it with k-1 binary variables. The reason for doing this is to simplify the functional relationships.
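A minimal base R sketch of that binarization. `binarize_discrete` is a hypothetical helper, not part of RCIT. It uses `model.matrix()` with an intercept, so a variable with k levels becomes k-1 indicator columns and the first level acts as the reference. It assumes the column has no missing values (`model.matrix()` drops incomplete rows by default) and at least two distinct levels.

```r
# Hypothetical helper (not part of RCIT): replace one discrete column of a
# data frame with k-1 binary indicator columns.
binarize_discrete <- function(df, col) {
  f <- factor(df[[col]])
  # With an intercept, model.matrix() has one column per non-reference level;
  # dropping "(Intercept)" leaves k-1 columns of 0/1 values.
  dummies <- model.matrix(~ f)[, -1, drop = FALSE]
  colnames(dummies) <- paste0(col, "_", levels(f)[-1])
  # Keep every other column unchanged and append the indicators
  cbind(df[, setdiff(names(df), col), drop = FALSE], dummies)
}

# Example: mtcars$cyl takes the values 4, 6 and 8, so it becomes cyl_6 and cyl_8
mixed <- binarize_discrete(mtcars, "cyl")
str(mixed[, c("cyl_6", "cyl_8")])
```

The indicator columns are plain numeric 0/1 values, so the result can be converted with `as.matrix()` before it is handed to RCIT.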

> Also, if using mixed data is possible, how could KCIT or RCIT then be used with an algorithm like FCI for causal discovery? ...Could KCIT or RCIT be used as a CI test in the FCI function from pcalg, for instance?

Yes, but you need to write a wrapper function like the following, where suffStat is a list containing the data:

RCIT_wrap <- function(x_index, y_index, z_index, suffStat) {
    out = RCIT(suffStat$data[, x_index], suffStat$data[, y_index], suffStat$data[, z_index])
    return(out$p)
}
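A sketch of how a wrapper like this plugs into `pcalg::fci()`, which calls its `indepTest` function as `indepTest(x, y, S, suffStat)` with integer column indices and expects a p-value back. It assumes pcalg (CRAN) and RCIT (installed from GitHub) are available, and skips the fci call otherwise; `mtcars` and `alpha = 0.05` are placeholders. The data are stored as a numeric matrix, so `suffStat$data[, z_index]` keeps one row per observation even when the conditioning set has several columns.

```r
# Wrapper in the (x, y, S, suffStat) form that pcalg expects for indepTest
RCIT_wrap <- function(x_index, y_index, z_index, suffStat) {
  out <- RCIT::RCIT(suffStat$data[, x_index],
                    suffStat$data[, y_index],
                    suffStat$data[, z_index])
  out$p  # fci only needs the p-value
}

# Placeholder data: convert to a plain numeric matrix before building suffStat
dat <- as.matrix(mtcars)
suffStat <- list(data = dat)

# Run FCI only if both packages are installed
if (requireNamespace("pcalg", quietly = TRUE) &&
    requireNamespace("RCIT", quietly = TRUE)) {
  res <- pcalg::fci(suffStat, indepTest = RCIT_wrap, alpha = 0.05,
                    labels = colnames(dat))
  print(res)
}
```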

ericstrobl, Sep 28 '19 04:09

Thanks! Unfortunately, I'm running into some errors with that. First, the error "Error in var(if (is.vector(x) || is.factor(x)) x else as.double(x), na.rm = na.rm) : is.atomic(x) is not TRUE" (with my actual data the error was instead "Error: (list) object cannot be coerced to type 'double'") suggested that suffStat$data[,x_index] etc. needed to be unlisted, so I did that:

library(dplyr)   # select_if()
library(RCIT)    # RCIT()
testdata <- select_if(mtcars, is.numeric)
RCIT_wrap <- function(x_index, y_index, z_index, suffStat) {
    # unlist() flattens each column selection into a single vector
    out = RCIT(unlist(suffStat$data[,x_index]), unlist(suffStat$data[,y_index]), unlist(suffStat$data[,z_index]))
    return(out$p)
}
suffStat <- list(data = testdata)
res <- pcalg::fci(suffStat, indepTest = RCIT_wrap,
                  alpha = 0.9999, labels = names(testdata))

But now I'm getting the error "Error in cbind(y, z) : number of rows of matrices must match (see arg 2)", and I'm not sure how to resolve it.
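The row mismatch can be reproduced in base R without RCIT. This is a diagnosis sketch, assuming (as the error message suggests) that RCIT binds y and z column-wise: `unlist()` applied to two data-frame columns concatenates them into one vector of length 2n, whereas subsetting a matrix keeps an n x 2 matrix. `matrix()` stands in here for RCIT turning its vector inputs into one-column matrices.

```r
df <- mtcars               # data frame, as in the code above
m <- as.matrix(mtcars)     # the same data as a numeric matrix
z_index <- c(2, 3)         # a conditioning set with two variables

# Data frame + unlist(): the two columns are concatenated into one long vector
z_df <- unlist(df[, z_index])
length(z_df)               # 64, i.e. 2 * nrow(mtcars)

# Matrix subsetting keeps one row per observation
z_m <- m[, z_index]
dim(z_m)                   # 32 2

# Binding a length-32 y against the flattened z fails with the same message
y <- m[, 1]
bind_result <- tryCatch(cbind(matrix(y), matrix(z_df)),
                        error = function(e) conditionMessage(e))
print(bind_result)
```

This only triggers once fci conditions on two or more variables, which explains why the first marginal tests can succeed before the error appears.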

Also, is there any way of analysing discrete variables without having to binarize them? When I spoke to Kun Zhang about it before, he said something about needing to use the delta kernel, or a Gaussian kernel with a very small kernel width, for mixed data.
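To illustrate the two kernels mentioned above on a discrete variable: the delta kernel is 1 when two values are equal and 0 otherwise, and a Gaussian kernel exp(-(a - b)^2 / (2 sigma^2)) whose width sigma is much smaller than the spacing between levels gives numerically the same matrix. This base R sketch only builds the kernel matrices; `delta_kernel` and `gaussian_kernel` are hypothetical helpers, and it does not show how KCIT or RCIT would consume them.

```r
# Delta kernel: K[i, j] = 1 if a[i] == a[j], else 0
delta_kernel <- function(a) outer(a, a, function(u, v) as.numeric(u == v))

# Gaussian (RBF) kernel with width sigma
gaussian_kernel <- function(a, sigma) exp(-outer(a, a, "-")^2 / (2 * sigma^2))

a <- mtcars$cyl[1:6]       # discrete values: 6 6 4 6 8 6
K_delta <- delta_kernel(a)
K_small <- gaussian_kernel(a, sigma = 0.1)

# Distinct levels differ by at least 2, so off-level entries are at most exp(-200)
max(abs(K_delta - K_small))
```

With a larger width (say sigma = 2) the off-level entries are no longer negligible, and the Gaussian kernel then treats nearby levels as similar rather than as unrelated categories.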

MaxKerney, Sep 28 '19 10:09

Hello @MaxKerney, I am working on the same thing. I am using RCIT as a CI test in the FCI function from pcalg, but I run into the same errors. Do you know how to solve them?

Thanks, Angela

Angela446-lgtm, Oct 12 '21 15:10

Hi @Angela446-lgtm,

I'm afraid not; I ended up using a different causal discovery method instead (https://github.com/Biwei-Huang/Generalized-Score-Functions-for-Causal-Discovery).

Max

MaxKerney, Oct 12 '21 16:10

OK! Thank you for your quick reply to my question.

Angela446-lgtm, Oct 12 '21 16:10

Angela,

Sorry for these errors. People have gotten this error in the past when they pass a data frame into RCIT instead of a matrix, or when one of their variables has zero variance.
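A sketch of those two checks, run before calling fci. `prepare_rcit_data` is a hypothetical helper, not part of RCIT. It converts a data frame with `as.matrix()`, which assumes every column is already numeric (for example after binarizing discrete variables), and stops with the names of any zero-variance columns. It assumes there are no missing values.

```r
# Hypothetical helper (not part of RCIT): return a numeric matrix suitable for
# suffStat$data, or stop with an informative error.
prepare_rcit_data <- function(df) {
  m <- as.matrix(df)
  # A character or mixed-type data frame becomes a non-numeric matrix
  if (!is.numeric(m)) {
    stop("all columns must be numeric; binarize or recode discrete columns first")
  }
  # Columns whose standard deviation is exactly zero carry no information
  sds <- apply(m, 2, sd)
  const <- colnames(m)[which(sds == 0)]
  if (length(const) > 0) {
    stop("zero-variance columns: ", paste(const, collapse = ", "))
  }
  m
}

dat <- prepare_rcit_data(mtcars)
suffStat <- list(data = dat)
```

Stopping rather than silently dropping constant columns keeps the column order, and therefore the `labels` passed to fci, unchanged.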

If neither of those solves it, please send me your data and the code that causes the errors, and I should be able to fix the issue. This will also help others experiencing the same problem, since I can update the code accordingly.

ericstrobl, Oct 12 '21 17:10

Indeed, now that I am passing a matrix instead of a data frame, everything works fine. Thank you.

Angela446-lgtm, Oct 13 '21 08:10