
PC algo only working with int data inputs

Open robertness opened this issue 2 years ago • 1 comments

Right now, I believe the PC algorithm requires discrete variables to be encoded as integers rather than strings. I tried running PC on this data:

A    S    T    L    B    E    X    D
no   yes  no   no   yes  no   no   yes
no   yes  no   no   no   no   no   no
no   no   yes  no   no   yes  yes  yes
no   no   no   no   yes  no   no   yes
no   no   no   no   no   no   no   yes

But it threw an error. To get it to work I had to convert the values to ints.

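def convert_to_int(df):
    # Map "yes" to 1 and every other value to 0, column by column.
    df = df.copy()  # leave the caller's DataFrame untouched
    for var in df.columns:
        df[var] = [1 if x == "yes" else 0 for x in df[var]]
    return df

data_mod = convert_to_int(data)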

pc.fit(data_mod, context)

Calling this a bug. pc.fit(data, context) should work.

robertness avatar Dec 09 '22 20:12 robertness

Could the user just call an Encoder preprocessing function from scikit-learn? Or should we add that step for them? Either way good catch, we should document this accordingly for any categorical/discrete tests.
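For reference, a minimal sketch of what the encoder route could look like, assuming scikit-learn's OrdinalEncoder is the preprocessing step meant here. The DataFrame is rebuilt from the rows in the issue, and the explicit ["no", "yes"] category order is an assumption so that every column gets the same 0/1 coding (column A only ever contains "no").

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Rows copied from the issue description.
columns = ["A", "S", "T", "L", "B", "E", "X", "D"]
rows = [
    ["no", "yes", "no", "no", "yes", "no", "no", "yes"],
    ["no", "yes", "no", "no", "no", "no", "no", "no"],
    ["no", "no", "yes", "no", "no", "yes", "yes", "yes"],
    ["no", "no", "no", "no", "yes", "no", "no", "yes"],
    ["no", "no", "no", "no", "no", "no", "no", "yes"],
]
data = pd.DataFrame(rows, columns=columns)

# Assumption: fix the category order so "no" -> 0 and "yes" -> 1 in every
# column, including columns where only one of the two values appears.
encoder = OrdinalEncoder(categories=[["no", "yes"]] * len(columns))

# fit_transform returns a float array; cast to int and restore the labels.
data_mod = pd.DataFrame(
    encoder.fit_transform(data).astype(int),
    columns=data.columns,
    index=data.index,
)

# data_mod can then be passed on as in the issue: pc.fit(data_mod, context)
```

Unlike a hand-written yes/no mapping, the fitted encoder keeps the mapping in encoder.categories_, so the integer codes can be translated back to the original labels with encoder.inverse_transform.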

adam2392 avatar Dec 09 '22 22:12 adam2392