dodiscover
PC algo only working with int data inputs
Right now, I believe the PC algorithm requires discrete variables to be encoded as integers rather than strings. I tried running PC on this data:
| A | S | T | L | B | E | X | D |
|---|---|---|---|---|---|---|---|
| no | yes | no | no | yes | no | no | yes |
| no | yes | no | no | no | no | no | no |
| no | no | yes | no | no | yes | yes | yes |
| no | no | no | no | yes | no | no | yes |
| no | no | no | no | no | no | no | yes |
But it threw an error. To get it to work, I had to convert the values to ints:

    def convert_to_int(df):
        # Map "yes" to 1 and anything else to 0, working on a copy of the input.
        df = df.copy()
        for var in df.columns:
            df[var] = [1 if x == "yes" else 0 for x in df[var]]
        return df

    data_mod = convert_to_int(data)
    pc.fit(data_mod, context)
I'm calling this a bug: pc.fit(data, context) should work on string-valued data.
Could the user just call an encoder preprocessing function from scikit-learn? Or should we add that step for them? Either way, good catch. We should document this requirement for any categorical/discrete tests.
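For reference, here is a minimal sketch of the user-side option, assuming scikit-learn's `OrdinalEncoder` is the kind of encoder meant. It rebuilds the five rows from the table above and maps each column's string categories to integer codes. Whether dodiscover should do this internally is still open; the variable names here are illustrative only.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# The five rows from the table in the report.
columns = ["A", "S", "T", "L", "B", "E", "X", "D"]
rows = [
    ["no", "yes", "no", "no", "yes", "no", "no", "yes"],
    ["no", "yes", "no", "no", "no", "no", "no", "no"],
    ["no", "no", "yes", "no", "no", "yes", "yes", "yes"],
    ["no", "no", "no", "no", "yes", "no", "no", "yes"],
    ["no", "no", "no", "no", "no", "no", "no", "yes"],
]
data = pd.DataFrame(rows, columns=columns)

# OrdinalEncoder sorts each column's categories, so "no" -> 0 and "yes" -> 1.
# dtype=int requests integer codes instead of the default floats.
encoder = OrdinalEncoder(dtype=int)
encoded = pd.DataFrame(
    encoder.fit_transform(data),
    columns=data.columns,
    index=data.index,
)

# `encoded` could then replace `data_mod` in the pc.fit(data_mod, context)
# call from the report (not executed here).
print(encoded)
```

One caveat with this approach: a column that only ever contains a single value (such as A or L in this sample) is encoded as all zeros, and `encoder.categories_` is needed to map the codes back to the original labels.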