
Provenance for adult.csv?

Open • DENVERCODER999 opened this issue 3 years ago • 1 comment

Hi, just curious: what steps were taken to discretize the adult dataset into what is here as data/adult.csv? I notice that fnlwgt in particular is discretized pretty heavily, for example.

DENVERCODER999 • Sep 09 '22 16:09

Good question. Here are the files I used to discretize adult:

discretize.py: https://pastebin.com/yGLKUaey
config.yml: https://pastebin.com/pim42cD9

You may have to make some slight modifications to the paths to get it to run, but hopefully that shouldn't be too bad.

You're right that we were aggressive with discretization. It would be nice to develop techniques that work better with continuous data.

ryan112358 • Dec 02 '22 02:12
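
For readers who want a feel for what "aggressive" discretization does to a column like fnlwgt, here is a minimal sketch of equal-width binning. It is not the linked discretize.py and makes no claim about how data/adult.csv was actually produced; the function names, the sample values, and the choice of 4 bins are all assumptions made for illustration.

```python
from bisect import bisect_right


def equal_width_edges(lo, hi, n_bins):
    """Return the n_bins - 1 interior cut points that split [lo, hi] into equal-width bins."""
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(1, n_bins)]


def discretize(values, edges):
    """Map each value to a bin index in 0 .. len(edges)."""
    return [bisect_right(edges, v) for v in values]


# Hypothetical fnlwgt-like values (assumption: not read from data/adult.csv).
fnlwgt = [12285, 77516, 83311, 215646, 234721, 1484705]

# Assumption: 4 bins, chosen only to keep the example small.
edges = equal_width_edges(min(fnlwgt), max(fnlwgt), n_bins=4)
codes = discretize(fnlwgt, edges)

print(edges)  # [380390.0, 748495.0, 1116600.0]
print(codes)  # [0, 0, 0, 0, 0, 3]
```

With a skewed column, a few coarse equal-width bins put almost every record in the same bin, which is the kind of information loss the question is pointing at. Quantile-based edges are one common alternative, though whether the original script used them is not stated in this thread.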