private-pgm
Provenance for adult.csv?
Hi,
Just curious. What are the steps taken to discretize the adult dataset into what is here as data/adult.csv? I notice that fnlwgt in particular is discretized pretty heavily, for example.
Good question. Here are the script and config I used to discretize adult:
discretize.py: https://pastebin.com/yGLKUaey
config.yml: https://pastebin.com/pim42cD9
You may have to make some slight modifications to the paths to get it to run, but hopefully that shouldn't be too bad.
You're right that we were aggressive with discretization. It would be nice to develop techniques that work better with continuous data.
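For anyone reading along, here is a rough sketch of what equal-width discretization of a numeric column like fnlwgt could look like. To be clear, this is not the linked discretize.py: the function name `discretize`, the bin count, and the sample values are all made up for illustration, and the real script may use a different binning rule.

```python
from bisect import bisect_right


def discretize(values, num_bins):
    """Map each numeric value to an integer bin index in [0, num_bins).

    Hypothetical helper: equal-width bins between the column min and max.
    """
    lo, hi = min(values), max(values)
    if lo == hi:
        # A constant column collapses into a single bin.
        return [0 for _ in values]
    width = (hi - lo) / num_bins
    # Interior bin edges; a value equal to an edge falls into the upper bin.
    edges = [lo + width * i for i in range(1, num_bins)]
    return [bisect_right(edges, v) for v in values]


# Made-up fnlwgt-like values (assumption, not rows from data/adult.csv).
fnlwgt = [12285, 77516, 83311, 215646, 234721, 338409, 1484705]
bins = discretize(fnlwgt, num_bins=4)
print(bins)  # [0, 0, 0, 0, 0, 0, 3]
```

With a skewed column like this, equal-width bins put nearly every value in the first bin, which is one way coarse discretization loses information. Quantile-based edges are a common alternative when that matters.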