normalize data for dcm
This is a start on https://github.com/UDST/urbansim/issues/208 still to do:
- [x] make it optional
- [x] add test for both behaviors
- [x] add l1 l2 regularization
- [x] test regularization
But I thout I'd open this to get some feedback.
I don't know why CI is failing. Help?
AppVeyor is Failed building wheel for pandana I don't think I can fix that.
I am working on the travis things.
Coverage decreased (-0.4%) to 93.959% when pulling d10e19417453f3a8f0fc84a7ab19eb94e1a6cd2a on SEMCOG:normalize-data into 24970393b117cbf2464eeb336f425cbfb641fa1b on UDST:master.
Coverage increased (+0.06%) to 94.409% when pulling d832f16d3b9fb1edc0da86210b0601ccaf5b1ac8 on SEMCOG:normalize-data into 79f815a6503e109f50be270cee92d0f4a34f49ef on UDST:master.
Thanks for this @Eh2406! The passing Travis builds are the relevant benchmark here. The appveyor Windows build are failing due to pandana on Windows, which is unrelated to the present PR. We'll fix that, or we may remove pandana as a hard CI dependency of core urbansim (as little in this repo specifically needs pandana).
I don't have a lot of experience with pytest. What test would you like to see, and how do you recommend I implement them?
Ping. To be clear I am stuck waiting for advice on what tests you would like to see.
Some thoughts (curious what others think too):
test_dcm.py
- Create a "normalized_dcm" fixture that takes the existing basic_dcm fixture as a starting point and set values for normalized/l1/l2
- Using the test_mnl_dcm function as a template, implement a high-level smoke-test just to see that the model with normalization can be fitted, that probabilities can be calculated from the resulting model, and that 'Normalization Mean' / 'Normalization Std' are in the resulting fit parameters.
test_mnl.py
- Can any of the normalization/regularization changes to mnl.py be tested here? Perhaps a test for expected coefficients that are different when non-zero l1/l2 values supplied? And a smoke-test to ensure that mnl.mnl_simulate() runs with normalization_mean / normalization_std?
I parameterized basic_dcm to run with normalization and 3 different regularizations. Then looked through all the newly failing tests. This sussed out 2 places where I did not pass the new arguments in the correct order. Oops. Meny of the test can be made to pass with an if on basic_dcm.normalize. For the 3 that could not be done so easily, I added a simple_dcm that matched the old basic_dcm and switched them to use that.
Thoughts?
merged and ci is green. Thoughts? What else would you like to see before accepting?