urbansim icon indicating copy to clipboard operation
urbansim copied to clipboard

normalize data for dcm

Open Eh2406 opened this issue 7 years ago • 10 comments

This is a start on https://github.com/UDST/urbansim/issues/208 still to do:

  • [x] make it optional
  • [x] add test for both behaviors
  • [x] add l1 l2 regularization
  • [x] test regularization

But I thout I'd open this to get some feedback.

Eh2406 avatar Mar 30 '18 15:03 Eh2406

I don't know why CI is failing. Help?

Eh2406 avatar Apr 05 '18 16:04 Eh2406

AppVeyor is Failed building wheel for pandana I don't think I can fix that. I am working on the travis things.

Eh2406 avatar Apr 09 '18 15:04 Eh2406

Coverage Status

Coverage decreased (-0.4%) to 93.959% when pulling d10e19417453f3a8f0fc84a7ab19eb94e1a6cd2a on SEMCOG:normalize-data into 24970393b117cbf2464eeb336f425cbfb641fa1b on UDST:master.

coveralls avatar Apr 09 '18 15:04 coveralls

Coverage Status

Coverage increased (+0.06%) to 94.409% when pulling d832f16d3b9fb1edc0da86210b0601ccaf5b1ac8 on SEMCOG:normalize-data into 79f815a6503e109f50be270cee92d0f4a34f49ef on UDST:master.

coveralls avatar Apr 09 '18 15:04 coveralls

Thanks for this @Eh2406! The passing Travis builds are the relevant benchmark here. The appveyor Windows build are failing due to pandana on Windows, which is unrelated to the present PR. We'll fix that, or we may remove pandana as a hard CI dependency of core urbansim (as little in this repo specifically needs pandana).

janowicz avatar Apr 18 '18 17:04 janowicz

I don't have a lot of experience with pytest. What test would you like to see, and how do you recommend I implement them?

Eh2406 avatar Apr 20 '18 19:04 Eh2406

Ping. To be clear I am stuck waiting for advice on what tests you would like to see.

Eh2406 avatar May 01 '18 14:05 Eh2406

Some thoughts (curious what others think too): test_dcm.py

  • Create a "normalized_dcm" fixture that takes the existing basic_dcm fixture as a starting point and set values for normalized/l1/l2
  • Using the test_mnl_dcm function as a template, implement a high-level smoke-test just to see that the model with normalization can be fitted, that probabilities can be calculated from the resulting model, and that 'Normalization Mean' / 'Normalization Std' are in the resulting fit parameters.

test_mnl.py

  • Can any of the normalization/regularization changes to mnl.py be tested here? Perhaps a test for expected coefficients that are different when non-zero l1/l2 values supplied? And a smoke-test to ensure that mnl.mnl_simulate() runs with normalization_mean / normalization_std?

janowicz avatar May 02 '18 03:05 janowicz

I parameterized basic_dcm to run with normalization and 3 different regularizations. Then looked through all the newly failing tests. This sussed out 2 places where I did not pass the new arguments in the correct order. Oops. Meny of the test can be made to pass with an if on basic_dcm.normalize. For the 3 that could not be done so easily, I added a simple_dcm that matched the old basic_dcm and switched them to use that.

Thoughts?

Eh2406 avatar May 03 '18 15:05 Eh2406

merged and ci is green. Thoughts? What else would you like to see before accepting?

Eh2406 avatar May 24 '18 16:05 Eh2406