Dan Snow issues

Results: 24 issues by Dan Snow

Spike model rebuild in Python/polars

Just for fun: if there's any free time during the summer we should do a quick (day or two) spike of a model pipeline rewrite using a Python stack. I...

Improve modeling multi-cards

Multi-card sales are excluded from the sales used to train the model for multiple reasons. The model predicts values per card, not per property; as such, multi-card sales are excluded...

Update `ingest` stage to use `noctua` `unload = TRUE` option

Now that https://github.com/DyfanJones/noctua/pull/215 is merged, we should update this repo to use the new option, preferably by setting it globally via `noctua_options(unload = TRUE)`. This will require a bit of...

pipeline

Revisit using a stacked model with the `stacks` package

Previously, the CCAO attempted to create a stacked/ensemble model using tidymodels functions. However, tidymodels' support for this method was at the time quite new, and it didn't work very well....

method

Test xgboost modeling engine

The Data Department recently performed some model benchmarking ([ccao-data/report-model-benchmark](https://github.com/ccao-data/report-model-benchmark)) comparing the run times of XGBoost and LightGBM. We found that the current iteration of XGBoost runs much faster than LightGBM...

method

Test propensity weights as case weights for ultra high value properties

method

Use `workflowsets` + racing to test lots and lots of models and recipes quickly

method

Improve townhome fuzzy grouping, conditionally add 211s to townhome groups

In 2022, we improved the townhome valuation methodology by implementing "fuzzy grouping". Basically, townhome units with similar, but not perfectly identical, features should receive similar values. Valuations pointed out that...

method

Feature idea - Linking hyperparameters during CV

Problem: Within LightGBM, [`num_leaves`](https://lightgbm.readthedocs.io/en/latest/Parameters.html#num_leaves) is capped at 2 ^ [`max_depth`](https://lightgbm.readthedocs.io/en/latest/Parameters.html#max_depth). For example, if `num_leaves` is set to 1000 and `max_depth` is set to 5, then LightGBM will likely end...

feature
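
The `num_leaves` cap described in the hyperparameter-linking issue above can be illustrated with a little arithmetic. This is a minimal sketch, not code from the issue or from LightGBM: the helper names `effective_num_leaves` and `linked_num_leaves` and the `leaf_fraction` parameter are hypothetical and exist only for this example. It assumes LightGBM's documented behavior that `max_depth <= 0` means no depth limit.

```python
def effective_num_leaves(num_leaves: int, max_depth: int) -> int:
    """Largest leaf count a tree can reach given a depth limit.

    Hypothetical helper: a binary tree of depth d has at most 2 ** d leaves,
    so any num_leaves above that is unreachable.
    """
    if max_depth <= 0:
        # LightGBM treats max_depth <= 0 as "no limit" on depth.
        return num_leaves
    return min(num_leaves, 2 ** max_depth)


def linked_num_leaves(max_depth: int, leaf_fraction: float) -> int:
    """Derive num_leaves from max_depth instead of tuning it independently.

    Hypothetical linking rule: tune leaf_fraction in (0, 1] and scale it by
    the depth-implied cap, so every sampled pair is actually reachable.
    """
    if max_depth <= 0:
        raise ValueError("linking requires a positive max_depth")
    if not 0 < leaf_fraction <= 1:
        raise ValueError("leaf_fraction must be in (0, 1]")
    # LightGBM requires num_leaves > 1, so floor the result at 2.
    return max(2, int(leaf_fraction * 2 ** max_depth))


# The example from the issue: 1000 requested leaves, depth 5.
print(effective_num_leaves(1000, 5))  # 32
```

One possible design, under these assumptions, is to tune `max_depth` and `leaf_fraction` during cross-validation and compute `num_leaves` from them, so the search does not spend iterations on combinations that behave identically.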

Feature idea - provide custom validation sets for early stopping

Thanks for creating this excellent package. I created a [similar fork of treesnip](https://gitlab.com/ccao-data-science---modeling/packages/lightsnip) but am planning to replace it with `{bonsai}` in all our production models. One feature that I...

feature