formulaic
DOC: Contrast Specification
Another thing I was considering is whether we can use formulaic for incremental training of some models.
Let's imagine we have a dataset that has the categorical variable "job" with 6 classes: teacher, doctor, driver, judge, professor, and dancer.
With batch training, a given batch may not contain all 6 levels. For instance, the first batch may contain only the teacher and driver categories, so the matrix would have 2 columns. The next batch might contain doctor, professor, dancer, and teacher, so its matrix would have 4 columns.
This is a problem because the matrices would not match. Is it possible to predefine all levels so that formulaic keeps empty columns for categories that are not in the current batch?
Yes! This is doable in formulaic, but again... ahem... documentation.
import pandas
from formulaic import model_matrix
# Option 1: declare the full set of levels on the pandas Categorical itself
model_matrix('cat', pandas.DataFrame({'cat': pandas.Categorical(['a', 'b'], categories=['a', 'b', 'c'])}))
# Option 2: declare the full set of levels in the formula via C(..., levels=...)
model_matrix('C(cat, levels=["a", "b", "c"])', pandas.DataFrame({'cat': ['a', 'b']}))