
DOC: Contrast Specification

Open petrhrobar opened this issue 2 years ago • 1 comments

Another thing I was wondering is whether we can use formulaic for incremental training of some models.

Let's imagine we have a dataset that has the categorical variable "job" with 6 classes: teacher, doctor, driver, judge, professor, and dancer.

If we train in batches, a single batch may not contain all 6 levels. For instance, the first batch may contain only the teacher and driver categories, so the model matrix would have 2 columns. The next batch, however, might contain doctor, professor, dancer, and teacher, so the matrix would have 4 columns.

This is where the problem arises, because the matrices would not match. Is it possible to predefine all levels so that formulaic keeps empty columns for categories that are absent from the current batch?

petrhrobar, May 05 '22 19:05

Yes! This is doable in formulaic, but again... ahem... it still needs documenting. Either option below fixes the full set of levels up front:

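import pandas
from formulaic import model_matrix

# Option 1: declare every level on the column itself via a pandas Categorical.
# "c" never occurs in the data but is still listed in the categories.
model_matrix('cat', pandas.DataFrame({'cat': pandas.Categorical(['a', 'b'], categories=['a', 'b', 'c'])}))

# Option 2: keep plain strings in the data and list the levels in the formula.
model_matrix('C(cat, levels=["a", "b", "c"])', pandas.DataFrame({'cat': ['a', 'b']}))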

matthewwardrop, May 07 '22 06:05