formulaic
Output column names for sparse output
The dataframe output of a model matrix has column names `[Intercept, b[T.a], ...]`. When one specifies `output='sparse'`, however, the column identifiers are not available (the output is a regular scipy sparse matrix).
These would be needed when, e.g., identifying coefficient names.
Hi! The column names are available on the (wrapped) sparse matrix via `<output>.model_spec.feature_names`. This isn't thoroughly documented, mainly because I need to review the API, which I will do soon. I'll leave this here as a reminder to add documentation about it!
@matthewwardrop Hi, I would be keen to contribute to formulaic. Would this be a good first issue or is there something else you can suggest?
Hi @adamkells ! Thanks for your interest and willingness to contribute!
This particular issue has been resolved, though not yet documented, so it is perhaps not the best issue to contribute to.
Hmmm... Perhaps your best bet is to contribute a new transformation? Such work is orthogonal to framework improvements. Perhaps the missing `cr`, `ce`, or `te` basis transforms? I've been meaning to get around to implementing them, but haven't had much of a chance yet.
Hi @matthewwardrop, sounds good! I'll take a closer look at the code and open a PR later to discuss this further.