seismometer
DRAFT: add other_value and top_k transform for cohorts
Overview
This PR implements a cohort transformation that creates an "Other" placeholder value for cohort columns with many distinct values, where we expect a long tail of small counts. Those small-count values can either be grouped together as "Other" or ignored by mapping them to np.nan or None.
Description of changes
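Adds a cohort transform that keeps the top_k most frequent values of a cohort column and relabels the remaining small-count values as the other_value group. For example, this configuration defines two cohorts from the same column: one with every distinct value, and one limited to the five most frequent values plus "Other":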
cohorts:
  - source: many_values_column
    display_name: All Different Values
  - source: many_values_column
    display_name: Top 5 or Other
    top_k: 5
    other_value: "Other"
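A minimal sketch of the intended top_k / other_value behavior, written in plain Python. The helper name apply_top_k is hypothetical and is not the seismometer implementation. It assumes that every value outside the k most frequent ones is replaced by other_value, and that frequency ties are broken by first appearance (the documented ordering of Counter.most_common).

```python
from collections import Counter
from typing import Hashable, Iterable, List, Optional


def apply_top_k(
    values: Iterable[Hashable],
    top_k: int,
    other_value: Optional[Hashable] = "Other",
) -> List[Optional[Hashable]]:
    """Keep the top_k most frequent values; relabel all others as other_value.

    Hypothetical helper for illustration only (not the seismometer API).
    Ties in frequency are broken by first appearance, per Counter.most_common.
    """
    values = list(values)
    counts = Counter(values)
    # Set of the k most common distinct values in the column.
    keep = {value for value, _ in counts.most_common(top_k)}
    # Long-tail values collapse into the single placeholder group.
    return [v if v in keep else other_value for v in values]


column = ["a", "a", "a", "b", "b", "c", "d", "e"]
print(apply_top_k(column, top_k=2))
print(apply_top_k(column, top_k=2, other_value=None))
```

Passing other_value=None corresponds to the "ignore" option from the overview; in a pandas-backed column, np.nan would play the same role.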
Author Checklist
- [ ] Linting passes; run early with pre-commit hook.
- [ ] Tests added for new code and issue being fixed.
- [ ] Added type annotations and full numpy-style docstrings for new methods.
- [ ] Draft your news fragment in new changelog/ISSUE.TYPE.rst files; see changelog/README.md.