dpcreator
Add stability histogram
If the categories are unknown or blank, or if the user chooses to, use the stability histogram.
This needs some UI updates.
- e.g. We currently require categories.
- Do we want the switch to the stability histogram to happen opaquely, i.e. without asking the user?
- Should there be a question or info note at some point (e.g. in the Create Statistic pop-up) that offers a choice between:
  - (1) Using the categories from the "Confirm Variables" step
  - (2) Selecting a statement such as "I don't know enough about the categories", which then opts in to the stability histogram
from @Shoeboxam:
The stability histogram is great in situations where the category set is intractably large. For example, with URL visitation frequencies, you could argue that the set of categories is the set of all unique URLs! In the absence of limits on URLs, the cardinality of the category set is infinite! You don't have to specify this infinite set up front to use the stability histogram, which is what makes it possible to compute. Realistically, you could still specify only a finite number of the more likely URLs, but tracking those down is too much effort.
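To make the idea above concrete, here is a minimal sketch of a stability histogram in plain Python. It is not the dpcreator or OpenDP implementation. It assumes one common textbook formulation: each individual contributes exactly one row, neighboring datasets differ by adding or removing one row, Laplace noise with scale `1/epsilon` is added only to categories that actually appear in the data, and a noisy count is released only if it exceeds `1 + ln(1/(2*delta))/epsilon`. The names `stability_histogram` and `laplace_noise` are hypothetical, and the sketch ignores the floating-point issues a production mechanism has to handle.

```python
import math
import random
from collections import Counter


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def stability_histogram(values, epsilon, delta, rng=None):
    """Illustrative sketch only; see the assumptions in the text above."""
    rng = rng or random.Random()
    scale = 1.0 / epsilon  # assumes each count has sensitivity 1
    # Assumed threshold: a category held by a single row survives
    # with probability at most delta.
    threshold = 1.0 + scale * math.log(1.0 / (2.0 * delta))

    released = {}
    # Only categories observed in the data are visited, so the
    # (possibly infinite) category set is never enumerated.
    for category, count in Counter(values).items():
        noisy = count + laplace_noise(scale, rng)
        if noisy > threshold:
            released[category] = noisy
    return released


# Usage: no list of URLs is supplied up front.
visits = ["https://a.example"] * 1000 + ["https://rare.example"]
print(stability_histogram(visits, epsilon=1.0, delta=1e-6))
```

The trade-off is that rare categories are usually suppressed and the guarantee becomes approximate, (epsilon, delta), rather than pure epsilon, which is one reason the UI may want to make the choice visible to the user.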