fitter
[Feature Request]: Compact set of most diverse distributions to try on a limited budget
Thanks for this cool lib! I'm currently facing the problem of finding the best compact set of distributions to try on data of unknown nature, given a limited time/CPU budget. Many of the distributions appear to be special cases of one another and behave almost identically. When the compute budget is limited, it probably makes no sense to check distributions that can easily produce similar shapes; it would be more reasonable to try the most diverse ones first (on average). Then, of two distributions with similar average diversity, it's better to start with the one that has the lower average runtime. get_common_distributions() does not seem to account for diversity or average runtime. Would you be interested in research or a PR resulting in a new function like get_efficient_distributions(n: int = 3) that returns the n most diverse and fast-to-compute distributions, on average?
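To make the request concrete, here is a minimal sketch of the selection rule described above: pick the most diverse candidate first, and when several candidates are about equally diverse, prefer the one with the lower runtime. Everything in it is an assumption for illustration: get_efficient_distributions does not exist in fitter, the pairwise "distance" and runtime numbers are made-up toy values on placeholder names, and the greedy strategy and the `tol` parameter are just one plausible reading of the idea.

```python
from typing import Dict, List, Tuple


def get_efficient_distributions(
    distance: Dict[Tuple[str, str], float],
    runtime: Dict[str, float],
    n: int = 3,
    tol: float = 0.02,
) -> List[str]:
    """Hypothetical helper: greedily pick n diverse, cheap-to-fit distributions.

    distance maps an unordered pair of names to a dissimilarity score,
    runtime maps a name to its average fit time. Both are assumed inputs.
    """
    names = sorted(runtime)

    def d(a: str, b: str) -> float:
        # Pairs are stored once, in either order.
        return distance[(a, b)] if (a, b) in distance else distance[(b, a)]

    chosen: List[str] = []
    while len(chosen) < min(n, len(names)):
        remaining = [x for x in names if x not in chosen]
        scores = {}
        for c in remaining:
            # First pick: average distance to all others.
            # Later picks: average distance to what is already chosen.
            ref = chosen if chosen else [x for x in names if x != c]
            scores[c] = sum(d(c, r) for r in ref) / len(ref) if ref else 0.0
        best = max(scores.values())
        # Candidates with "similar" diversity: within tol of the best score.
        near_best = [c for c in remaining if best - scores[c] <= tol]
        # Among those, start with the one that is fastest to fit.
        chosen.append(min(near_best, key=lambda c: runtime[c]))
    return chosen


# Toy inputs (invented numbers, placeholder names, not measurements).
toy_runtime = {"a": 1.0, "b": 0.2, "c": 0.5, "d": 0.3}
toy_distance = {
    ("a", "b"): 0.1, ("a", "c"): 0.9, ("a", "d"): 0.8,
    ("b", "c"): 0.9, ("b", "d"): 0.8, ("c", "d"): 0.3,
}

print(get_efficient_distributions(toy_distance, toy_runtime, n=2))
```

In the toy data, "a" and "b" are near twins (distance 0.1) and equally far from "c", so after "c" is picked the rule takes "b" because it is the cheaper of the two.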
@fingoldo Yes, I'm interested! You are completely right: the set of distributions is redundant, and a representative subset would be very useful. If, in addition, it is backed by a good algorithm, it would be a very good addition. If you are still interested, please open a PR; we'll review it and integrate it into fitter.
Thanks. My research is done but not published yet. It took quite a lot of computation: ~24 hrs on 16 cores :) I'll try to prepare a publication in the coming month, but for now the quick result is that the 3 most universal distributions, which taken together approximate the highest number of other distributions well and are reasonably fast to compute, are stats.levy_l, stats.logistic, and stats.pareto.
You may extract more info from details.
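For anyone who wants to try the three distributions named above right away, here is a rough sketch using scipy.stats directly. It is not the method used in the research. The synthetic sample, the 50-bin histogram, and the sum-of-squares score between the fitted PDF and the histogram density are all assumptions chosen for illustration.

```python
import numpy as np
from scipy import stats

# Synthetic sample, for illustration only (replace with your own data).
rng = np.random.default_rng(0)
data = rng.logistic(loc=2.0, scale=1.5, size=2000)

# The compact candidate set reported in this thread.
candidates = {
    "levy_l": stats.levy_l,
    "logistic": stats.logistic,
    "pareto": stats.pareto,
}

# Empirical density evaluated at histogram bin centers.
density, edges = np.histogram(data, bins=50, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])

errors = {}
for name, dist in candidates.items():
    try:
        params = dist.fit(data)  # maximum-likelihood fit of shape, loc, scale
    except Exception:
        continue  # skip a distribution whose fit fails on this sample
    pdf = dist.pdf(centers, *params)
    err = float(np.sum((pdf - density) ** 2))
    if np.isfinite(err):
        errors[name] = err

# Lowest sum-of-squares error wins under this (assumed) scoring rule.
best = min(errors, key=errors.get)
print(best, errors)
```

With fitter itself, the same three names can be passed as `Fitter(data, distributions=["levy_l", "logistic", "pareto"])`, followed by `.fit()` and `.summary()`.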