oml
histogram normalization
For float-bin histograms, one could normalize 1) not at all, in which case int is the natural bin count type; 2) so that the values sum to 1; or 3) so that the histogram integrates to 1, i.e. bin width * hist_value sums to 1.
(For 2 the counts could in principle be rationals, but that seems silly; for 3 only float counts make sense.)
Is it worthwhile to include the choice of normalization as an option? I guess 1) with int counts would need an extra function because of the different return type; alternatively one could think of a variant return type...
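To make the variant-return-type idea concrete, here is a minimal sketch. All names (`normalization`, `histogram_values`, `normalize`) are made up for illustration and are not existing oml API; it assumes the bins are given as an array of edges with one more element than the counts.

```ocaml
(* Hypothetical sketch; none of these names exist in oml. *)
type normalization =
  | Raw         (* 1) no normalization, plain int counts *)
  | Sum_to_one  (* 2) values sum to 1 *)
  | Density     (* 3) bin width * value sums to 1 *)

(* Variant return type: int counts for Raw, float weights otherwise. *)
type histogram_values =
  | Counts of int array
  | Weights of float array

(* [edges] has one more element than [counts];
   bin i spans edges.(i) to edges.(i + 1).
   An all-zero histogram yields nan weights, since the total is 0. *)
let normalize norm ~edges counts =
  let n = Array.length counts in
  if Array.length edges <> n + 1 then
    invalid_arg "normalize: expected one more edge than bins";
  let total = float_of_int (Array.fold_left ( + ) 0 counts) in
  match norm with
  | Raw -> Counts (Array.copy counts)
  | Sum_to_one ->
    Weights (Array.map (fun c -> float_of_int c /. total) counts)
  | Density ->
    Weights
      (Array.mapi
         (fun i c ->
            float_of_int c /. (total *. (edges.(i + 1) -. edges.(i))))
         counts)
```

The caller then has to match on `Counts` versus `Weights`, which is the price of a single entry point; two separate functions would avoid the match at the cost of a larger interface.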
Oh, now that I've also read this issue, yes! Those are all good ideas. Perhaps a new module in the style of Accu would be useful; at that point, could the "groupby" or "reducing" function in Accu perform this normalization?
Do you mean the `increment` function in the type? I don't see yet how that would work: if the normalization is maintained at every update, then updating one bin means all other bins have to be updated at every step. That sounds inefficient and incompatible with the type.
I was thinking it may be better to maintain an extra running sum in the data structure and only normalize lazily when the actual array of counts/weights is requested. I.e. one would have to keep an `int` array for the counts, a running `int` sum, and I guess also the bin array in the case of integral normalization.
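A rough sketch of that lazy scheme, to show that each update stays O(1) in the stored data and the normalization cost is only paid on request. The record and function names are hypothetical, not oml API; the bin lookup is a plain linear scan, and values outside the edge range are silently dropped, both of which are simplifying assumptions.

```ocaml
(* Hypothetical sketch; none of these names exist in oml. *)
type t = {
  edges : float array;   (* n + 1 sorted bin edges *)
  counts : int array;    (* n raw counts, updated in place *)
  mutable total : int;   (* running sum of all counts *)
}

let create edges =
  let n = Array.length edges - 1 in
  if n < 1 then invalid_arg "create: need at least two edges";
  { edges = Array.copy edges; counts = Array.make n 0; total = 0 }

(* Increment the bin containing [x] (half-open bins, lower edge included).
   Only one count and the running total change; no other bin is touched. *)
let add h x =
  let n = Array.length h.counts in
  let rec find i =
    if i >= n then None
    else if x >= h.edges.(i) && x < h.edges.(i + 1) then Some i
    else find (i + 1)
  in
  match find 0 with
  | None -> ()  (* out of range: ignored in this sketch *)
  | Some i ->
    h.counts.(i) <- h.counts.(i) + 1;
    h.total <- h.total + 1

(* Normalization happens only here, when the arrays are requested.
   With no samples added, the float variants return nan entries. *)
let raw_counts h = Array.copy h.counts

let sum_to_one h =
  let t = float_of_int h.total in
  Array.map (fun c -> float_of_int c /. t) h.counts

let density h =
  let t = float_of_int h.total in
  Array.mapi
    (fun i c -> float_of_int c /. (t *. (h.edges.(i + 1) -. h.edges.(i))))
    h.counts
```

Because the raw counts and the total are kept, accumulation can be interrupted and resumed at any time without losing information to rounding.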
Anyway, would a putative new Accu-like module then also be the foundation for the float histogram? This would be the design in biocaml. Would that also work for 2d float histograms?
On second thought, maybe a special histogram data structure is too complicated. I guess it would only be needed if one wants to interrupt and later restart histogram accumulation.
Yes, you're right, it would need another parameter/transformation at the end. This is part of why I hesitate to think of these operations as general capabilities of a histogram, rather than as belonging to a more table-like data structure for selecting/grouping/aggregating.