histogram icon indicating copy to clipboard operation
histogram copied to clipboard

Support multiple weights

Open HDembinski opened this issue 4 years ago • 7 comments

The current setup allows several samples to be passed to the histogram, but only one weight. Some users want to track several weights simultaneously, see

https://github.com/scikit-hep/boost-histogram/issues/83

This can be achieved with a new multi_weight type and some minor changes to the existing accumulators.

HDembinski avatar Sep 11 '19 11:09 HDembinski

Hi @HDembinski ,

Can you share an example where multiple samples are passed in the histogram? I tried the following based on the code examples on using profiles, but it does not seem to work (my knowledge in boost is limited).

    auto p = make_profile(axis::regular<>(5, 0.0, 1.0));

    /*
      Fill profile with data, usually this happens in a loop. You pass the sample
      with the `sample` helper function. The sample can be the first or last
      argument.
    */

    p(0.1, sample(1., 10.));
    p(0.15, sample(3., 30.));
    p(0.2, sample(4., 40.));
    p(0.9, sample(5., 50.));

Do I need to explicitly specify the boost accumulator to be used in the histogram in such a case?

zouzias avatar Nov 05 '19 19:11 zouzias

Hi, it doesn't work automatically with make_profile, you need to have an accumulator that handles more than one sample. Making accumulators is very simple, though. For a working example, scroll down to the end of in the user guide, https://www.boost.org/doc/libs/develop/libs/histogram/doc/html/histogram/guide.html#histogram.guide.expert.user_defined_accumulators or look at the example directly: https://github.com/boostorg/histogram/blob/develop/examples/guide_custom_accumulators_3.cpp

HDembinski avatar Nov 05 '19 20:11 HDembinski

Note: as of today, the docs page is not up-to-date yet. The example I am referring to was just updated yesterday to show how to make accumulators that accept weights. Depending on when you look at the docs, you may see the up-to-date example that I pointed to in my second link or the older version of that example.

HDembinski avatar Nov 05 '19 20:11 HDembinski

Great, many thanks @HDembinski for sharing the links.

zouzias avatar Nov 06 '19 08:11 zouzias

Some thoughts from a Zoom dicussion with Kevin Lannon, Eddiy McGrady, Henry Schreiner and Nick Smith.

Requirements:

  • The number of weights must be settable at run-time, not only at compile-time. Otherwise this feature cannot be used from Python.
  • We want the implementation to be efficient in runtime usage and memory usage.

Possible implementations in C++. Let's say we have a histogram with N cells and want to store K weights per cell. In the following I assume we want to store weights are doubles and we are on a platform where the size of a double is equal to the size of a pointer.

  1. Each accumulator holds a std::vector. This is simple to implement but not be optimal. The std::vector allows one to have a different number of weights per cell, although all the weights are the same. This design wastes memory since the length of the vector is stored per cell, although it only needs to be stored once per storage.

This design requires at least the size for N x (K + 3) doubles in total . It is K + 3, because a std::vector at the very least contains a pointer, a size, and a capacity. The size and capacity are unsigned integers typically equal to the size of a pointer. Some implementations of std::vector use much more space than that though, for example, the std::vector of the Windows stdlib.

  1. Each accumulator holds a pointer to an array of dynamic size. The size of this array is not stored in the accumulator, but only once in the storage. This requires a special storage and an interplay between accumulator and storage that is currently not implemented in any of the existing accumulators and storages. In comparison to the existing accumulators, this design introduces an extra indirection every time a bin is accessed, which could be a performance penalty if the array is short. If the array is long but the number of cells is not, this could also increase performance, since the lookup of the pointer is very fast (the storage is effectively a pointer lookup table that can be stored in the L1 cache). The indirection would most likely be a cache miss, though, given the random-access nature of filling a histogram. For sufficiently small N and K, both the pointer table and the arrays will fit into the faster CPU caches, so that the cache miss is avoided. However, due to the fixed size of a cache line, this design is still inefficient unless the arrays are somehow contiguous in memory. In general they won't be.

This design requires the size for N x (K + 1) + 1 doubles in total.

  1. We make a special storage for this case which has shape (N, K), where N is the number of cells and K is the number weights. Memory is contiguous. The accumulator in this case would be a view into the memory allocated by the storage. This may be faster if K x K is small enough for the whole histogram to fit into the faster CPU caches. If N x K is large, the histogram will not fit into the caches and a bin-lookup will most likely be a cache miss. The performance penalty of this cache miss should be the main driving cost then, making this approximately cost as much as 2).

This design requires the size for N x K + 1 doubles in total, which is the lowest possible given the requirements. When N x K is large, 2) and 3) should be equally performant, but 3) should be faster if N x K is sufficiently small so that the whole histogram fits into the faster CPU caches.

In summary, option 3) should be the overall fastest and is the most memory efficient.

HDembinski avatar Sep 14 '22 14:09 HDembinski

Working prototype of option 3:

https://godbolt.org/z/x87h5MbEG

#include <boost/histogram.hpp>
#include <boost/histogram/detail/iterator_adaptor.hpp>
#include <boost/core/span.hpp>
#include <memory>
#include <algorithm>
#include <iostream>

namespace boost {
namespace histogram {

template <class T>
struct multi_weight_value : public boost::span<T> {
    using boost::span<T>::span;

    void operator()(boost::span<T> values) {
        if (values.size() != this->size())
            throw std::runtime_error("size does not match");

        auto it = this->begin();
        for (const T& x : values)
            *it++ += x;
    }
};


template <class ElementType = double>
class multi_weight {
public:
    using element_type = ElementType;
    using value_type = multi_weight_value<element_type>;
    using reference = value_type;
    using const_reference = const value_type;

    template <class T>
    struct iterator_base : public detail::iterator_adaptor<iterator_base<T>, std::size_t, T&> {
        using base_type =  detail::iterator_adaptor<iterator_base<T>, std::size_t, T&>;

        iterator_base(multi_weight* par, std::size_t idx) : base_type{idx}, par_{par} {}
        iterator_base(iterator_base&& other) = default;
        iterator_base(const iterator_base& other) = default;

        decltype(auto) operator*() { return T{par_->buffer_.get() + this->base() * par_->nelem_, par_->nelem_}; }

        multi_weight* par_;
    };

    using iterator = iterator_base<value_type>;
    using const_iterator = iterator_base<const value_type>;

    static constexpr bool has_threading_support() { return false; }

    multi_weight(const std::size_t k) : nelem_{k} {}

    std::size_t size() const { return size_; }

    void reset(std::size_t n) {
        size_ = n;
        buffer_.reset(new element_type[n * nelem_]);
    }

    iterator begin() { return {this, 0}; }
    iterator end() { return {this, size_}; }

    const_iterator begin() const { return {this, 0}; }
    const_iterator end() const { return {this, size_}; }

    reference operator[](std::size_t i) { return reference{buffer_.get() + i * nelem_, nelem_}; }
    const_reference operator[](std::size_t i) const { return const_reference{buffer_.get() + i * nelem_, nelem_}; }

    template <class T>
    bool operator==(const multi_weight<T>& other) const {
        if (size() != other.size())
            return false;
        return std::equal(buffer_._get(), buffer_.get() + size_ * nelem_, other.ptr_.get());
    }

public:
    std::size_t size_ = 0;
    std::size_t nelem_;
    std::unique_ptr<element_type[]> buffer_;
};

template <class T>
std::ostream& operator<<(std::ostream& os, const multi_weight_value<T>& v) {
    os << "multi_weight_value(";
    bool first = true;
    for (const T& x : v)
        if (first) { first = false; os << x; }
        else os << ", " << x;
    os << ")";
    return os;
}

}
}

int main() {
    using namespace boost::histogram;

    using MW = multi_weight<double>;
    MW mw(2);
    mw.reset(2);
    for (int i = 0; i < 4; ++i) 
        *(mw.buffer_.get() + i) = 1 + i;

    for (auto it = mw.begin(); it != mw.end(); ++it)
        std::cout << std::distance(mw.begin(), it) << " " << *it << std::endl;

    auto h = make_histogram_with(multi_weight<double>(2), axis::regular(5, 0, 1));

    std::vector<double> v = {{1, 2}};

    h(0.1, sample(boost::span<double>(v)));
}

There is still an issue in the core library, because h(0.1, sample(v)); should also work, but currently I have to use h(0.1, sample(boost::span<double>(v)));

HDembinski avatar Sep 16 '22 15:09 HDembinski

multi_weight should probably be renamed to multi_storage. It should work with the existing accumulators.

HDembinski avatar Sep 19 '22 14:09 HDembinski