
Allow for larger block sizes

Open SSoelvsten opened this issue 3 years ago • 5 comments

With #144 we hope to clean up and move all common TPIE memory computations into a single place. It would be useful if, in memory.h, the derived minimum amount of memory needed by tpie::merge_sorter and tpie::priority_queue depended on the block size; currently, these computations use magic constants that only work for B = 2 MiB.

We may even automatically initialise TPIE and Adiar with a larger block size that is linearly dependent on the amount of memory given to adiar::adiar_init(...). This would hopefully improve the performance of Adiar when given lots of memory, since it then performs fewer (but larger) I/Os.

Readying the external_sorter for different block sizes

  • [x] Do experiments (in main.cpp) where we

    1. Set the block size to a power of two less than or equal to 64.
    2. Instantiate a tpie::merge_sorter<T>
    3. Set the memory for the merge sorter, where all three phases are specified. Find the smallest value for phase 1 (the first value) such that the merge sorter does not crash.

    Independent variables in this experiment: the block size and sizeof(T).

    Post the results in a table below.

  • [ ] Use the static function(s) inside of the constructor.
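
The search in step 3 for the smallest phase-1 value can be sketched generically. Here `works(m1)` is a hypothetical predicate (not a TPIE API) standing in for "the tpie::merge_sorter does not crash with m1 bytes for phase 1", assumed monotone: once it succeeds, it succeeds for all larger m1.

    #include <cassert>
    #include <cstddef>
    #include <functional>
    
    // Smallest m1 in [lo, hi] for which `works(m1)` holds, assuming
    // `works` is monotone in m1 (crashes below some threshold, succeeds above).
    std::size_t smallest_working_phase1(std::size_t lo, std::size_t hi,
                                        const std::function<bool(std::size_t)> &works)
    {
      while (lo < hi) {
        const std::size_t mid = lo + (hi - lo) / 2;
        if (works(mid)) { hi = mid; }     // mid suffices; try smaller
        else            { lo = mid + 1; } // mid crashes; need more memory
      }
      return lo;
    }

Running this once per (block size, sizeof(T)) pair would produce the table asked for above.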

Set the block size in internal/memory.cpp

  • [ ] No algorithm hands any of its data structures less than 1/8th of the entire space. Based on the tall cache assumption typically used for cache-oblivious algorithms, we would always want B² ≤ M for each of them. So, I'd propose one should derive the largest power of two that is smaller than sqrt(M/8).

    That is, we should improve recommended_block_size to compute the maximum block size for a given amount of memory.

    #include <cmath>
    #include <cstddef>
    
    namespace adiar { namespace memory
    {
      size_t recommended_block_size(size_t memory_limit_bytes)
      {
        // Tall cache assumption: B^2 <= M/8, i.e. B <= sqrt(M/8).
        const size_t block_upper_bound =
          static_cast<size_t>(std::sqrt(static_cast<double>(memory_limit_bytes / 8)));
        size_t block_size = size_t(1) << 21; // 2 MiB
        while ((block_size << 1) < block_upper_bound) {
          block_size <<= 1;
        }
        return block_size;
      }
    } }
    

    Some experiments should be done to check that this truly does not break anything, but most likely any breakage can be fixed with a "divide by 16" instead, I believe.
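
A self-contained copy of the sketch above, usable for such experiments (the two sample limits in the checks are arbitrary):

    #include <cassert>
    #include <cmath>
    #include <cstddef>
    
    // Largest power of two below sqrt(M/8), never smaller than the
    // 2 MiB default block size.
    std::size_t recommended_block_size(std::size_t memory_limit_bytes)
    {
      const std::size_t block_upper_bound =
        static_cast<std::size_t>(std::sqrt(static_cast<double>(memory_limit_bytes / 8)));
      std::size_t block_size = std::size_t(1) << 21; // 2 MiB
      while ((block_size << 1) < block_upper_bound) {
        block_size <<= 1;
      }
      return block_size;
    }

Note that with this bound a 1 GiB limit stays at the 2 MiB floor (sqrt(2^27) ≈ 11.3 KiB is far below it), while e.g. a 1 PiB limit yields 8 MiB; the sqrt(M/8) bound only grows the block size for very large memory limits.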

SSoelvsten avatar Oct 23 '21 16:10 SSoelvsten

The main issue is already fixed in #128 by having the adiar::external_sorter do its computations based on the tpie::merge_sorter's minimum requirements (which presumably are based on the block size).

SSoelvsten avatar Dec 06 '21 07:12 SSoelvsten

SSDs are complicated. Their performance is optimal when using a specific block size that matches the one used internally by the SSD's logic. This value varies from SSD to SSD, but it typically seems to lie somewhere between 8 MiB and 32 MiB.

  • It hurts much less to overshoot the SSD block size than to undershoot it. For example, my laptop has the following read speeds according to Ubuntu's built-in disk benchmark:

    Block size (MiB)      | 2   | 4   | 8   | 16  | 24  | 32  | 42  | 64  | 128
    Transfer rate (GiB/s) | 0.7 | 1.1 | 1.3 | 1.6 | 1.5 | 1.7 | 1.5 | 1.6 | 1.2

    So, we may want to:

    1. Upper bound the above derivation of a block size. 32 MiB or 64 MiB seems reasonable.
    2. Let the end-user explicitly override our choice of block size or provide a "suggestion".
  • Apparently, based on an assignment by Peyman Afshani in Computational Geometry, one can for simplicity assume that:

    B > log_{M/B}(N)

    From this, one may be able to derive a block size?
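
Points 1 and 2 above (capping the derived block size and letting the end-user override it) could be combined as sketched below; choose_block_size, user_suggestion, and the 32 MiB cap are hypothetical names and values, not an existing Adiar API:

    #include <algorithm>
    #include <cassert>
    #include <cstddef>
    
    // Hypothetical sketch: clamp a derived block size to [2 MiB, 32 MiB],
    // letting an explicit end-user suggestion (0 = "no suggestion")
    // take precedence over the derivation.
    std::size_t choose_block_size(std::size_t derived_block_size,
                                  std::size_t user_suggestion = 0)
    {
      constexpr std::size_t min_block = std::size_t(1) << 21; // 2 MiB floor
      constexpr std::size_t max_block = std::size_t(1) << 25; // 32 MiB cap
      const std::size_t b = user_suggestion ? user_suggestion : derived_block_size;
      return std::clamp(b, min_block, max_block);
    }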

SSoelvsten avatar Dec 09 '21 10:12 SSoelvsten

It should be noted that #128 in fact does not solve the very hard part of precomputing the minimal size of each sorter. The functions tpie::merge_sorter.minimum_memory_phaseX() do not seem to work as desired (see https://github.com/thomasmoelhave/tpie/issues/250 and https://github.com/thomasmoelhave/tpie/issues/206).

So, we'll need to reverse engineer a linear function on the phase 1 size based on the block size anyway.

SSoelvsten avatar Dec 27 '21 16:12 SSoelvsten

The minimum amount of memory is as follows:

memory_size_type min_m1 = 128*1024 / m_item_size + bits::run_positions::memory_usage() + streamMemory + tempFileMemory;

Where we have

  • m_item_size = specific_store_t::item_size
    
    In our case this (as far as I know) boils down to sizeof(T).
  • bits::run_positions::memory_usage() {
      return sizeof(run_positions)
      	+ 2 * file_stream<stream_position>::memory_usage();
    }
    
    This we should be able to either copy or call directly ourselves.
  • memory_size_type streamMemory = m_element_file_stream_memory_usage;
    
    which is initialised as file_stream<element_type>::memory_usage()
  • memory_size_type tempFileMemory = 2*p.fanout*sizeof(temp_file);
    
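
The formula above can be packaged in a small helper. All four inputs have to be obtained from TPIE at runtime (item_size, bits::run_positions::memory_usage(), the file_stream usage, and 2*fanout*sizeof(temp_file)); the numbers in any example call are placeholders, not real TPIE values.

    #include <cassert>
    #include <cstddef>
    
    // Mirrors TPIE's minimum phase-1 memory:
    //   min_m1 = 128*1024 / m_item_size + run positions + stream + temp files
    // The component sizes are parameters to keep this sketch self-contained.
    std::size_t min_phase1_memory(std::size_t item_size,
                                  std::size_t run_positions_mem,
                                  std::size_t stream_mem,
                                  std::size_t temp_file_mem)
    {
      return (128 * 1024) / item_size + run_positions_mem + stream_mem + temp_file_mem;
    }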

SSoelvsten avatar May 23 '22 09:05 SSoelvsten

Based on the above, the following works. Notice that the above assumption of just using sizeof(T) is not correct.

    const size_t block_size = 32 * 1024 * 1024;
    tpie::set_block_size(block_size);

    // ===== Your code starts here =====
    const size_t mem = tpie::get_memory_manager().available();

    const size_t item_size = tpie::default_store::template specific<tpie::default_store::template element_type<int>>::item_size;
    const size_t run_positions_mem_usage = tpie::bits::run_positions::memory_usage();
    const size_t streamMemory = tpie::file_stream<int>::memory_usage();
    constexpr size_t min_p1_fanout = 2;
    const size_t tempFileMemory = 2*min_p1_fanout*sizeof(tpie::temp_file);

    const size_t p1_mem = 128*1024 / item_size + run_positions_mem_usage + streamMemory + tempFileMemory;

    tpie::merge_sorter<int, false, std::less<>> ms;
    ms.set_available_memory(p1_mem, mem, mem);
    ms.begin();

SSoelvsten avatar May 23 '22 09:05 SSoelvsten