Allow for larger block sizes
With #144 we hope to clean up and move all common TPIE memory computations into a single place. It would be useful if, in `memory.h`, the derived minimum amount of memory needed by `tpie::merge_sorter` and `tpie::priority_queue` depended on the block size; currently these computations use magic constants that only work for B = 2 MiB.
We may even automatically initialise TPIE and Adiar with a larger block size that is linearly dependent on the amount of memory given to `adiar::adiar_init(...)`. This would hopefully improve the performance of Adiar when given lots of memory, since it would do fewer (but larger) I/Os.
Readying the `external_sorter` for different block sizes
- [x] Do experiments (in `main.cpp`) where we:
  - Set the block size to a power of two less than or equal to 64.
  - Instantiate a `tpie::merge_sorter<T>`.
  - Set the memory for the merge sorter with all three phases specified, and find the smallest value for phase 1 (the first value) such that the merge sorter does not crash.

  Independent variables in this experiment: block size and `sizeof(T)`. Post the results in a table below.
- [ ] Use the static function(s) inside of the constructor.
Set the block size in `internal/memory.cpp`
- [ ] No algorithm splits the space into chunks of more than 1/8th of the entire space. Based on the tall-cache assumption typically used for cache-oblivious algorithms, we always would want B² ≤ M. So, I'd propose one should derive the largest power of two that is smaller than sqrt(M/8).

That is, we should improve on the `recommended_block_size` to compute the max block size for a given amount of memory:

```cpp
#include <cmath>

namespace adiar {
  namespace memory {
    size_t recommended_block_size(size_t memory_limit_bytes)
    {
      const size_t block_upper_bound =
        static_cast<size_t>(std::sqrt(memory_limit_bytes / 8));

      size_t block_size = 1 << 21; // 2 MiB

      while ((block_size << 1) < block_upper_bound) {
        block_size <<= 1;
      }
      return block_size;
    }
  }
}
```
Some experiments should be done to check this truly does not break anything, but most likely any breakage can be solved with a "divide by 16", I believe.
The main issue is already fixed in #128 by having the `adiar::external_sorter` do its computations based on the `tpie::merge_sorter`'s minimum requirements (which TPIE presumably bases on the block size).
SSDs are complicated. Their performance is optimal when using a specific block size that matches the one used internally by the SSD's logic. This value varies from SSD to SSD, but it typically seems to be somewhere between 8 MiB and 32 MiB.
- It hurts much less to overshoot the SSD block size than to undershoot it. For example, my laptop has the following read speeds as per the built-in disk benchmarking of Ubuntu:

  | Block size (MiB)      | 2   | 4   | 8   | 16  | 24  | 32  | 42  | 64  | 128 |
  |-----------------------|-----|-----|-----|-----|-----|-----|-----|-----|-----|
  | Transfer rate (GiB/s) | 0.7 | 1.1 | 1.3 | 1.6 | 1.5 | 1.7 | 1.5 | 1.6 | 1.2 |

  So, we may want to:

  - Upper bound the above derivation of a block size; 32 MiB or 64 MiB seems reasonable.
  - Let the end-user explicitly override our choice of block size or provide a "suggestion".
- Apparently, based on an assignment by Peyman Afshani in Computational Geometry, one can for simplicity assume that

  B > log_{M/B}(N)

  From this, one may be able to derive a block size?
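As a quick check of how easy this assumption is to satisfy, consider the illustrative values M = 2^30 bytes, B = 2^21 bytes and N = 2^40 items:

```latex
\log_{M/B}(N) \;=\; \frac{\log_2 N}{\log_2 (M/B)} \;=\; \frac{40}{30 - 21} \;=\; \frac{40}{9} \;\approx\; 4.4 \;\ll\; 2^{21} = B
```

So the assumption holds with a huge margin for any realistic block size; whether it pins down a specific block size on its own is less clear.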
It should be noted that #128 in fact does not do the very hard part of precomputing the minimal size of each sorter. The functions `tpie::merge_sorter::minimum_memory_phaseX()` do not seem to work as desired (see https://github.com/thomasmoelhave/tpie/issues/250 and https://github.com/thomasmoelhave/tpie/issues/206). So, we'll need to reverse engineer a linear function for the phase 1 size based on the block size anyway.
The minimum amount of memory is as follows:

```cpp
memory_size_type min_m1 = 128*1024 / m_item_size
                        + bits::run_positions::memory_usage()
                        + streamMemory
                        + tempFileMemory;
```
Where we have:

- `m_item_size`, which in our case (as far as I know) boils down to `specific_store_t::item_size`, i.e. `sizeof(T)`.
- `bits::run_positions::memory_usage()`, which we should be able to either copy or call directly ourselves:

  ```cpp
  bits::run_positions::memory_usage() {
    return sizeof(run_positions) + 2 * file_stream<stream_position>::memory_usage();
  }
  ```

- `streamMemory`, which is initialised as `memory_size_type streamMemory = m_element_file_stream_memory_usage;`, i.e. `file_stream<element_type>::memory_usage()`.
- `tempFileMemory`, which is `memory_size_type tempFileMemory = 2*p.fanout*sizeof(temp_file);`.
Based on the above, this works. Notice that the above assumption of just using `sizeof(T)` is not correct:

```cpp
const size_t block_size = 32 * 1024 * 1024;
tpie::set_block_size(block_size);

// ===== Your code starts here =====
const size_t mem = tpie::get_memory_manager().available();

const size_t item_size =
  tpie::default_store::template specific<tpie::default_store::template element_type<int>>::item_size;
const size_t run_positions_mem_usage = tpie::bits::run_positions::memory_usage();
const size_t streamMemory = tpie::file_stream<int>::memory_usage();

constexpr size_t min_p1_fanout = 2;
const size_t tempFileMemory = 2 * min_p1_fanout * sizeof(tpie::temp_file);

const size_t p1_mem = 128*1024 / item_size
                    + run_positions_mem_usage
                    + streamMemory
                    + tempFileMemory;

tpie::merge_sorter<int, false, std::less<>> ms;
ms.set_available_memory(p1_mem, mem, mem);
ms.begin();
```