monero
monero copied to clipboard
use boost::container::flat_map instead of std::unordered_map for suba…
…ddresses
This cuts memory usage by about half, insertion is about 50% faster, and lookup, while about twice as slow, is dwarfed by the accompanying derive_subaddress_public_key anyway.
80 bytes -> 40 bytes per element. The default setup is 10k elements. Some people are using loads of subaddresses indeed. A new set of 50 per customer. A million customers -> 50e6 addresses -> 2 GB saved. Lookup is dwarfed by the crypto code that creates the key. By hashmap, do you mean std:map (as opposed to operator< based unordered_map) ?
80 bytes -> 40 bytes per element.
std::unordered_map
should need two more pointers per node, but it does have the "table" with varying load factor. Did you calculate this empirically?
The default setup is 10k elements.
That seems fairly large for the default number of subaddresses. I don't remember such a large value being selected. Was this is in a discussion on Github somewhere?
Lookup is dwarfed by the crypto code that creates the key
This is unhelpful without discussing the time or at least percentage of time added.
By hashmap, do you mean std:map (as opposed to operator< based unordered_map) ?
std::unordered_map
is a hashmap that uses a linked list to handle collisions. Technically neither is required by the standard, but the complexity and pointer lifetime guarantees make no other implementation possible.
Yes, empirically. This is the test code. Use with: ./a.out map|flat_map|unordered_map https://paste.debian.net/hidden/9a5f26e0/ I looked at VmData/VmRSS. I don't know whether the precalc numbers were discussed.
I reverted the sequence type change, it does not exist in some recent boosts (1.64 or 1.62 or so).
A few data points, with a wallet with 5k accounts with 5k subaddresses each, no outputs:
master: load: 27 seconds save: 12 seconds peak resident usage on load: 4.9 GB after load resident usage: 1.9 GB
with patch, loading from new format: load: 15 seconds save: 11 seconds peak resident usage on load: 4.2 GB after load resident usage: 980 MB
peak RSS is approximate, it peaks fast and I use top with fast refresh to spot the peak.
benchmarks for generating 300 accounts with 5k subaddresses each: gen_address function here https://gist.github.com/plowsof/97beade04012780bb8d89266455af99b
master + #8730 gen_addresses: 241m32.365s save: 0m0.883s close; 0m0.512s open: 0m5.448s
master + #8730 + #5370 gen_addresses: 514m24.396s save: 0m0.732s close: 0m0.023s open: 0m3.958s
i notice a speedup with opening wallet, but generating addresses is a lot slower
I pushed an extra speedup for adding more than once subaddress at once.
(which can be used with master also, it doesn't depend on the flat map patch)
@moneromooo-monero Nice one !!!!!!!!!?????? thanks
same test as above using gen_addresses
master + #8730 + wallet_rpc_server: speed up adding many subaddresses at once gen_addresses: 8m59.091s save: 0m0.823s close: 0m0.346s open: 0m5.821s
master + #8730 + #5370 gen_addresses: 8m59.260s save: 0m0.833s close: 0m0.353s open: 0m3.124s
And about the old comment above about performance:
./build/Linux/master/release/tests/performance_tests/performance_tests --filter test_is_out_to_acc_precomp --loop-multiplier 1000 --stats
master: test_is_out_to_acc_precomp (1000000 calls) - OK: 21 µs/call (min 19 µs, 90th 24 µs, median 19 µs, std dev 15 µs) with patch: test_is_out_to_acc_precomp (1000000 calls) - OK: 21 µs/call (min 18 µs, 90th 24 µs, median 19 µs, std dev 15 µs)
This test runs by a single subaddress though, so with this patch to add 25 million subaddresses on master:
diff --git a/tests/performance_tests/is_out_to_acc.h b/tests/performance_tests/is_out_to_acc.h
index 75d4a5c46..e5ab32779 100644
--- a/tests/performance_tests/is_out_to_acc.h
+++ b/tests/performance_tests/is_out_to_acc.h
@@ -59,13 +59,18 @@ public:
if (!single_tx_test_base::init())
return false;
crypto::generate_key_derivation(m_tx_pub_key, m_bob.get_keys().m_view_secret_key, m_derivation);
+ subaddresses[m_bob.get_keys().m_account_address.m_spend_public_key] = {0,0};
+ crypto::hash hash = crypto::null_hash;
+ for (int i = 0; i < 5000 * 5000; ++i)
+ {
+ cn_fast_hash(&hash, sizeof(hash), hash);
+ subaddresses[(const crypto::public_key&)hash] = = {0, i};
+ }
return true;
}
bool test()
{
const cryptonote::txout_to_key& tx_out = boost::get<cryptonote::txout_to_key>(m_tx.vout[0].target);
- std::unordered_map<crypto::public_key, cryptonote::subaddress_index> subaddresses;
- subaddresses[m_bob.get_keys().m_account_address.m_spend_public_key] = {0,0};
std::vector<crypto::key_derivation> additional_derivations;
boost::optional<cryptonote::subaddress_receive_info> info = cryptonote::is_out_to_acc_precomp(subaddresses, tx_out.key, m_derivation, additional_derivations, 0, hw::get_device("default"));
return (bool)info;
@@ -73,4 +78,5 @@ public:
private:
crypto::key_derivation m_derivation;
+ std::unordered_map<crypto::public_key, cryptonote::subaddress_index> subaddresses;
};
and this patch for the flat_map version:
diff --cc tests/performance_tests/is_out_to_acc.h
index eac8d60d7,1942a1eb0..000000000
--- a/tests/performance_tests/is_out_to_acc.h
+++ b/tests/performance_tests/is_out_to_acc.h
@@@ -59,6 -59,13 +59,18 @@@ public
if (!single_tx_test_base::init())
return false;
crypto::generate_key_derivation(m_tx_pub_key, m_bob.get_keys().m_view_secret_key, m_derivation);
- subaddresses[m_bob.get_keys().m_account_address.m_spend_public_key] = {0,0};
++ using spair_t = std::pair<crypto::public_key, cryptonote::subaddress_index>;
++ std::vector<std::pair<crypto::public_key, cryptonote::subaddress_index>> new_elements;
++ new_elements.push_back(std::make_pair(m_bob.get_keys().m_account_address.m_spend_public_key, cryptonote::subaddress_index{0,0}));
+ crypto::hash hash = crypto::null_hash;
+ for (int i = 0; i < 5000 * 5000; ++i)
+ {
+ cn_fast_hash(&hash, sizeof(hash), hash);
- subaddresses[(const crypto::public_key&)hash] = {0, i};
++ new_elements.push_back(std::make_pair((const crypto::public_key&)hash, cryptonote::subaddress_index{0, i}));
+ }
++ std::sort(new_elements.begin(), new_elements.end(), [](const spair_t &e0, const spair_t &e1) { return e0.first < e1.first; });
++ boost::container::ordered_unique_range_t tag;
++ subaddresses.insert(tag, new_elements.begin(), new_elements.end());
return true;
}
bool test()
@@@ -73,4 -78,5 +83,5 @@@
private:
crypto::key_derivation m_derivation;
- std::unordered_map<crypto::public_key, cryptonote::subaddress_index> subaddresses;
++ boost::container::flat_map<crypto::public_key, cryptonote::subaddress_index> subaddresses;
};
for the patch:
master: test_is_out_to_acc_precomp (1000000 calls) - OK: 21 µs/call (min 18 µs, 90th 24 µs, median 19 µs, std dev 14 µs) with patch: test_is_out_to_acc_precomp (1000000 calls) - OK: 21 µs/call (min 18 µs, 90th 23 µs, median 19 µs, std dev 13 µs)
Rebased with a fix for the mac version. These commits can be PRed piecemeal if necessary.
From what I am seeing in the code, and @vtnerd comments, I believe this one is ready to approve. I am not approving it yet since I have to run our tests on it to be absolutely sure. But this PR I believe is long overdue and ready to be accepted.