monero icon indicating copy to clipboard operation
monero copied to clipboard

use boost::container::flat_map instead of std::unordered_map for suba…

Open moneromooo-monero opened this issue 5 years ago • 12 comments

…ddresses

This cuts memory usage by about half, insertion is about 50% faster, and lookup, while about twice as slow, is dwarfed by the accompanying derive_subaddress_public_key anyway.

moneromooo-monero avatar Mar 29 '19 20:03 moneromooo-monero

80 bytes -> 40 bytes per element. The default setup is 10k elements. Some people are using loads of subaddresses indeed. A new set of 50 per customer. A million customers -> 50e6 addresses -> 2 GB saved. Lookup is dwarfed by the crypto code that creates the key. By hashmap, do you mean std:map (as opposed to operator< based unordered_map) ?

moneromooo-monero avatar Mar 30 '19 00:03 moneromooo-monero

80 bytes -> 40 bytes per element.

std::unordered_map should need two more pointers per node, but it does have the "table" with varying load factor. Did you calculate this empirically?

The default setup is 10k elements.

That seems fairly large for the default number of subaddresses. I don't remember such a large value being selected. Was this is in a discussion on Github somewhere?

Lookup is dwarfed by the crypto code that creates the key

This is unhelpful without discussing the time or at least percentage of time added.

By hashmap, do you mean std:map (as opposed to operator< based unordered_map) ?

std::unordered_map is a hashmap that uses a linked list to handle collisions. Technically neither is required by the standard, but the complexity and pointer lifetime guarantees make no other implementation possible.

vtnerd avatar Mar 30 '19 03:03 vtnerd

Yes, empirically. This is the test code. Use with: ./a.out map|flat_map|unordered_map https://paste.debian.net/hidden/9a5f26e0/ I looked at VmData/VmRSS. I don't know whether the precalc numbers were discussed.

moneromooo-monero avatar Mar 30 '19 09:03 moneromooo-monero

I reverted the sequence type change, it does not exist in some recent boosts (1.64 or 1.62 or so).

moneromooo-monero avatar May 05 '19 08:05 moneromooo-monero

A few data points, with a wallet with 5k accounts with 5k subaddresses each, no outputs:

master: load: 27 seconds save: 12 seconds peak resident usage on load: 4.9 GB after load resident usage: 1.9 GB

with patch, loading from new format: load: 15 seconds save: 11 seconds peak resident usage on load: 4.2 GB after load resident usage: 980 MB

peak RSS is approximate, it peaks fast and I use top with fast refresh to spot the peak.

moneromooo-monero avatar Feb 15 '23 22:02 moneromooo-monero

benchmarks for generating 300 accounts with 5k subaddresses each: gen_address function here https://gist.github.com/plowsof/97beade04012780bb8d89266455af99b

master + #8730 gen_addresses: 241m32.365s save: 0m0.883s close; 0m0.512s open: 0m5.448s

master + #8730 + #5370 gen_addresses: 514m24.396s save: 0m0.732s close: 0m0.023s open: 0m3.958s

i notice a speedup with opening wallet, but generating addresses is a lot slower

plowsof avatar Feb 18 '23 06:02 plowsof

I pushed an extra speedup for adding more than once subaddress at once.

moneromooo-monero avatar Feb 19 '23 12:02 moneromooo-monero

(which can be used with master also, it doesn't depend on the flat map patch)

moneromooo-monero avatar Feb 19 '23 12:02 moneromooo-monero

@moneromooo-monero Nice one !!!!!!!!!?????? thanks

same test as above using gen_addresses

master + #8730 + wallet_rpc_server: speed up adding many subaddresses at once gen_addresses: 8m59.091s save: 0m0.823s close: 0m0.346s open: 0m5.821s

master + #8730 + #5370 gen_addresses: 8m59.260s save: 0m0.833s close: 0m0.353s open: 0m3.124s

plowsof avatar Feb 20 '23 00:02 plowsof

And about the old comment above about performance:

./build/Linux/master/release/tests/performance_tests/performance_tests --filter test_is_out_to_acc_precomp --loop-multiplier 1000 --stats

master: test_is_out_to_acc_precomp (1000000 calls) - OK: 21 µs/call (min 19 µs, 90th 24 µs, median 19 µs, std dev 15 µs) with patch: test_is_out_to_acc_precomp (1000000 calls) - OK: 21 µs/call (min 18 µs, 90th 24 µs, median 19 µs, std dev 15 µs)

This test runs by a single subaddress though, so with this patch to add 25 million subaddresses on master:

diff --git a/tests/performance_tests/is_out_to_acc.h b/tests/performance_tests/is_out_to_acc.h
index 75d4a5c46..e5ab32779 100644
--- a/tests/performance_tests/is_out_to_acc.h
+++ b/tests/performance_tests/is_out_to_acc.h
@@ -59,13 +59,18 @@ public:
     if (!single_tx_test_base::init())
       return false;
     crypto::generate_key_derivation(m_tx_pub_key, m_bob.get_keys().m_view_secret_key, m_derivation);
+    subaddresses[m_bob.get_keys().m_account_address.m_spend_public_key] = {0,0};
+    crypto::hash hash = crypto::null_hash;
+    for (int i = 0; i < 5000 * 5000; ++i)
+    {
+      cn_fast_hash(&hash, sizeof(hash), hash);
+      subaddresses[(const crypto::public_key&)hash] = = {0, i};
+    }
     return true;
   }
   bool test()
   {
     const cryptonote::txout_to_key& tx_out = boost::get<cryptonote::txout_to_key>(m_tx.vout[0].target);
-    std::unordered_map<crypto::public_key, cryptonote::subaddress_index> subaddresses;
-    subaddresses[m_bob.get_keys().m_account_address.m_spend_public_key] = {0,0};
     std::vector<crypto::key_derivation> additional_derivations;
     boost::optional<cryptonote::subaddress_receive_info> info = cryptonote::is_out_to_acc_precomp(subaddresses, tx_out.key, m_derivation, additional_derivations, 0, hw::get_device("default"));
     return (bool)info;
@@ -73,4 +78,5 @@ public:

 private:
   crypto::key_derivation m_derivation;
+    std::unordered_map<crypto::public_key, cryptonote::subaddress_index> subaddresses;
 };

and this patch for the flat_map version:

diff --cc tests/performance_tests/is_out_to_acc.h
index eac8d60d7,1942a1eb0..000000000
--- a/tests/performance_tests/is_out_to_acc.h
+++ b/tests/performance_tests/is_out_to_acc.h
@@@ -59,6 -59,13 +59,18 @@@ public
      if (!single_tx_test_base::init())
        return false;
      crypto::generate_key_derivation(m_tx_pub_key, m_bob.get_keys().m_view_secret_key, m_derivation);
 -    subaddresses[m_bob.get_keys().m_account_address.m_spend_public_key] = {0,0};
++    using spair_t = std::pair<crypto::public_key, cryptonote::subaddress_index>;
++    std::vector<std::pair<crypto::public_key, cryptonote::subaddress_index>> new_elements;
++    new_elements.push_back(std::make_pair(m_bob.get_keys().m_account_address.m_spend_public_key, cryptonote::subaddress_index{0,0}));
+     crypto::hash hash = crypto::null_hash;
+     for (int i = 0; i < 5000 * 5000; ++i)
+     {
+       cn_fast_hash(&hash, sizeof(hash), hash);
 -      subaddresses[(const crypto::public_key&)hash] = {0, i};
++      new_elements.push_back(std::make_pair((const crypto::public_key&)hash, cryptonote::subaddress_index{0, i}));
+     }
++    std::sort(new_elements.begin(), new_elements.end(), [](const spair_t &e0, const spair_t &e1) { return e0.first < e1.first; });
++    boost::container::ordered_unique_range_t tag;
++    subaddresses.insert(tag, new_elements.begin(), new_elements.end());
      return true;
    }
    bool test()
@@@ -73,4 -78,5 +83,5 @@@
  
  private:
    crypto::key_derivation m_derivation;
 -    std::unordered_map<crypto::public_key, cryptonote::subaddress_index> subaddresses;
++    boost::container::flat_map<crypto::public_key, cryptonote::subaddress_index> subaddresses;
  };

for the patch:

master: test_is_out_to_acc_precomp (1000000 calls) - OK: 21 µs/call (min 18 µs, 90th 24 µs, median 19 µs, std dev 14 µs) with patch: test_is_out_to_acc_precomp (1000000 calls) - OK: 21 µs/call (min 18 µs, 90th 23 µs, median 19 µs, std dev 13 µs)

moneromooo-monero avatar Feb 20 '23 17:02 moneromooo-monero

Rebased with a fix for the mac version. These commits can be PRed piecemeal if necessary.

moneromooo-monero avatar Feb 21 '23 13:02 moneromooo-monero

From what I am seeing in the code, and @vtnerd comments, I believe this one is ready to approve. I am not approving it yet since I have to run our tests on it to be absolutely sure. But this PR I believe is long overdue and ready to be accepted.

0xFFFC0000 avatar Dec 21 '23 14:12 0xFFFC0000