Could broker::table be unordered_map instead?
Testing some scenarios in zeek/zeek#2394, _Rb_tree code is standing out in the profiles for publish_id of a table with 1mio entries. Maybe using std::unordered_map for broker::table could improve performance for such large tables?
https://github.com/zeek/broker/blob/5e64f93f9e52e1cc1b3612589e2d46eb460f58aa/include/broker/data.hh#L40-L41
No strong opinion on the change (doesn't matter for Broker itself), but changing this to unordered_map is a breaking API change. Whether there are actually any clients that rely on broker::table being sorted when iterating it: I don't know. Just pointing it out. 🙂
If we do this, It's probably also worth it to think about changing broker::set to an alias for std::unordered_set.
Would it be easy for you to change the types and generate new performance data? Having data on whether or not it's worth it to switch to the hash-based containers would provide some objective measure for the pro/con discussion.
Thanks for the comments.
Would it be easy for you to change the types and generate new performance data? Having data on whether or not it's worth it to switch to the hash-based containers would provide some objective measure for the pro/con discussion.
I think so. The test-case is "just" publishing a large table (1mio entries) table via publish_id on the Zeek side.
I tried the obvious change and it was complaining about missing std::hash templates for the involved types, so it's not just drop-in replacement.
@awelzel if you want to give it a spin: the branch issue/gh-276 has the necessary changes.
The branch unfortunately doesn't compile for me. Here's a test script though - it creates two tables each with 500k entries and publishes them to a connecting worker from the manager:
ZEEK_DEFAULT_CONNECT_RETRY=1 zeek -j ./publish_id.zeek
...
[supervisor:STDOUT] [manager] Populated, 500000, 500000
[supervisor:STDOUT] [manager] Publishing, worker-001
[supervisor:STDOUT] [manager] Publish done, worker-001, 2.101729
So this indicates we'd block for 2 seconds in script land before returning from the publish_id() call.
@load base/frameworks/cluster
@load base/frameworks/reporter
redef Broker::disable_ssl = T;
redef Reporter::info_to_stderr = T;
redef Reporter::warnings_to_stderr = T;
redef Reporter::errors_to_stderr = T;
global interface = "fuzzout";
global workers = 1;
event zeek_init()
{
if ( ! Supervisor::is_supervisor() )
return;
Broker::listen("127.0.0.1", 9999/tcp);
local cluster: table[string] of Supervisor::ClusterEndpoint;
cluster["manager"] = [$role=Supervisor::MANAGER, $host=127.0.0.1, $p=10000/tcp];
local i = 0;
local worker_port_offset = 10010;
while ( ++i <= workers )
{
local name = fmt("worker-%03d", i);
local p = count_to_port(worker_port_offset + i, tcp);
cluster[name] = [$role=Supervisor::WORKER, $host=127.0.0.1, $p=p, $interface=interface];
}
for ( n, ep in cluster )
{
local sn = Supervisor::NodeConfig($name=n);
sn$cluster = cluster;
sn$directory = n;
if ( ep?$interface )
sn$interface = ep$interface;
local res = Supervisor::create(sn);
if ( res != "" )
Reporter::fatal(fmt("supervisor failed to create node '%s': %s", n, res));
}
}
module PublishTest;
export {
type Store: record {
addr_table: table[addr] of string &default=table();
string_table: table[string] of string &default=table();
};
global store: Store;
}
@if ( Cluster::local_node_type() == Cluster::MANAGER )
event zeek_init()
{
local i = 0;
local lim = 500000;
while (++i <= lim)
{
local a = counts_to_addr(vector(50000000 + i));
store$addr_table[a] = cat(a);
local d = fmt("domain-%d.example.com", i);
store$string_table[d] = d;
}
print "Populated", |store$addr_table|, |store$string_table|;
}
event Cluster::node_up(name: string, id: string)
{
print "Publishing", name;
local st = current_time();
Broker::publish_id(Cluster::node_topic(name), "PublishTest::store");
local et = current_time();
print "Publish done", name, interval_to_double(et - st);
}
@endif
The branch unfortunately doesn't compile for me.
I've only tested with Clang, sorry for not cross-checking the patch with other vendors! After reproducing this on GCC, it looks like the GCC implementation of the unordered containers does not work with forward-declared types. Seems like the standard only requires smart pointers, vector, list, etc. to work with forward-declared types. That actually rules out using the hash-based containers in recursive types like broker::data altogether.
The only way to get this working with GCC's version of the standard library would be forcing heap indirections at some point. Pointers always have the same size, so the compiler can create a memory layout without needing the size of the pointed-to-type.
There are basically two options to do that in Broker:
- make
broker::tablean alias tounordered_map<unique_ptr<data>, unique_ptr<data>>; or - change
broker::data_variantto contain aunique_ptr<table>.
The latter is probably the way to go (less intrusive and fewer memory allocations), but at that point we're no longer talking about a drop-in replacement and are no longer "just" changing the iteration order but actually break the API.
Should I go ahead and make the changes on the branch for testing purposes or is that a deal breaker right away?
Should I go ahead and make the changes on the branch for testing purposes or is that a deal breaker right away?
No real opinion. I don't have hunch how much faster it could be - so would still be interesting to test it if the implementation isn't too involved (assuming breaking the API is okay to just do the comparison).
If it's vastly better, might be worth considering it.
At the same time, while we have that one user of publish_id() that may end up sending tables that have 500k or more entries, performance expectations of such an operation maybe shouldn't be too high and alternative approaches found.
A couple of thoughts here:
- I agree that knowing the potential speedup would be good, for the reasons Arne flagged, but it's also not clear that this alone will get to the gist of zeek/zeek#2394,
- I am rather against making any changes to Broker that aren't specifically needed bugfixes/improvements since Broker seems too "unsettled" at the moment when we try to get releases out,
- Anything you can do @Neverlord to help re. the logger lockup and @sethhall's recent findings in that space far out-prioritizes this one
Therefore, unless you think you'll get to this one in the near future Dominik, I'd close this and possibly revisit at a later time.
No real opinion. I don't have hunch how much faster it could be - so would still be interesting to test it if the implementation isn't too involved (assuming breaking the API is okay to just do the comparison).
If it's just for getting a first impression: can you build that branch using clang instead of using GCC? If it's a big performance gain, we can still do the API changes to make GCC happy later. 🙂