aho_corasick icon indicating copy to clipboard operation
aho_corasick copied to clipboard

Fixed data race in failure state initialization

Open genbattle opened this issue 5 years ago • 5 comments

A data race was created when multiple threads entered parse_text at the same time and failure state construction was triggered. The first thread would enter the failure state construction function and begin construction, setting a completion flag before initialization was actually complete. Other threads would see the flag in its set state and skip over the lock, proceeding to use the trie while the failure states were still being modified by the first thread.

In addition to this race condition, there was also some possible undefined behaviour introduced by the double-checked locking implementation. This has been cleaned up a little with C++11 atomics to make the implementation more portable, if a little slower. See https://preshing.com/20130930/double-checked-locking-is-fixed-in-cpp11/ for more information.

genbattle avatar Aug 21 '19 05:08 genbattle

Hello, I can not compile that PR.

I get the following errors:

aho_corasick.hpp:557:39: error: unknown type name 'd_mutex'; did you mean '__gnu_cxx::__mutex'?
aho_corasick.hpp:560:17: error: unknown type name 'my_type'
aho_corasick.hpp:560:27: error: 'this' argument to member function 'construct_failure_states' has type 'const aho_corasick::basic_trie<char>', but function is not marked const

dkurzaj avatar Nov 06 '19 18:11 dkurzaj

Ah sorry, I just pished the patch I created, but there's more changes in our local copy that haven't been upstreamed, I'll add them to this PR.

genbattle avatar Nov 06 '19 19:11 genbattle

Ok so the further changes make basic_tree::parse_text a cont method, and implement locking such that all const methods are thread safe, which is where this bug originally comes from. So this bug is specific to our fork of this library, and doesn't apply to the upstream version because there are no such guarantees around mutability or thread-safety.

I will add these changes into the PR, but it is up to you whether you would like to accept them or keep the behaviour as it currently is. I apologise for the mess, I'm picking up maintenance of this code after someone else has left so I'm still trying to get everything in order.

genbattle avatar Nov 06 '19 19:11 genbattle

I have pushed all of the changes. As I have already said, you're free to choose not to take them if you don't think this functionality is valuable. It certainly seems to impact the benchmark performance of the library, which makes sense since there's some additional locking overhead.

master:

*** Aho-Corasick Benchmark ***
Generating input text ... done
Generating search patterns ... done
Generating trie ... done
Running .......... done
Results:
  loop #1, naive: 394ms, ac: 3ms
  loop #2, naive: 381ms, ac: 3ms
  loop #3, naive: 394ms, ac: 3ms
  loop #4, naive: 377ms, ac: 3ms
  loop #5, naive: 387ms, ac: 3ms
  loop #6, naive: 390ms, ac: 3ms
  loop #7, naive: 376ms, ac: 3ms
  loop #8, naive: 381ms, ac: 3ms
  loop #9, naive: 430ms, ac: 3ms
  loop #10, naive: 415ms, ac: 11536ms

this branch:

*** Aho-Corasick Benchmark ***
Generating input text ... done
Generating search patterns ... done
Generating trie ... done
Running .......... done
Results:
  loop #1, naive: 463ms, ac: 3ms
  loop #2, naive: 638ms, ac: 3ms
  loop #3, naive: 748ms, ac: 4ms
  loop #4, naive: 686ms, ac: 4ms
  loop #5, naive: 641ms, ac: 4ms
  loop #6, naive: 713ms, ac: 3ms
  loop #7, naive: 799ms, ac: 4ms
  loop #8, naive: 813ms, ac: 4ms
  loop #9, naive: 872ms, ac: 4ms
  loop #10, naive: 919ms, ac: 16803ms

genbattle avatar Nov 06 '19 20:11 genbattle

Thank you! It seems to work very well for my use case too.

And it also probably fixes: #13. :smiley:

I'm not owner nor reviewer, but I would say it's a good improvement. If my feedback can help merge this.

dkurzaj avatar Nov 07 '19 10:11 dkurzaj