Parallelize the pattern matcher
See opencog/cogutil#108 for the discussion.
See email for the discussion.
I will continue discussion here as @linas suggested.
@linas, thank you for spotting the NextLink issue https://github.com/opencog/atomspace/issues/1507. Copying PatternMatchCallback was my first attempt at implementing parallel matching. But many parts of the code expect the PatternMatchCallback.grounding() method to collect the result and return it synchronously, and there are a dozen different grounding() implementations, each doing this slightly differently.
The NextLink idea could resolve this issue. The next steps, as I see them:
- add NextLink to collect PatternMatch results asynchronously
- enable parallel matching, using a copy of the PME and PMCB, only for queries made via NextLink (keep the single-threaded version for old clients)
- migrate old clients, one by one, to asynchronous result collection

Does that make sense?
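As a purely illustrative sketch of the asynchronous collection idea above (this is not the actual PatternMatchCallback API; the AsyncCollector and Grounding names are made up), a thread-safe way to accumulate groundings and hand them back later might look like this:

```cpp
// Illustrative only: a callback-style collector that accumulates groundings
// into a mutex-protected container instead of returning each one synchronously.
#include <mutex>
#include <vector>

struct Grounding { /* variable-to-value bindings would live here */ };

class AsyncCollector
{
public:
    // Called from any matcher thread whenever a grounding is found.
    // Returning false means "keep searching".
    bool grounding(const Grounding& g)
    {
        std::lock_guard<std::mutex> lock(_mtx);
        _results.push_back(g);
        return false;
    }

    // Called by the client later, when it actually wants the results.
    std::vector<Grounding> take_results()
    {
        std::lock_guard<std::mutex> lock(_mtx);
        return std::move(_results);
    }

private:
    std::mutex _mtx;
    std::vector<Grounding> _results;
};
```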
OK. Working on NextLink is kind-of diving into the deep end of the pool; it's not clear that you've learned how to swim yet. For example, NextLink was not originally envisioned as a way to launch some heavy-weight, multi-threaded search; quite the opposite, I was thinking of "lazy evaluation", where it gives you just one answer at a time, only if and when you ask for it.
A better way of doing parallelization is called "scatter-gather", or "map-reduce", and NextLink is a poor API for map-reduce. I have not spent any time thinking of a good Atomese API for map-reduce/scatter-gather style parallel searching. Thus, the next step should really be: "is there a good map-reduce-type API for pattern matching?"
So, perhaps the following might be an API:
```
QuestionLink
    AnchorNode "FooLocation"
    NumberNode 42
    PatternLink
        ...
```
so that calling the C++ method QuestionLink::execute() will cause PatternLink::execute() to run (in parallel) until 42 answers are found, which are anchored to the AnchorNode. To obtain the answers, the user then calls
```
GatherLink
    AnchorNode "FooLocation"
```
so that calling the C++ method GatherLink::execute() returns exactly one answer. The above QuestionLink/GatherLink pair form a scatter/gather or map/reduce style API for search.
For lazy evaluation -- so that the QuestionLink does not launch any threads, and does no work at all, until GatherLink is called -- perhaps NumberNode 0 could be used to indicate lazy evaluation.
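To make the scatter/gather shape concrete, here is a hedged, stand-alone C++ sketch of roughly what QuestionLink::execute() and GatherLink::execute() might do; none of this is AtomSpace code, and the Answer, AnswerAnchor, run_one_search, question_execute and gather_execute names are invented for illustration only:

```cpp
// Illustrative scatter/gather skeleton. The AnswerAnchor plays the role of
// the AnchorNode: workers "scatter" answers into it, and the gather side
// blocks until at least one answer is available.
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

struct Answer { int value; };

class AnswerAnchor
{
public:
    void post(Answer a)                 // scatter side: workers anchor answers here
    {
        std::lock_guard<std::mutex> lk(_mtx);
        _answers.push_back(a);
        _cv.notify_one();
    }
    Answer gather()                     // gather side: return exactly one answer
    {
        std::unique_lock<std::mutex> lk(_mtx);
        _cv.wait(lk, [this]{ return !_answers.empty(); });
        Answer a = _answers.front();
        _answers.pop_front();
        return a;
    }
private:
    std::mutex _mtx;
    std::condition_variable _cv;
    std::deque<Answer> _answers;
};

// Stand-in for one slice of the pattern search.
static Answer run_one_search(int slice) { return Answer{slice}; }

// Roughly what QuestionLink::execute() might do: run workers in parallel
// until the requested number of answers (e.g. 42) has been anchored.
void question_execute(AnswerAnchor& anchor, int wanted)
{
    std::vector<std::thread> workers;
    for (int i = 0; i < wanted; ++i)
        workers.emplace_back([&anchor, i]{ anchor.post(run_one_search(i)); });
    for (auto& t : workers) t.join();
}

// Roughly what GatherLink::execute() might do: pop one anchored answer.
Answer gather_execute(AnswerAnchor& anchor) { return anchor.gather(); }

int main()
{
    AnswerAnchor anchor;
    question_execute(anchor, 42);          // scatter: find 42 answers in parallel
    Answer one = gather_execute(anchor);   // gather: take exactly one of them
    (void) one;
    return 0;
}
```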
The above proposal is .. I dunno .. it's OK, but it's not perfect; it seems to have some obvious issues and problems. If I spent half the day thinking about it, I might find better answers. Care to find better answers? Keep in mind that "better answers" may require implementing #1502 first. That would, in turn, unify the code base and functions for GetLink and MapLink into one, and would entail large, pervasive changes to the atomspace.
The problem there is that there are a bunch of dominoes all lined up, ready to fall, and parallelizing searches is a domino in the middle of the domino chain.
It really might be better if you initially focused on fixing some of the existing bugs, rather than tackling fundamental design issues in a major subsystem component ...
@linas said
It really might be better if you initially focused on fixing some of the existing bugs, rather than tackling fundamental design issues in a major subsystem component ...
That's a wise suggestion.
@vsbogd, I guess you could take some time to go through the GitHub issues of the atomspace repo to see if you find something with the right level of challenge for you (probably avoiding the ones labeled "quick task", as they are already too easy for you and best kept for newcomers). As well as, obviously, continuing to improve the benchmark (like addressing https://github.com/opencog/benchmark/issues/8), which is extremely worthwhile.
See also #1750 as a better way of publishing changes coming out from various subsystems.
This should be much easier, now, after pull reqs #2453 and #2455 -- See in particular: https://github.com/opencog/atomspace/blob/812580d19773cdf4de533998eca5b5a5a9f9cc7a/opencog/query/InitiateSearchCB.cc#L402-L417
Pull reqs #2615 and #2616 fully parallelize the pattern engine. All unit tests pass. However ... the overhead of creating threads using the OMP for-each is huge: RandomUTest runs 25x slower and GetStateUTest runs 33x slower, so this code is disabled.
The correct fix might be to create a pool of hot "standby" threads, and pull from that pool whenever a thread is needed. Whether this can get performance back to an acceptable level is unknown.
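For reference, a minimal "hot standby" pool in plain C++ might look like the sketch below; it is not tied to the pattern engine or to OMP, and the ThreadPool name and the four-thread count are arbitrary:

```cpp
// Minimal standby thread pool: the thread-creation cost is paid once, and
// each search just enqueues a task instead of spawning fresh threads.
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class ThreadPool
{
public:
    explicit ThreadPool(size_t nthreads)
    {
        for (size_t i = 0; i < nthreads; ++i)
            _workers.emplace_back([this]{ worker_loop(); });
    }
    ~ThreadPool()
    {
        { std::lock_guard<std::mutex> lk(_mtx); _done = true; }
        _cv.notify_all();
        for (auto& t : _workers) t.join();
    }
    void submit(std::function<void()> task)
    {
        { std::lock_guard<std::mutex> lk(_mtx); _tasks.push(std::move(task)); }
        _cv.notify_one();
    }
private:
    void worker_loop()
    {
        for (;;)
        {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lk(_mtx);
                _cv.wait(lk, [this]{ return _done || !_tasks.empty(); });
                if (_done && _tasks.empty()) return;
                task = std::move(_tasks.front());
                _tasks.pop();
            }
            task();
        }
    }
    std::vector<std::thread> _workers;
    std::queue<std::function<void()>> _tasks;
    std::mutex _mtx;
    std::condition_variable _cv;
    bool _done = false;
};

int main()
{
    ThreadPool pool(4);                 // four hot standby threads
    for (int i = 0; i < 8; ++i)
        pool.submit([]{ /* one pattern search would run here */ });
}   // destructor drains the queue and joins the workers
```

Whether reusing a pool like this actually recovers the 25x-33x slowdowns seen above would still have to be measured.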
I think the moral of the story is that most pattern-matches happen very quickly, and the overhead of parallelizing drowns the mechanism. A few pattern matches are slow, and these can be parallelized. But it's not clear how to know which is which.
A few pattern matches are slow, and these can be parallelized. But it's not clear how to know which is which.
Maybe one can come up with a quick estimation formula based on the atomspace size, the number of pattern clauses, and so on. Interestingly, with some rudimentary form of pattern indexing, such an estimate could be even better.
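As a toy illustration of such an estimate (the formula and the threshold below are invented, and would need real benchmarking to calibrate):

```cpp
// Toy cost heuristic: only pay the thread overhead when the search looks
// expensive enough. The formula and threshold are purely illustrative.
#include <cmath>
#include <cstddef>

bool worth_parallelizing(std::size_t atomspace_size, std::size_t num_clauses)
{
    // Rough guess at search effort: more clauses over a bigger atomspace
    // means more candidate groundings to explore.
    double estimated_cost =
        static_cast<double>(num_clauses) * std::log2(1.0 + atomspace_size);

    const double threshold = 200.0;   // entirely arbitrary cutoff
    return estimated_cost > threshold;
}
```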
Two proposals mentioned in other issues, but not mentioned here:
- Introduce a ParallelAndLink that would run queries in parallel.
- Set a Value on the pattern that says "run me in parallel mode".
An example of using the first solution would be something like (BindLink (ParallelAndLink stuff-to-match) rewrite-stuff)
This way, we don't have to invent half-a-dozen different parallel variants. Along the way, maybe we could come up with a different name for AndLink when used in this way?
Anyway, all the basic support for this was already implemented several years ago. The code has two different ways of doing this, depending on the compiler support for parallelization. So all that's missing is the "easy stuff" of creating the ParallelAndLink and wiring it into place. Note that QueueValue is 100% thread-safe.
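Conceptually, the role a thread-safe QueueValue would play here can be sketched with a closable queue in plain C++; this is not the real QueueValue API, just an illustration of parallel search branches pushing groundings while a single consumer drains them:

```cpp
// Illustration only: producers (parallel search branches) push results,
// the queue is closed when all branches finish, and the consumer drains it.
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <optional>
#include <queue>
#include <string>
#include <thread>
#include <vector>

template <typename T>
class ClosableQueue
{
public:
    void push(T item)
    {
        { std::lock_guard<std::mutex> lk(_mtx); _q.push(std::move(item)); }
        _cv.notify_one();
    }
    void close()
    {
        { std::lock_guard<std::mutex> lk(_mtx); _closed = true; }
        _cv.notify_all();
    }
    // Returns std::nullopt once the queue is closed and drained.
    std::optional<T> pop()
    {
        std::unique_lock<std::mutex> lk(_mtx);
        _cv.wait(lk, [this]{ return _closed || !_q.empty(); });
        if (_q.empty()) return std::nullopt;
        T item = std::move(_q.front());
        _q.pop();
        return item;
    }
private:
    std::mutex _mtx;
    std::condition_variable _cv;
    std::queue<T> _q;
    bool _closed = false;
};

int main()
{
    ClosableQueue<std::string> results;
    std::vector<std::thread> branches;
    // Pretend each thread is one parallel branch of the search.
    for (int i = 0; i < 4; ++i)
        branches.emplace_back([&results, i]{
            results.push("grounding from branch " + std::to_string(i)); });

    std::thread closer([&]{ for (auto& t : branches) t.join(); results.close(); });

    // Consumer: drain groundings as they arrive.
    while (auto g = results.pop()) std::cout << *g << '\n';
    closer.join();
}
```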