atomspace icon indicating copy to clipboard operation
atomspace copied to clipboard

Parallelize the pattern matcher

Open linas opened this issue 6 years ago • 9 comments

See opencog/cogutil#108 for the discussion.

linas avatar Apr 25 '18 17:04 linas

See email for the discussion.

I will continue discussion here as @linas suggested.

@linas, thank you for spotting NextLink issue https://github.com/opencog/atomspace/issues/1507. Copying PatternMatchCallback was my first attempt to implement parallel matching. But many code parts expect that PatternMatchCallback.grounding() method collects the result and return it synchronously. And there are dozen different grounding() implementations which do it slightly differently.

NextLink idea can resolve this issue. Next steps as I see them:

  • add NextLink to collect PatternMatch results asynchronously
  • enable parallel matching using copy of PME and PMCB only for queries via NextLink (use single thread version for old clients)
  • migrate old clients client by client to asynchronous results collection

Does it makes sense?

vsbogd avatar Apr 27 '18 11:04 vsbogd

OK. Working on NextLink is kind-of diving into the deep end of the pool; its not clear that you've learned how to swim, yet. For example, NextLink was not originally envisioned as a way to launch some heavy-weight, multi-threaded search; quite the opposite, I was thinking of "lazy evaluation", where it gives you just one answer at a time, only if and when you ask for it.

A better way of doing parallelization is called "scatter-gather", or "map-reduce", and NextLink is a poor API for map-reduce, and I have not spent any time thinking of a good atomese API for map-reduce/scatter-gather type parallel searching. Thus, the next step should really be "is there a good map-reduce-type API for pattern matching"?

linas avatar Apr 27 '18 18:04 linas

So, perhaps the following might be an API:

    QuestionLink
         AnchorNode "FooLocation"
         NumberNode 42
         PatternLink
              ...

so that calling the c++ method QuestionLink::execute() will cause PatternLink::execute() to run (in parallel) until 42 answers are found, which are anchored to the AnchorNode. To obtain the answers, the user then calls

    GatherLink
        AnchorNode "FooLocation"

so that calling the c++ method GatherLink::execute() to return exactly one answer. The above QuestionLink/GatherLink pair form a scatter/gather or map/reduce type API for search.

For lazy evaluation, so that the QuestionLink does not launch any threads, and does no work at all, until GatherLink is called -- for that, perhaps NumberNode 0 could be used to indicate lazy evaluation.

The above proposal is .. I dunno .. its OK, but its not perfect; it seems to have some obvious issues and problems. If I spent half-the-day thinking about it, I might find better answers. Care to find better answers? Keep in mind that "better answers" may require implementing #1502 first. This would, in turn, unify the code base and functions for GetLink and MapLink into one. and have large, pervasive changes to the atomspace.

The problem there is that there are a bunch of dominoes all lined up, ready to fall, and parallelizing searches is a domino in the middle of the dominoe chain.

It really might be better if you initially focused on fixing some of the existing bugs, rather than tackling fundamental design issues in a major subsystem component ...

linas avatar Apr 27 '18 18:04 linas

@linas said

It really might be better if you initially focused on fixing some of the existing bugs, rather than tackling fundamental design issues in a major subsystem component ...

That's a wise suggestion.

@vsbogd, I guess you could take some time to go through the github issues of the atomspace repo to see if you find something with the right level challenge for you (probably avoiding the ones labeled "quick task" as they are already too easy for you and best kept for newcomers). As well as obviously keeping improving the benchmark (like addressing https://github.com/opencog/benchmark/issues/8) which is extremely worthwhile.

ngeiswei avatar Apr 30 '18 06:04 ngeiswei

See also #1750 as a better way of publishing changes coming out from various subsystems.

linas avatar Jun 19 '18 21:06 linas

This should be much easier, now, after pull reqs #2453 and #2455 -- See in particular: https://github.com/opencog/atomspace/blob/812580d19773cdf4de533998eca5b5a5a9f9cc7a/opencog/query/InitiateSearchCB.cc#L402-L417

linas avatar Dec 26 '19 15:12 linas

Pull reqs #2615 and #2616 fully parallelize the pattern engine. All unit tests pass. However ... the overhead of creating threads using the OMP for-each is huge: RandomUTest runs 25x slower and GetStateUTest runs 33x slower, so this code is disabled.

The correct fix might be to create a pool of hot "standby" threads, and pull from that pool whenever a thread is needed. Whether this can get performance back to an acceptable level is unknown.

I think the moral of the story is that most pattern-matches happen very quickly, and the overhead of parallelizing drowns the mechanism. A few pattern matches are slow, and these can be parallelized. But it's not clear how to know which is which.

linas avatar May 15 '20 18:05 linas

A few pattern matches are slow, and these can be parallelized. But it's not clear how to know which is which.

Maybe one can come up with a quick estimation formula based on atomspace size, the number of pattern clauses and so. Interestingly with some rudimentary form of pattern indexing, such estimation could be even better.

ngeiswei avatar May 19 '20 05:05 ngeiswei

Two proposals mentioned in other issues, but not mentioned here:

  • Introduce a ParallelAndLink that would run queries in parallel.
  • Set a Value on the pattern that says "run me in parallel mode".

An example of using the first solution would be something like (BindLink (ParallelAndLink stuff-to-match) rewrite-stuff) This way, we don't have to invent half-a-dozen different parallel variants. Along the way, maybe we could come up with a different name for AndLink when used in this way?

Anyway, all the basic support for this has already been implemented several years ago. The code has two different ways of doing this, depending on the compiler support for parallelization. So all that's missing is the "easy stuff" of creating the ParallelAndLink and wiring it into place. Note that QueueValue is 100% thread-safe.

linas avatar Dec 04 '21 22:12 linas