Remove recursive `interleave` calls

Open rlouf opened this issue 3 years ago • 13 comments

The following code fails with a RecursionError:

from kanren import eq, lall, lany, run, vars

observations = [(0, 1), (1, 0)]

def generate(pixels):
    pairs = [(a, b) for a, b in zip(pixels[:-1], pixels[1:])]
    return lall(
        *[lany(*[eq(obs, pair) for obs in observations]) for pair in pairs]
    )

N = 1000
pixels = vars(N)
res = run(1, pixels, generate(pixels))

The error is raised in the interleave function from toolz.

rlouf avatar Aug 10 '22 20:08 rlouf

It appears that toolz does not use trampoline evaluation, and has no intention to do so. We may need to implement a version of interleave that uses it to avoid this RecursionError.

rlouf avatar Aug 10 '22 21:08 rlouf

> It appears that toolz does not use trampoline evaluation, and has no intention to do so. We may need to implement a version of interleave that uses it to avoid this RecursionError.

Yes, that's the same type of thing I did in etuples and unification to get past recursion limits/errors.

brandonwillard avatar Aug 11 '22 01:08 brandonwillard
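
For readers unfamiliar with the term: trampolining generally means replacing direct recursive calls with thunks that a driver loop keeps invoking until a final value comes back, so the depth of the computation never shows up on the Python call stack. A minimal, generic sketch of the idea (not the actual etuples/unification code) might look like this:

def trampoline(fn, *args):
    # Keep calling returned thunks until a non-callable result appears.
    result = fn(*args)
    while callable(result):
        result = result()
    return result

def countdown(n):
    # Return a thunk instead of recursing directly, so the call stack
    # stays flat no matter how large n gets.
    if n == 0:
        return "done"
    return lambda: countdown(n - 1)

trampoline(countdown, 10**6)
# 'done' (a direct recursive countdown would hit the recursion limit long before this)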

> The error is raised in the interleave function from toolz.

I'm not able to reproduce the error locally.

brandonwillard avatar Aug 11 '22 01:08 brandonwillard

Ah. I tried in a fresh environment and the error still shows on my machine.

rlouf avatar Aug 11 '22 14:08 rlouf

FYI: If I set N to something large, the issue appears.

~Anyway, yes, we should make our own version of interleave that doesn't use recursion.~

brandonwillard avatar Aug 16 '22 19:08 brandonwillard

I may be way out of my league here, but couldn't this be done with zip and enumerate? I've included a small snippet to illustrate what I mean.

def interleave(*iters):
  res = []
  for z in zip(*iters):
    for _,x in enumerate(z):
      res.append(x)
  return res

print(interleave([1,2,3], [4,5,6], [7,8,9]))
# returns: [1, 4, 7, 2, 5, 8, 3, 6, 9]

If this is a complete misunderstanding, just let me know; I do not mean to interrupt or obstruct.

Edit: a little more experimenting below -->

import random

def split(s: str) -> list[str]:
  return [x for x in s]

def listbench(lstlen, x):
  res = []
  for x in range(x):
    aux = [random.randrange(1,11) for _ in range(lstlen)]
    res.append(aux)
  return res

print(interleave(*listbench(10, 5000)))

This appears to work even with larger lists, I tested 50_000 and 500_000, both successful.

wrq avatar Jan 25 '23 20:01 wrq

> I may be way out of my league here, but couldn't this be done with zip and enumerate? I've included a small snippet to illustrate what I mean.
>
> def interleave(*iters):
>   res = []
>   for z in zip(*iters):
>     for _,x in enumerate(z):
>       res.append(x)
>   return res
>
> print(interleave([1,2,3], [4,5,6], [7,8,9]))
> # returns: [1, 4, 7, 2, 5, 8, 3, 6, 9]
>
> If this is a complete misunderstanding, just let me know; I do not mean to interrupt or obstruct.

Now that I'm coming back to this, I don't think the problem is in interleave itself so much as in our use of it in lall -> lconj -> lconj_seq -> bind. The generator produced by lconj_seq (well, the lconj_seq_goal it creates) effectively composes N - 1 interleave calls. I think that's actually the thing we need to "flatten".

brandonwillard avatar Jan 25 '23 22:01 brandonwillard

I can see that I am in fact out of my league here then, cheers and good luck!

wrq avatar Jan 25 '23 23:01 wrq

> I can see that I am in fact out of my league here then, cheers and good luck!

There's a bit of context here that may make things seem foreign, but we're really just talking about generators calling generators in a particular way.

The first step toward solving this issue is to remove all the unnecessary elements and obtain a more direct illustration of the issue—if possible.

Here's a first pass at that:

from kanren import var, eq
from kanren.core import lconj_seq


q_lv = var()

goal_1 = lconj_seq((eq(q_lv, i) for i in range(10000)))

goal_1
# <function kanren.core.lconj_seq.<locals>.lconj_seq_goal(S)>

state_0 = {}
state_stream_1 = goal_1(state_0)

state_stream_1
# <generator object lconj_seq.<locals>.lconj_seq_goal at 0x7f6e14f580d0>

next(state_stream_1)
# RecursionError: maximum recursion depth exceeded while calling a Python object

Now, we can focus exclusively on the code in kanren.core.lconj_seq.lconj_seq_goal.

brandonwillard avatar Jan 26 '23 00:01 brandonwillard

@wrq, here's a MWE demonstrating the underlying issue:

from functools import reduce
from itertools import chain
from toolz import interleave


def bind(z, g):
    # Both fail for the same reason.
    # return chain.from_iterable(map(g, z))
    return interleave(map(g, z))


def goal_constructor(i):

    def g(s):
        yield (i,) + (s,)

    return g


z = reduce(bind, (goal_constructor(i) for i in range(1, 10)), iter((0,)))

list(z)
# [(9, (8, (7, (6, (5, (4, (3, (2, (1, 0)))))))))]


z = reduce(bind, (goal_constructor(i) for i in range(1, 10000)), iter((0,)))

next(z)
# RecursionError: maximum recursion depth exceeded while calling a Python object

brandonwillard avatar Feb 01 '23 01:02 brandonwillard
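
The failure above isn't specific to interleave or chain: each bind call wraps the previous state stream in one more generator, so a single next() on the outermost stream has to resume every generator frame underneath it, and that chain of resumptions counts against the recursion limit. Here is a stripped-down illustration of just that effect (the wrap helper is hypothetical, not kanren code):

def wrap(it):
    # A generator that merely defers to the stream it wraps; resuming it
    # pushes one more frame before the underlying stream is resumed.
    yield from it

s = iter((0,))
for _ in range(10000):
    s = wrap(s)

next(s)
# RecursionError: one next() call has to resume ~10000 stacked generator frames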

Ok. First, I'm sorry that I haven't gotten to this sooner. I'm actually a part-time Walmart worker and a stay-at-home father, not really a professional programmer, so this has been on the backburner for me.

I was able to rig this up:

from functools import reduce
from itertools import chain, zip_longest
#from toolz import interleave

def interleave(*iters):
  res = []
  for z in zip_longest(*iters):
    for _,x in enumerate(z):
      res.append(x)
  return res

def bind(z, g):
    # Both fail for the same reason.
    # return chain.from_iterable(map(g, z))
    return interleave(map(g, z))

def goal_constructor(i):
    def g(s):
        yield (i,) + (s,)
    return g

z = reduce(bind, (goal_constructor(i) for i in range(1, 10)), iter((0,)))

print(z)        # debug
print(list(z)) # checking for data model weirdness

# [(9, (8, (7, (6, (5, (4, (3, (2, (1, 0)))))))))]

z = reduce(bind, (goal_constructor(i) for i in range(1, 10000)), iter((0,)))

print(next(next(iter(z))))

--- OUTPUT ---

[<generator object goal_constructor.<locals>.g at 0x7fa23d01cdd0>]
[<generator object goal_constructor.<locals>.g at 0x7fa23d01cdd0>]
(9999, <generator object goal_constructor.<locals>.g at 0x7fa23c954eb0>)

This uses my little interleave hack from before with run-of-the-mill itertools, and then I noticed that z needs to be an iterator and cannot be a regular list. This is obviously a cumbersome way of thinking about the objects, because the original Scheme domain would allow it.

I suppose the new interleave could just "return iter(res)" ?

I'm a little embarrassed to say that I'm not even confident this is a helpful response; I may be missing something fundamental about all of this.

Edit: I just realized something. zip() will terminate on the shortest iterator, which is probably not the desired behavior? It should be zip_longest() up there. I've made the edit, but if that's what is desired, then it could just be zip().

wrq avatar Feb 05 '23 13:02 wrq

> Ok. First, I'm sorry that I haven't gotten to this sooner. I'm actually a part-time Walmart worker and a stay-at-home father, not really a professional programmer, so this has been on the backburner for me.

No problem; I just don't want you to get the impression that this stuff is over your head, because it's not. It's just a rather context-specific problem with some unstated constraints.

From a quick look at your approach, it seems like you have the right idea: i.e. delay evaluation further. This will require changes to the way that z is handled downstream, though.

That approach could be feasible, as long as it doesn't dramatically change the stream processing semantics and has a manageable refactoring footprint. In other words, all the user-facing functions involved (e.g. the goal constructors, bind, ldisj_seq, lconj_seq, etc.) need to provide roughly the same types of objects as they currently do.

Also, the interleave implementation will likely need to be a generator itself, because essentially every iterator in this context can be infinite.

brandonwillard avatar Feb 17 '23 23:02 brandonwillard
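
As a sketch of that last point, a round-robin interleave can be written as a plain loop over a deque, which makes it a generator with no recursion of its own. Two caveats, loudly: the name interleave_lazy is just for illustration, and this version still consumes the outer iterable of streams up front, so by itself it neither handles an infinite stream of streams nor does anything about the nested-generator depth discussed above.

from collections import deque

def interleave_lazy(iterables):
    # Yield one item from each stream in turn, dropping streams as they
    # are exhausted. Iterative and lazy in the items it yields, but it
    # materializes the collection of streams before starting.
    pending = deque(iter(it) for it in iterables)
    while pending:
        stream = pending.popleft()
        try:
            item = next(stream)
        except StopIteration:
            continue
        yield item
        pending.append(stream)

list(interleave_lazy([[1, 2, 3], [4, 5], [6]]))
# [1, 4, 6, 2, 5, 3]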

I think we might need to "flatten" this example and see what can be done from there. That would involve combining the interleave, bind, and reduce into a single generator function that takes the goal_constructor iterator as an argument.

brandonwillard avatar Feb 17 '23 23:02 brandonwillard
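
To sketch what such a flattened combinator could look like (the name conj_flat is hypothetical, and this is only an illustration of the idea, not a drop-in replacement for kanren's bind/lconj_seq): instead of folding bind over the goals and stacking one generator on top of another, a single generator walks the goal sequence with an explicit stack, so the nesting depth lives in a Python list rather than on the call stack. One trade-off to note: this walk is depth-first (closer to chain.from_iterable than to interleave), so it gives up interleave's fair scheduling.

def conj_flat(goals, states):
    # Apply each goal, in order, to every state produced by the previous
    # step, lazily, without composing one generator around another.
    # Pending per-goal streams are kept on an explicit stack.
    goals = list(goals)

    def expand(state):
        if not goals:
            yield state
            return
        stack = [(0, goals[0](state))]
        while stack:
            depth, stream = stack[-1]
            try:
                s = next(stream)
            except StopIteration:
                stack.pop()
                continue
            if depth + 1 == len(goals):
                yield s
            else:
                stack.append((depth + 1, goals[depth + 1](s)))

    for s0 in states:
        yield from expand(s0)

# Reusing goal_constructor from the MWE above: no RecursionError at this size.
z = conj_flat((goal_constructor(i) for i in range(1, 10000)), iter((0,)))
next(z)
# (9999, (9998, (9997, ...)))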