coro icon indicating copy to clipboard operation
coro copied to clipboard

Ignore return values of generators?

Open jcheng5 opened this issue 1 year ago • 5 comments

(REALLY great work on this package @lionel-, it's astonishingly good!)

I would expect this code:

library(coro)

gen <- generator(function() {
  yield(1)
  2
})

collect(gen())

to result in list(1), but it results in list(1, 2). The current behavior appears to be intentional, if test-generator.R is any indication.

It's causing problems for me in at least two usage patterns.

Early exit

library(coro)

g <- generator(function(items) {
  if (length(items) < 3) {
    # Do nothing
    return()
  }
  
  for (i in items) {
    yield(i)
  }
})

collect(g(c(1,2)))

I want list(), but instead I get list(NULL). Currently I'm working around this by never returning early.

Side-effecty code after generation

library(coro)

yielded_values <- c()

g <- generator(function(items) {
  for (i in items) {
    yield(i)
  }
  yielded_values <<- c(yielded_values, items)
})

collect(g(c(1,2)))

I want list(1, 2), but instead I get list(1, 2, yielded_values). Currently I'm working around this by including if (FALSE) yield(NULL) as the last line of the generator function--that seems to prod the state machine into having the behavior I want.

jcheng5 avatar Sep 04 '24 00:09 jcheng5

hmm I don't remember exactly but I may have been confused by the meaning of {value: "something", done: true} returned from JS generators. I may have thought this was an inclusive boundary for the generator iterator, but testing in a for loop I now see it's an exclusive boundary, despite the value being returned to the caller.

Your examples are convincing and coro should be consistent with existing design in other languages to avoid surprises. So I think we should fix this.

@dfalbel do you think changing this is likely to cause issues for mlverse packages?

(Note that the early exit case seems like a bug in coro because returned NULL values should be transformed to the exhausted sentinel IIRC.)

lionel- avatar Sep 09 '24 07:09 lionel-

I don't think this will cause issues with the mlverse usage of coro. We're not using generator, but only iterator functions built directly.

For the early returns, is this expected to only affect generators? I'm assuming we can still collect NULLs from eg:

iter = function() {
  i <<- 0
  function() {
    i <<- i + 1
    if (i == 1) {
      return(1)
    } else if (i == 2) {
      return(NULL)
    } 

    coro::exhausted()
  }
}


coro::collect(iter())

#> [[1]]
#> [1] 1
#> 
#> [[2]]
#> NULL

dfalbel avatar Sep 10 '24 11:09 dfalbel

@dfalbel Correct, this would only affect generators.

lionel- avatar Sep 10 '24 11:09 lionel-

There's the question of what to do with returned values. Do we simply throw away the information and return the exhaustion sentinel instead? Ideally we'd return the value along with the exhaustion signal.

To do the latter I'll have to change the way exhaustion is currently signalled with a sentinel value (a special symbol created with exhausted()). We'd need to switch to the JS approach of a structured value with value and done fields. Packages and users are supposed to treat the exhaustion sentinel as an opaque value, so that shouldn't cause (too much) incompatibility.

We can start with throwing away the return value though.

lionel- avatar Oct 15 '24 09:10 lionel-

Error handling is a tricky case. If we ignore return values this returns exhausted():

my_gen <- generator(function() {
  tryCatch(
    1 + "",
    error = function(cnd) "handled"
  )
})

So it looks like we need to allow yield() inside error handlers. This is also how it works in JS:

function* errorGenerator() {
  try {
    yield 'ok';
    throw new Error('foo');
  } catch (err) {
    yield err.message;
  }
}

This is quite a breaking change.

lionel- avatar Oct 15 '24 10:10 lionel-