finagle icon indicating copy to clipboard operation
finagle copied to clipboard

finagle-redis: QUEUED responses fail with ClassCastException

Open chrisbenincasa opened this issue 7 years ago • 6 comments

When Redis returns a QUEUED response, the returned value is opaquely cast which causes runtime errors.

Expected behavior

I'm not sure what the right call here is. On one hand, the call to Redis was successful, so failing the Future doesn't seem right. On the other, returning any instance of T also seems wrong and changing everything to Option[T] would be an even larger change. One option is to wrap all results in some "Response" class to provide additional information about the result.

Actual behavior

On QUEUED from Redis, the result is cast to a Future[Nothing] (presumably to get the types working) from Future.done, which ends up being a ConstFuture(Return(Unit)). Callers see the Future as successful, however do not know the underlying value is a BoxedUnit, which leads to ClassCastExceptions when the value is passed to functions.

Example error:

java.lang.ClassCastException: scala.runtime.BoxedUnit cannot be cast to scala.collection.Seq
        at com.curalate.redis.client.RedisClientBase$$anonfun$mGet$1.apply(RedisClientBase.scala:73)

Steps to reproduce the behavior

  1. Issue a Redis transaction but do not block on the resultant Future nor attach callbacks
  2. Immediately make another Redis call and attach a callback that operates on the value
  3. More often than not, a ClassCastException or NoSuchMethodError will be thrown. One may have to retry multiple times, as this is a race condition (i.e. one request will reach Redis before the other)

Pseudo-code

val keys = Seq(/* keys */)
val cmds = Seq(/* cmds */)
redisClient.transaction(cmds)
redisClient.mGet(keys).map(keys.zip) // .zip takes an Iterable, so a BoxedUnit will throw

chrisbenincasa avatar Apr 26 '17 22:04 chrisbenincasa

This definitely is a bug, but I think it's a bit more nuanced than handling the QUEUED response. Are you willing to do a bit of debugging? It would be really helpful to see what's going over the wire when this race condition hits.

Questions:

  • Does the ClassCastException stack trace eventually point to that casting line in Finagle?
  • Can you also post the NoSuchMethodError stack trace?

nepthar avatar May 03 '17 18:05 nepthar

@nepthar definitely willing to debug. I was not able to reproduce the NoSuchMethodError (I think that was a bug on my end), but I can reproduce the AbstractMethodError consistently. I setup a quick test in the form I described above. Here are some details:

The test:

it should "fail" in {
  val underlyingClient = client.redisClient
  val keyVals = List.fill(10)(Buf.Utf8(randomString) -> Buf.Utf8(randomString))
  val cmds = Seq(
    MSet(keyVals.toMap)
  ) ++ keyVals.map { case (k, _) => PExpire(k, 100)  }
  underlyingClient.transaction(cmds)
  val fut = underlyingClient.mGet(keyVals.map { case (k, _) => k  }).map(res => {
    res.foreach(println)
  })
  com.twitter.util.Await.result(fut, com.twitter.util.Duration.Top)
}

The breakpoint on QUEUED getting hit: screen shot 2017-05-04 at 3 12 11 pm

The redis monitor output:

) redis-cli -p 30144 monitor
OK
1493925332.472761 [0 172.17.0.1:41410] "MULTI"
1493925332.495212 [0 172.17.0.1:41410] "MSET" "g8iEo" "0qHv1" "9WpFi" "ybjwz" "4ydI3" "8uRXO" "oO29P" "rJWKH" "Chget" "CzTXa" "19J6T" "JhJAl" "jln65" "wW5pg" "o3w9L" "kumUR" "OrOfM" "36eE2" "PLz74" "EyYIj"
1493925332.495270 [0 172.17.0.1:41410] "PEXPIRE" "o3w9L" "100"
1493925332.495296 [0 172.17.0.1:41410] "PEXPIRE" "jln65" "100"
1493925332.495318 [0 172.17.0.1:41410] "PEXPIRE" "g8iEo" "100"
1493925332.495340 [0 172.17.0.1:41410] "PEXPIRE" "4ydI3" "100"
1493925332.495362 [0 172.17.0.1:41410] "PEXPIRE" "Chget" "100"
1493925332.495384 [0 172.17.0.1:41410] "PEXPIRE" "PLz74" "100"
1493925332.495405 [0 172.17.0.1:41410] "PEXPIRE" "OrOfM" "100"
1493925332.495428 [0 172.17.0.1:41410] "PEXPIRE" "oO29P" "100"
1493925332.495450 [0 172.17.0.1:41410] "PEXPIRE" "19J6T" "100"
1493925332.495471 [0 172.17.0.1:41410] "PEXPIRE" "9WpFi" "100"
1493925332.495493 [0 172.17.0.1:41410] "MGET" "o3w9L" "jln65" "g8iEo" "4ydI3" "Chget" "PLz74" "OrOfM" "oO29P" "19J6T" "9WpFi"
1493925332.495557 [0 172.17.0.1:41410] "EXEC"

Test output:

[info] - should fail *** FAILED ***
[info]   java.lang.ClassCastException: scala.runtime.BoxedUnit cannot be cast to scala.collection.Seq
[info]   at com.curalate.redis.test.RedisTests$$anonfun$1$$anonfun$17.apply(RedisTests.scala:151)
[info]   at com.twitter.util.Future$$anonfun$map$1$$anonfun$apply$3.apply(Future.scala:1145)
[info]   at com.twitter.util.Try$.apply(Try.scala:15)
[info]   at com.twitter.util.Future$.apply(Future.scala:163)
[info]   at com.twitter.util.Future$$anonfun$map$1.apply(Future.scala:1145)
[info]   at com.twitter.util.Future$$anonfun$map$1.apply(Future.scala:1144)
[info]   at com.twitter.util.Promise$Transformer.liftedTree1$1(Promise.scala:107)
[info]   at com.twitter.util.Promise$Transformer.k(Promise.scala:107)
[info]   at com.twitter.util.Promise$Transformer.apply(Promise.scala:117)
[info]   at com.twitter.util.Promise$Transformer.apply(Promise.scala:98)

The Redis monitoring output tells the tale, I think. You can see that the MGET executes within the MULTI block, thus returning the QUEUED response. My callback on the MGET function in the code then interprets that response as successful, but the underlying type is not a Seq.

Let me know what else I can do to help!

chrisbenincasa avatar May 04 '17 19:05 chrisbenincasa

Nice detective work!

It does indeed look pretty clear cut. I'm not super familiar with redis, but is it also possible to view the server's response to each of those?

A fix for this is now on our roadmap internally, although I'm uncertain of when it will be finished. The immediate workaround is to allow the transaction to complete before issuing another command:

client.transaction(cmds).flatMap { _ => client.mGet(....) }.map { results => ... } 

If you'd like/have time, we're always accepting pull requests! My starting point would be examining RedisPool and seeing what's going on with transactions there.

nepthar avatar May 04 '17 21:05 nepthar

@chrisbenincasa all of our finagle clients are set up to handle one response per one request, and it seems that for these transaction requests, that one-one mapping no longer holds, hence the mismatched requests/reponse pairs, and the ClassCastException. There currently aren't any plans to fix this behavior, but please feel free to submit a PR!

jcrossley avatar May 16 '17 03:05 jcrossley

@jcrossley I'm not sure I follow -- the single QUEUED response comes from a single MGET request. It's not that the response are mismatched (unless you mean that the eventual result for an MGET issues during a transaction ends up as a part of the response for an EXEC request), but rather that the single QUEUED response is cast opaquely to Unit. In other words, consumers have no way of knowing if the single response is actually the type they expect, if the "raw" response from Redis is a QUEUED type.

chrisbenincasa avatar May 19 '17 21:05 chrisbenincasa

As a quick follow-up, I'm certainly interested in taking on the change here. Though, I'm curious as to what your opinions are on the correct solution. To some degree, one could argue this is user error for using Redis transactions in a multi-threaded application environment; so you could consider this a sort of error state (maybe a specialized Exception to let the caller handle the QUEUED case?). On the other hand, the response isn't technically an error, so an exception may not fit the bill.

chrisbenincasa avatar May 19 '17 21:05 chrisbenincasa