cats-stm icon indicating copy to clipboard operation
cats-stm copied to clipboard

(Lack of) opacity

Open durban opened this issue 3 years ago • 5 comments

Opacity is a correctness property for STM algorithms. Here is a short description and pointers to more reading: https://nbronson.github.io/scala-stm/semantics.html#opacity. The really short version is that opacity guarantees that a running transaction will never read inconsistent values from transactional varables, even if that transaction will never commit or fail.

It seems that currently cats-stm does not provide opacity. We can see this with the following example: let's have two TVar[Int]s, r1 and r2, with the invariant that r1 < r2. One transaction increments the values (transactionally), upholding this invariant:

  def txn1(r1: TVar[Int], r2: TVar[Int]): Txn[Unit] = {
    r1.modify(_ + 1) *> r2.modify(_ + 1)
  }

Another transaction reads from both TVars, and compares the values; if it reads impossible (inconsistent) values, it prints ERROR: ...:

  def txn2(r1: TVar[Int], r2: TVar[Int]): Txn[Int] = {
    r1.get.flatMap { v1 =>
      r2.get.map { v2 =>
        if (v2 <= v1) {
          println(s"ERROR: (${v1}, ${v2})")
        }
        v2 - v1
      }
    }
  }

Sometimes when txn1 and txn2 run concurrently, txn2 can observe the impossible (inconsistent) values (see below for a full reproduction). Such running instances of txn2 seem to not commit/fail (they seem to be retried). However, observing such inconsistency in a running transaction can still be problematic; for example, we could have while (true) {} instead of println(...), which would cause the transaction to never commit or fail.

Full code example:

package com.example

import cats.effect.kernel.Concurrent
import cats.effect.IO
import cats.syntax.all._

import io.github.timwspence.cats.stm.STM

import munit.CatsEffectSuite

class StmExample[F[_]](stm: STM[F])(implicit F: Concurrent[F]) {

  import stm._

  private[this] def txn1(r1: TVar[Int], r2: TVar[Int]): Txn[Unit] = {
    r1.modify(_ + 1) *> r2.modify(_ + 1)
  }

  private[this] def txn2(r1: TVar[Int], r2: TVar[Int]): Txn[Int] = {
    r1.get.flatMap { v1 =>
      r2.get.map { v2 =>
        if (v2 <= v1) {
          println(s"ERROR: (${v1}, ${v2})")
        }
        v2 - v1
      }
    }
  }

  def example: F[Int] = for {
    r1 <- stm.commit(TVar.of(0))
    r2 <- stm.commit(TVar.of(1))
    t1 = stm.commit(txn1(r1, r2))
    t2 = F.parReplicateAN(32)(
      replicas = 64,
      ma = stm.commit(txn2(r1, r2)).flatTap {
        case 1 => F.unit
        case x => F.raiseError[Unit](new Exception(s"result was $x"))
      }
    )
    b = F.both(F.cede *> t1, F.cede *> t2)
    r <- b.replicateA(8192).map(_.last._2)
  } yield r.last
}

final class CatsStmTest extends CatsEffectSuite {

  test("Test STM") {
    STM.Make[IO].runtime(Long.MaxValue).flatMap { stm =>
      assertIO(new StmExample[IO](stm).example, 1)
    }
  }
}

Running this test results in output like this:

ERROR: (1388, 1388)
ERROR: (1388, 1388)
ERROR: (1388, 1388)
ERROR: (2205, 2205)
ERROR: (2205, 2205)

(The number of ERROR lines is non-deterministic, sometimes even zero.)

durban avatar Jan 24 '22 17:01 durban

Sorry @durban just saw this! Thanks for raising it! This definitely should not happen. I'll look into it ASAP

TimWSpence avatar Feb 02 '22 10:02 TimWSpence

@durban sorry, I understand this now. This is currently by design (we allow potentially stale reads but then retry the transaction upon commit). AFAICT Haskell's current implementation does this as well. A cursory skim of the literature suggests that there is probably a reasonably significant performance tradeoff here.

The cats-stm docs should definitely make this clear though and should also spell out more clearly though that side effects (eg println above) are not supported in transactions, as they may be retried a non-deterministic number of times.

If you're happy then I suggest we update the docs and close this issue as working as intended, but we could also open another issue to investigate the feasibility and performance of an alternative implementation based on TL2 or similar?

TimWSpence avatar Feb 02 '22 12:02 TimWSpence

Well ... documenting it and doing nothing is definitely an option. I don't think it's a very good one, because:

  • The consequences of the lack of opacity are very non-intuitive. Basically, we can't trust our invariants, and this can cause very surprising behavior (e.g., exceptions and infinite loops), which could happen only very rarely.
  • Just to be clear, these problems are not really related to side effects, the println in the example is just to make it very clear what happens.
  • As far as I know, Haskell can "get away" with not having opacity, by having extra support for STM in the runtime system (i.e., it is not solely a library-STM).
  • The performance hit is probably true, although I'm not sure how big it would be. (I find it hard to have an intuition for STM performance; I even find it hard to write good benchmarks. For example, scala-stm has opacity, but seems faster than cats-stm, based on my very limited benchmarks. But that could be simply due to the monadic API... I don't know.)
  • (I would find it hard to write about this issue in the docs in a way which is both honest and "not scary"... but that could be just me, as I find this issue "scary". :-)

Having said that, if you do not want to provide opacity, that's definitely your decision.

durban avatar Feb 02 '22 15:02 durban

@durban so sorry I just got round to looking at this again.

I don't think I agree that this can result in invariants being violated (although it certainly can result in a different branch of an orElse being chosen based on a stale read). Can you give an example of what you are concerned about? Again I think that the problems are related to side effects allowing you to observe a transaction in an inconsistent state. But that transaction will be retried rather than committed and hence no invariants are broken.

The performance is interesting. I should definitely have a look at what scala-stm is doing some time. I can definitely believe that it is faster as I have spent approximately zero time optimizing this 😅

Thanks for raising this issue! It's a very interesting one

TimWSpence avatar Mar 15 '22 16:03 TimWSpence

@TimWSpence I'm talking about the fact, that while we have the invariant that r1 < r2 (and when writing, we uphold this invariant), we still can observe a case in a running transaction when r1 >= r2. It's (probably) true, that such a transaction will never commit, but preferably transactions would commit after a finite number of steps. Here is a full example, where a transaction never commits due to reading "impossible" values: https://github.com/zio/zio/issues/6324 (the example is for ZSTM, but I think the same principle applies to cats-stm).

I was also thinking about that the JVM actually saves us from the most terrible things (e.g., it guarantees memory safety). The worst things I could come up are:

  • Infinite loop (see above).
  • Infinite retry (I'm not sure this can actually happen, I couldn't find an example).
  • Exhaustion of other system resources (mostly memory; we'd probably get an OutOfMemoryError, so the JVM helps here too).
  • Throwing exceptions (probably not a very big concern, I expect the transaction would be retried?).

durban avatar Mar 15 '22 21:03 durban