Laika icon indicating copy to clipboard operation
Laika copied to clipboard

BUG: Too much AST nesting causes pathological blowup

Open j-mie6 opened this issue 4 months ago • 3 comments

I've tried minimising this (reasonably) as much as I can (each step takes ages to test). With SBT set to use 1GB heap, a wiki consisting of the following page (and a small home page, but I'm sure that won't make the difference):

# Bla

@:callout(info)

* Bla

  @:callout(warning)

    * There is a bug that involves nesting and exponential growth:
    
        * ```scala
          val foo = 
              bar <**> Bar(bar)
          ```

          ```scala
          val foo = Bar(bar, "bla" ~> bla <~ "bla")
          ```

          `bar` **baz**.

        * ```scala
          val foo = 
              Bar(bar, "blaaa" ~> bla <~ "bla").label("bar")
          ```

          Bla bla, bla bla bla bla bla, bla bla bla `bla` *bla* bla bla bla.
          Bla bla bla bla bla.

      Bla, bla `Bar` bla bla bla: `bar`;
      bla blabla blabla bla bla bla bla _bla_ bla, bla bla bla bla **bla bla**
      bla bla `bridge` bla bla blabla blablabla.
  @:@

@:@

Will crash with an out of memory error, consuming all 1GB of ram. The stack trace varies as the document size is altered, but the one constant is reference to BlockSource.reverse et al:

java.lang.OutOfMemoryError: Java heap space
        at java.base/java.lang.StringBuilder.toString(StringBuilder.java:475)
        at scala.collection.mutable.StringBuilder.toString(StringBuilder.scala:433)
        at scala.collection.mutable.StringBuilder.result(StringBuilder.scala:449)
        at scala.collection.mutable.StringBuilder.result(StringBuilder.scala:34)
        at scala.collection.IndexedSeqOptimized.reverse(IndexedSeqOptimized.scala:226)
        at scala.collection.IndexedSeqOptimized.reverse$(IndexedSeqOptimized.scala:218)
        at scala.collection.immutable.StringOps.reverse(StringOps.scala:33)
        at laika.parse.LineSource.reverse(SourceCursor.scala:229)
        at laika.parse.BlockSource.$anonfun$reverse$1(SourceCursor.scala:341)
        at laika.parse.BlockSource$$Lambda/0x0000776d75bd0000.apply(Unknown Source)
        at scala.collection.immutable.List.map(List.scala:297)
        at cats.data.Chain.map(Chain.scala:220)
        at cats.data.NonEmptyChainOps$.map$extension(NonEmptyChain.scala:387)
        at laika.parse.BlockSource.reverse(SourceCursor.scala:341)
        at laika.parse.BlockSource.reverse(SourceCursor.scala:292)
        at laika.parse.LineSource.reverse(SourceCursor.scala:229)
        at laika.parse.BlockSource.$anonfun$reverse$1(SourceCursor.scala:341)
        at laika.parse.BlockSource$$Lambda/0x0000776d75bd0000.apply(Unknown Source)
        at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
        at scala.collection.TraversableLike$$Lambda/0x0000776d741c3400.apply(Unknown Source)
        at scala.collection.Iterator.foreach(Iterator.scala:943)
        at scala.collection.Iterator.foreach$(Iterator.scala:943)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
        at scala.collection.IterableLike.foreach(IterableLike.scala:74)
        at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
        at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
        at scala.collection.TraversableLike.map(TraversableLike.scala:286)
        at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
        at scala.collection.AbstractTraversable.map(Traversable.scala:108)
        at cats.data.Chain.map(Chain.scala:220)
        at cats.data.NonEmptyChainOps$.map$extension(NonEmptyChain.scala:387)
        at laika.parse.BlockSource.reverse(SourceCursor.scala:341)
[error] [launcher] error during sbt launcher: java.lang.OutOfMemoryError: Java heap space

This seems like a pathological exponential blow up going on.

j-mie6 avatar Aug 18 '25 14:08 j-mie6

I do not know that this is the issue, but I wanted to link this prior issue that I had with weird markup and surprisingly complexity issues in parsing:

https://github.com/typelevel/Laika/issues/341

Totally naively, I wonder if you ditch the asterisks if your perf issue would decrease?

valencik avatar Aug 23 '25 19:08 valencik

It's possible. The asterisk parsing no doubt triggers special logic to rule out emph/bold and likely requires reverse. I do have a plan to at least improve the performance of the parser somewhat by reducing redundant string copying when the reversing happens (in effect the whole file is copied several times during the stack traces that lead to memory exhaustion). However, if - also works for bullets, that might fix my issue in the "immediate"-term. Long term, it would be good to investigate more deeply

j-mie6 avatar Aug 23 '25 19:08 j-mie6

Alas, bullets not asterisks, it seems: * and - are no different.

j-mie6 avatar Aug 24 '25 15:08 j-mie6