BUG: Too much AST nesting causes pathological blowup
I've tried minimising this (reasonably) as much as I can (each step takes ages to test). With SBT set to use 1GB heap, a wiki consisting of the following page (and a small home page, but I'm sure that won't make the difference):
# Bla
@:callout(info)
* Bla
@:callout(warning)
* There is a bug that involves nesting and exponential growth:
* ```scala
val foo =
bar <**> Bar(bar)
```
```scala
val foo = Bar(bar, "bla" ~> bla <~ "bla")
```
`bar` **baz**.
* ```scala
val foo =
Bar(bar, "blaaa" ~> bla <~ "bla").label("bar")
```
Bla bla, bla bla bla bla bla, bla bla bla `bla` *bla* bla bla bla.
Bla bla bla bla bla.
Bla, bla `Bar` bla bla bla: `bar`;
bla blabla blabla bla bla bla bla _bla_ bla, bla bla bla bla **bla bla**
bla bla `bridge` bla bla blabla blablabla.
@:@
@:@
Will crash with an out of memory error, consuming all 1GB of ram. The stack trace varies as the document size is altered, but the one constant is reference to BlockSource.reverse et al:
java.lang.OutOfMemoryError: Java heap space
at java.base/java.lang.StringBuilder.toString(StringBuilder.java:475)
at scala.collection.mutable.StringBuilder.toString(StringBuilder.scala:433)
at scala.collection.mutable.StringBuilder.result(StringBuilder.scala:449)
at scala.collection.mutable.StringBuilder.result(StringBuilder.scala:34)
at scala.collection.IndexedSeqOptimized.reverse(IndexedSeqOptimized.scala:226)
at scala.collection.IndexedSeqOptimized.reverse$(IndexedSeqOptimized.scala:218)
at scala.collection.immutable.StringOps.reverse(StringOps.scala:33)
at laika.parse.LineSource.reverse(SourceCursor.scala:229)
at laika.parse.BlockSource.$anonfun$reverse$1(SourceCursor.scala:341)
at laika.parse.BlockSource$$Lambda/0x0000776d75bd0000.apply(Unknown Source)
at scala.collection.immutable.List.map(List.scala:297)
at cats.data.Chain.map(Chain.scala:220)
at cats.data.NonEmptyChainOps$.map$extension(NonEmptyChain.scala:387)
at laika.parse.BlockSource.reverse(SourceCursor.scala:341)
at laika.parse.BlockSource.reverse(SourceCursor.scala:292)
at laika.parse.LineSource.reverse(SourceCursor.scala:229)
at laika.parse.BlockSource.$anonfun$reverse$1(SourceCursor.scala:341)
at laika.parse.BlockSource$$Lambda/0x0000776d75bd0000.apply(Unknown Source)
at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
at scala.collection.TraversableLike$$Lambda/0x0000776d741c3400.apply(Unknown Source)
at scala.collection.Iterator.foreach(Iterator.scala:943)
at scala.collection.Iterator.foreach$(Iterator.scala:943)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
at scala.collection.IterableLike.foreach(IterableLike.scala:74)
at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
at scala.collection.TraversableLike.map(TraversableLike.scala:286)
at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
at scala.collection.AbstractTraversable.map(Traversable.scala:108)
at cats.data.Chain.map(Chain.scala:220)
at cats.data.NonEmptyChainOps$.map$extension(NonEmptyChain.scala:387)
at laika.parse.BlockSource.reverse(SourceCursor.scala:341)
[error] [launcher] error during sbt launcher: java.lang.OutOfMemoryError: Java heap space
This seems like a pathological exponential blow up going on.
I do not know that this is the issue, but I wanted to link this prior issue that I had with weird markup and surprisingly complexity issues in parsing:
https://github.com/typelevel/Laika/issues/341
Totally naively, I wonder if you ditch the asterisks if your perf issue would decrease?
It's possible. The asterisk parsing no doubt triggers special logic to rule out emph/bold and likely requires reverse. I do have a plan to at least improve the performance of the parser somewhat by reducing redundant string copying when the reversing happens (in effect the whole file is copied several times during the stack traces that lead to memory exhaustion). However, if - also works for bullets, that might fix my issue in the "immediate"-term. Long term, it would be good to investigate more deeply
Alas, bullets not asterisks, it seems: * and - are no different.