scio
Increased memory consumption after upgrade from 13.6 to 14.8
I upgraded one of our projects from version 13.6 to 14.8, and with it from Beam 2.52 to 2.59. The upgrade itself went smoothly, but one Dataflow streaming job started to experience GC thrashing. Increasing the memory slightly does not fully resolve the issue. I therefore compared heap dumps, and what I observed is that both the number of coders and their size increased considerably for the same job.
[Heap dump screenshots: Scio 14.8; Scio 13.6; Scio 14.8 vs 13.6]
I used Scala 2.13, but I did not notice any difference when using 2.12. I initially suspected the Scala version upgrade and potential changes in the behavior of the macros.
In general, the project has a very large case class hierarchy, and most of the Coders for the classes used in the job are generated within a package object through `Coder.gen`. I don't know whether the Coder sizes/counts are the root cause of the GC thrashing, but they are at least the most notable difference in the heap dumps.
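To illustrate the pattern, here is a minimal sketch of how the Coders are declared. The package, class, and field names are hypothetical stand-ins (the real hierarchy is much larger); only `Coder.gen` and the use of a package object reflect the actual setup.

```scala
package com.example

import com.spotify.scio.coders.Coder

// Hypothetical stand-ins for the real (much larger) case class hierarchy.
final case class Address(street: String, city: String)
final case class User(id: Long, name: String, address: Address)

// Coders are derived once via the Coder.gen macro and exposed as implicits.
package object coders {
  // Declared first so the derivation for User can pick it up from implicit scope.
  implicit val addressCoder: Coder[Address] = Coder.gen[Address]
  implicit val userCoder: Coder[User] = Coder.gen[User]
}
```

Pipeline code then brings these into scope with `import com.example.coders._`.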
Does anyone have an idea whether this is related to a change in Scio or Apache Beam, and whether there is a potential solution? It might be related to these changes made in 14.x: https://github.com/spotify/scio/pull/5199 Thanks for taking a look. I'm happy to provide further information if needed.
Some more details:
[Screenshots: Scio 14.8; Scio 13.6]