Puget requires a large amount of memory when printing large collections
Minimal reproduction:

```clojure
(puget.printer/pprint (repeatedly 500000 #(rand-int 1000000000)))
```
When I run this with `-Xmx200m` and watch the GC logs, I can see memory usage growing steadily, and the program slows down as it approaches the limit. With `-Xmx500m` it has enough headroom to finish in a more reasonable amount of time.
I'm fairly confident this is due to how this line interacts with the four-element vector produced here. In particular, because the seq over that vector is chunked, the map inside the mapcat will retain a reference to the third element (the large collection) for longer than necessary.
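To illustrate the chunking behavior in isolation (this is a generic Clojure sketch, not puget or fipp code):

```clojure
;; Seqs over vectors are chunked: elements are realized, and held,
;; in chunks of up to 32 rather than one at a time.
(chunked-seq? (seq [:group "a" (range) "b"]))
;; => true

;; Consequence: while a consumer like mapcat is still emitting output
;; derived from the first element of a chunk, the entire chunk --
;; including any large collection stored in it -- stays strongly
;; reachable and cannot be collected.
```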
As evidence for this explanation, this change to fipp (unchunking the sequence) seems to fix it:
```diff
diff --git a/src/fipp/engine.cljc b/src/fipp/engine.cljc
index 8e6266d..9904a0e 100644
--- a/src/fipp/engine.cljc
+++ b/src/fipp/engine.cljc
@@ -12,7 +12,7 @@
 (defn serialize [doc]
   (cond
     (nil? doc) nil
-    (seq? doc) (mapcat serialize doc)
+    (seq? doc) (mapcat serialize (take 1e100 doc))
     (string? doc) [{:op :text, :text doc}]
     (keyword? doc) (serialize-node [doc])
```
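For reference, the `take 1e100` wrapper defeats chunking because `take` walks its input and produces its output one element at a time. The same effect is usually written with an `unchunk` helper; this is a common community idiom, not a fipp or puget API:

```clojure
;; Rebuild a possibly-chunked seq as a fully lazy, one-element-at-a-time
;; seq, so downstream consumers never hold a 32-element chunk alive.
(defn unchunk [s]
  (lazy-seq
    (when-let [s (seq s)]
      (cons (first s) (unchunk (rest s))))))

;; Chunked input, unchunked output:
(chunked-seq? (seq (vec (range 100))))     ;; => true
(chunked-seq? (unchunk (vec (range 100)))) ;; => false
```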
I don't know what a clean fix for this would be. I'm not sure we can make the above change to fipp without sacrificing performance in the farther-down-the-stack case, where it is processing a genuinely large sequence rather than a small vector that merely contains a large sequence as an element. And I can't think of anything puget could do on its side to make fipp behave differently.
This is from a while ago so I can't remember all of the details, but this fipp PR and the comments on it are related. I found that unchunking that exact same sequence prevented heap exhaustion on JDK 8+.