fs

Use tree-seq to make iterate-dir lazy

Open · jvoegele opened this issue 8 years ago · 1 comment

The previous implementation of iterate-dir was eager: it traversed the entire directory hierarchy and loaded all of it into memory before iterate-dir returned. On very large directory trees this could be very slow and could also produce a java.lang.OutOfMemoryError. (See issue #38)

By using tree-seq to traverse the directory tree lazily, iterate-dir now returns a lazy sequence immediately, even on very large directory trees, because it no longer has to traverse and load the tree first. This also means that iterate-dir itself will not produce an OutOfMemoryError on large directory trees. Unfortunately, Clojure's lazy sequences still seem to be susceptible to OutOfMemoryErrors: even with the lazy tree-seq approach, processing a very large directory tree (e.g. with dorun, doseq, or even count) can produce:

OutOfMemoryError GC overhead limit exceeded

Nevertheless, this is still an improvement over the previous implementation, because the OutOfMemoryError is not produced as soon as iterate-dir is called, but only after a very large portion of the results has been processed. In any case, this appears to be a limitation of Clojure itself.
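A minimal sketch of the tree-seq approach described above, for illustration only and not the code in this pull request. It assumes iterate-dir yields one [root dirs files] triple per directory, with dirs and files as sets of child names; the name iterate-dir-lazy and that triple shape are assumptions of this sketch.

```clojure
(ns example.iterate-dir
  (:require [clojure.java.io :as io]))

(defn iterate-dir-lazy
  "Hypothetical sketch: lazily walk the tree under `path` in depth-first
  (pre-order) order, yielding one [root dirs files] triple per directory.
  dirs and files are sets of child names (an assumed shape)."
  [path]
  (let [root (io/file path)]
    ;; map and tree-seq are both lazy, so nothing below is listed until
    ;; the caller realizes elements of the returned sequence.
    (map (fn [^java.io.File dir]
           ;; .listFiles returns nil on I/O or permission errors; `for` treats nil as empty.
           (let [children (.listFiles dir)]
             [dir
              (set (for [^java.io.File f children :when (.isDirectory f)]
                     (.getName f)))
              (set (for [^java.io.File f children :when (not (.isDirectory f))]
                     (.getName f)))]))
         ;; Only directories are branches, and only subdirectories are children,
         ;; so every node after the root is a directory.
         (tree-seq (fn [^java.io.File f] (.isDirectory f))
                   (fn [^java.io.File d]
                     (filter #(.isDirectory ^java.io.File %) (.listFiles d)))
                   root))))
```

For example, `(take 5 (iterate-dir-lazy "/"))` only lists as many directories as are needed to produce five triples. One trade-off of this sketch is that each directory is listed twice (once by tree-seq to find subdirectories, once to build the triple), in exchange for keeping the tree-seq call simple.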

jvoegele · Dec 24 '16 16:12

@Raynes Any thoughts on this?

jvoegele · Feb 15 '17 13:02