fs
Use tree-seq to make iterate-dir lazy
The previous implementation of iterate-dir was not lazy and would eagerly traverse the entire directory hierarchy and load it all into memory. Because it was eager, the entire directory tree structure would need to be loaded before the iterate-dir function returned. When used on very large directory trees, this could be very slow and could also produce a java.lang.OutOfMemoryError. (See issue #38)
By using tree-seq to lazily traverse the directory tree, the iterate-dir function now immediately returns a lazy sequence (even on very large directory trees) because it does not need to first traverse and load the tree. This also means that iterate-dir will not itself produce an OutOfMemoryError when used on large directory trees. Unfortunately, however, it seems that Clojure's lazy sequences are still susceptible to OutOfMemoryErrors. Even using the lazy tree-seq approach, an OutOfMemoryError can be produced when processing very large directory trees (e.g. with dorun or doseq, or even count):
OutOfMemoryError GC overhead limit exceeded
Nevertheless, this is still an improvement over the previous implementation since the OutOfMemoryError is not produced immediately upon calling iterate-dir, but rather only after processing a very large portion of the results. This seems to be a limitation in Clojure itself, in any case.
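For reference, the general shape of a tree-seq based directory walk can be sketched as follows. This is a minimal illustration only, not the actual iterate-dir code in this change: lazy-walk is a hypothetical name, and it yields plain java.io.File objects, which may differ from what iterate-dir returns.

```clojure
(import '(java.io File))

(defn lazy-walk
  "Hypothetical helper (not the library's iterate-dir): returns a lazy
  sequence of every File at or below root, in depth-first order."
  [^File root]
  (tree-seq
    (fn [^File f] (.isDirectory f))       ; branch?: only directories have children
    (fn [^File d] (seq (.listFiles d)))   ; children: nil if the listing is empty or fails
    root))

;; Usage: the call returns immediately; directories are listed only as
;; the sequence is consumed. Here only the first few entries are printed.
(run! (fn [^File f] (println (.getPath f)))
      (take 5 (lazy-walk (File. "."))))
```

Whether already-processed entries can be garbage collected during a full traversal depends on nothing else holding a reference to the start of the sequence, so this sketch consumes the result directly rather than binding it to a name first.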
@Raynes Any thoughts on this?