Too many open files while splitting

Open holmescharles opened this issue 3 years ago • 3 comments

I'm trying to filter my data by group. The first step involves splitting the data with the split verb and its "-g" option. However, I get the following error:

mlr: open split_{group id stuff}.pprint: too many open files

In bash, you can replicate this error with:

(echo a; seq 10000) | mlr --pprint split -g a

I get the following error, though I imagine the exact "split" the error occurs on may be machine-dependent:

mlr: open split_1019.pprint: too many open files

holmescharles avatar Oct 17 '22 20:10 holmescharles

Hi @holmescharles !!

One option is to use ulimit to increase the number of open files: https://github.com/johnkerl/miller/issues/299
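As a rough sketch of the ulimit approach, assuming a bash-like shell: the snippet below prints the current soft and hard limits on open file descriptors and raises the soft limit for the current session only. The target of 16384 is an arbitrary illustrative value, not something from this thread, and the soft limit cannot be raised above the hard limit without elevated privileges.

```shell
# Inspect the current per-process limits on open file descriptors.
soft=$(ulimit -Sn)
hard=$(ulimit -Hn)
echo "soft=$soft hard=$hard"

# Hypothetical target; pick something larger than the number of groups.
want=16384

# Never ask for more than the hard limit allows.
if [ "$hard" != "unlimited" ] && [ "$hard" -lt "$want" ]; then
  want=$hard
fi

# Raise the soft limit only if it is currently below the target.
# This affects the current shell and its children, nothing else.
if [ "$soft" != "unlimited" ] && [ "$soft" -lt "$want" ]; then
  ulimit -Sn "$want"
fi

echo "soft limit is now $(ulimit -Sn)"
# Then rerun the failing mlr command from this same shell.
```

If the hard limit is itself lower than the number of groups, this will not be enough on its own, and the hard limit would need to be raised by an administrator.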

What Miller really needs is a process-internal LRU cache of some sort so it wouldn't need to keep an open descriptor for every single file, but this is a development to-do ...

johnkerl avatar Oct 18 '22 03:10 johnkerl

FYI, the same limit (set with ulimit) is not unique to Miller; it also applies to other tools like split or awk.

Incidentally, I've hit that limit before with awk and the solution was to close unneeded files, freeing the file handles. Could miller do that?
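For illustration, here is a minimal sketch of that awk workaround. It is not Miller's behavior, and the split_demo directory and file names are made up for the example. Each record is appended to its group's file, which is then closed immediately, so only one output descriptor is open at a time.

```shell
# Start from an empty output directory, since the script appends (>>).
rm -rf split_demo && mkdir split_demo

# Hypothetical input: a header line "a" followed by the values 1..2000,
# i.e. more groups than a typical default limit of 1024 open files.
(echo a; seq 2000) |
awk '
  NR == 1 { header = $0; next }          # remember the header line
  {
    out = "split_demo/split_" $1 ".txt"  # one output file per group value
    if (!(out in seen)) {                # first record seen for this group
      print header >> out
      seen[out] = 1
    }
    print $0 >> out                      # append the record
    close(out)                           # release the descriptor right away
  }
'

# Count the files that were created.
ls split_demo | wc -l
```

Closing after every record keeps the descriptor count constant, at the cost of reopening a file each time its group reappears, so this is slow when groups are interleaved across a large input.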

janxkoci avatar Nov 02 '22 15:11 janxkoci

@janxkoci yes indeed, we are talking about the same thing! :)

What Miller really needs is a process-internal LRU cache of some sort so it wouldn't need to keep an open descriptor for every single file, but this is a development to-do ...

johnkerl avatar Nov 02 '22 16:11 johnkerl