the_silver_searcher
ag cannot search files > 2Gb
If I try to search a file bigger than 2 GB, I get the following error: ERR: Skipping system.log: pcre_exec() can't handle files larger than 2147483647 bytes
Grep and ack both work fine (although ack takes forever).
This is by design. The regex engine (PCRE) can't handle files that large. The PCRE documentation states that the maximum subject (file) length is INT_MAX, which is 2147483647 for a signed 32-bit int. Therefore the maximum file size is INT_MAX bytes.
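For anyone curious where the limit comes from, here is a minimal sketch (not ag's actual code; the file name and guard are illustrative). PCRE1's pcre_exec() declares the subject length as a plain int, so a buffer longer than INT_MAX bytes cannot even be passed to it:

```c
#include <limits.h>
#include <stdio.h>
#include <pcre.h>

/* Illustrative guard: PCRE1's pcre_exec() takes the subject
 * length as `int`, so anything above INT_MAX is unrepresentable. */
static int search_buffer(pcre *re, const char *buf, size_t len) {
    int ovector[30];

    if (len > (size_t)INT_MAX) {
        fprintf(stderr, "ERR: buffer exceeds %d bytes, skipping\n", INT_MAX);
        return -1;
    }
    /* int pcre_exec(const pcre *, const pcre_extra *,
     *               const char *subject, int length, int startoffset,
     *               int options, int *ovector, int ovecsize); */
    return pcre_exec(re, NULL, buf, (int)len, 0, 0, ovector, 30);
}
```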
Most uses of grep/ack/ag are line-by-line searches. You would only hit the maximum subject length on multi-line searches, or if a single line is over 2 GB. So for most uses, ag would only need to match a single line at a time.
You are right that most searches are single-line only. Nevertheless, ag does multi-line searching by default (as far as I know, its \s matches newlines).
I found that files greater than 2 GB can be searched with a literal (not regex) pattern. In theory, ag could make a case-by-case decision and only raise that error when a single line exceeds INT_MAX bytes or when multi-line searching is in effect (see the sketch after this comment).
Maybe @ggreer could mention whether he wants this or not. I'm not sure how much work it would be to patch this to support the above case-by-case choice.
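To make the proposal concrete, here is a rough sketch of such a dispatch, assuming the file is already in memory; the helper names are hypothetical and this is not ag's actual code. Literal scans and per-line regex matching can use size_t offsets throughout, so only the whole-buffer regex path needs the INT_MAX guard:

```c
#define _GNU_SOURCE   /* for memmem() */
#include <limits.h>
#include <stddef.h>
#include <string.h>

typedef enum { SEARCH_OK, SEARCH_TOO_BIG } search_status;

/* Hypothetical dispatch: only the whole-buffer PCRE1 path is
 * limited to INT_MAX-byte subjects. */
search_status search_file(const char *buf, size_t len,
                          const char *pattern, int literal, int multiline) {
    if (literal) {
        /* memmem() walks the buffer with size_t offsets, so the
         * file size is not limited by the regex engine. */
        const char *hit = memmem(buf, len, pattern, strlen(pattern));
        (void)hit; /* report matches here */
        return SEARCH_OK;
    }
    if (!multiline) {
        /* Feed the engine one line at a time; only a single line
         * longer than INT_MAX bytes is still a problem. */
        const char *p = buf, *end = buf + len;
        while (p < end) {
            const char *nl = memchr(p, '\n', (size_t)(end - p));
            size_t line_len = nl ? (size_t)(nl - p) : (size_t)(end - p);
            if (line_len > (size_t)INT_MAX)
                return SEARCH_TOO_BIG;
            /* pcre_exec(re, NULL, p, (int)line_len, ...) per line */
            p += line_len + 1;
        }
        return SEARCH_OK;
    }
    /* Multi-line regex over the whole buffer: subject must fit in int. */
    return (len > (size_t)INT_MAX) ? SEARCH_TOO_BIG : SEARCH_OK;
}
```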
pcre has a new version, pcre2, with a new but backwards-incompatible API. The new API uses size_t instead of int to refer to lengths, so it can handle strings larger than 2 GB. Maybe ag should be updated to require pcre2 instead.
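For illustration, a minimal pcre2 sketch (error handling trimmed, function name hypothetical): the subject length parameter is PCRE2_SIZE, which is defined as size_t, so subjects over 2 GB are representable:

```c
#define PCRE2_CODE_UNIT_WIDTH 8
#include <pcre2.h>

/* pcre2_match() takes the subject length as PCRE2_SIZE (a size_t
 * typedef), so the INT_MAX subject limit goes away. */
int search_with_pcre2(const char *pattern, const char *buf, size_t len) {
    int errcode;
    PCRE2_SIZE erroffset;
    pcre2_code *re = pcre2_compile((PCRE2_SPTR)pattern,
                                   PCRE2_ZERO_TERMINATED, 0,
                                   &errcode, &erroffset, NULL);
    if (!re)
        return -1;

    pcre2_match_data *md = pcre2_match_data_create_from_pattern(re, NULL);
    int rc = pcre2_match(re, (PCRE2_SPTR)buf, (PCRE2_SIZE)len,
                         0, 0, md, NULL);

    pcre2_match_data_free(md);
    pcre2_code_free(re);
    return rc; /* > 0 on match, PCRE2_ERROR_NOMATCH otherwise */
}
```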
@jschpp how do you do literal patterns?
@njt1982 From ag --help:

```
-Q --literal        Don't parse PATTERN as a regular expression
```
I hit this limitation today while trying to grep my 7 GB mailbox. It would be really great if ag handled large files.
I hit these errors today when running ag from my home directory, and then ag dumped core. Too bad I didn't have ulimit -c unlimited set.
Is there maybe a chance for a command-line argument that would mean something like "process the first 2 GB of data, then give up"?
You could add a flag to split the file into 2 GB parts and then merge the results of every run, roughly as sketched below.
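A rough sketch of that chunking idea, assuming the file is mapped into memory and chunks are aligned to line boundaries so matches are not split mid-line (a single line longer than 2 GB would still be unsupported); the helper names are hypothetical:

```c
#include <limits.h>
#include <stddef.h>

#define CHUNK_MAX ((size_t)INT_MAX)

/* Hypothetical callback invoked once per chunk that fits in INT_MAX;
 * it would run the normal PCRE path on that chunk and report matches
 * using `file_off` to translate chunk offsets back to file offsets. */
typedef void (*chunk_fn)(const char *chunk, size_t len, size_t file_off);

void search_in_chunks(const char *buf, size_t len, chunk_fn search_chunk) {
    size_t off = 0;
    while (off < len) {
        size_t n = len - off;
        if (n > CHUNK_MAX) {
            n = CHUNK_MAX;
            /* Back up to the last newline so no line is split
             * across two chunks. */
            while (n > 0 && buf[off + n - 1] != '\n')
                n--;
            if (n == 0)
                n = CHUNK_MAX; /* degenerate case: one line > 2 GB */
        }
        search_chunk(buf + off, n, off);
        off += n;
    }
}
```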