
ag cannot search files > 2Gb

Open · ggl opened this issue 8 years ago · 10 comments

If I try to search a file bigger than 2 GB, I get the following error: ERR: Skipping system.log: pcre_exec() can't handle files larger than 2147483647 bytes

Grep and ack both work fine (although ack takes forever).

ggl avatar Sep 28 '16 10:09 ggl

This is by design. The regex engine (PCRE) can't handle files that large. You can find here that the maximum subject (file) length is INT_MAX, which is 2147483647 for a signed 32-bit int. Therefore the maximum file size is INT_MAX bytes.
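
For reference, the limit comes straight from the classic PCRE 8.x matching call, which declares the subject length as a plain int, so a caller has to skip anything longer up front. A minimal sketch of such a guard (not ag's actual code; the function name is made up):

```c
#include <limits.h>   /* INT_MAX */
#include <stdio.h>

/* The PCRE 8.x matcher declares the subject length as a plain int:
 *
 *   int pcre_exec(const pcre *code, const pcre_extra *extra,
 *                 const char *subject, int length, int startoffset,
 *                 int options, int *ovector, int ovecsize);
 *
 * so a buffer longer than INT_MAX bytes simply cannot be described to it.
 */
static int can_hand_to_pcre(const char *path, size_t file_len) {
    if (file_len > (size_t)INT_MAX) {
        fprintf(stderr, "ERR: Skipping %s: pcre_exec() can't handle files "
                        "larger than %d bytes\n", path, INT_MAX);
        return 0;
    }
    return 1;
}
```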

jschpp avatar Sep 28 '16 11:09 jschpp

Most uses of grep/ack/ag are line-by-line searches. You would only hit the maximum subject length on multiline searches or if a single line is over 2 GB. So for most uses ag would only need to match a single line at a time.
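
To make that distinction concrete, here is a rough sketch of line-oriented searching, where the matcher only ever sees one line at a time (the matcher callback and function names are hypothetical, not ag's actual code):

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>   /* ssize_t */

/* Hypothetical per-line matcher callback. */
typedef int (*line_matcher)(const char *line, size_t len);

/* With line-oriented searching the regex engine is handed one line at a
 * time, so the file itself can be arbitrarily large; only a single line
 * longer than INT_MAX bytes would still be a problem. */
static long search_line_by_line(FILE *fp, line_matcher match) {
    char *line = NULL;
    size_t cap = 0;
    ssize_t len;
    long hits = 0;

    while ((len = getline(&line, &cap, fp)) != -1) {
        if (match(line, (size_t)len))
            hits++;
    }
    free(line);
    return hits;
}
```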

ggl avatar Sep 28 '16 18:09 ggl

You are right that most searches are single-line only. Nevertheless, ag does multi-line searching by default (as far as I know it matches newlines with the \s regex). I found that files larger than 2 GB can be searched with a literal (non-regex) pattern. In theory ag could make a case-by-case decision and only raise that error for a single line greater than INT_MAX bytes or for multiline searching.

Maybe @ggreer could say whether he wants this or not. I'm not sure how much work it would be to patch ag to support a case-by-case choice like the one sketched below.
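
A sketch of what such a check could look like (hypothetical names, not ag's actual code):

```c
#include <limits.h>
#include <stddef.h>

/* Only refuse a file when the whole buffer (or one of its lines) would
 * have to be handed to pcre_exec() in a single call. */
static int must_skip_file(size_t file_len, size_t longest_line,
                          int literal_pattern, int multiline_search)
{
    if (literal_pattern)
        return 0;                             /* literal search: PCRE not involved */
    if (multiline_search)
        return file_len > (size_t)INT_MAX;    /* whole file is the subject */
    return longest_line > (size_t)INT_MAX;    /* per-line search */
}
```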

jschpp avatar Sep 28 '16 19:09 jschpp

PCRE has a newer version, PCRE2, with a backwards-incompatible API. The new API uses size_t instead of int for lengths, so it can handle strings larger than 2 GB. Maybe ag should switch to requiring pcre2 instead.
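
For comparison, a minimal sketch of matching through the PCRE2 API, where the subject length is a PCRE2_SIZE (a size_t) rather than an int (error handling trimmed for brevity):

```c
#define PCRE2_CODE_UNIT_WIDTH 8
#include <pcre2.h>

/* pcre2_match() takes the subject length as PCRE2_SIZE (size_t), so on
 * 64-bit platforms subjects larger than 2 GB are representable. */
static int matches(pcre2_code *re, const unsigned char *subject, size_t len)
{
    pcre2_match_data *md = pcre2_match_data_create_from_pattern(re, NULL);
    int rc = pcre2_match(re, subject, (PCRE2_SIZE)len, 0, 0, md, NULL);
    pcre2_match_data_free(md);
    return rc >= 0;   /* non-negative return means a match was found */
}
```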

netheril96 avatar Nov 28 '16 07:11 netheril96

@jschpp how do you do literal patterns?

njt1982 avatar Jan 05 '17 17:01 njt1982

@njt1982 From ag --help: -Q --literal Don't parse PATTERN as a regular expression

jschpp avatar Jan 05 '17 17:01 jschpp

I hit this limitation today while trying to grep my 7GB mailbox. It would be really great to handle large files.

monperrus avatar May 12 '17 09:05 monperrus

I hit these errors today when running ag from my home directory, and then ag dumped core. Too bad I didn't have ulimit -c unlimited enabled.

jeffythedragonslayer avatar Jun 26 '18 21:06 jeffythedragonslayer

Is there maybe a chance for a command-line argument that would mean something like "process the first 2GB of data, then give up"?
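
A rough sketch of how such a cap could behave (the flag itself is hypothetical; nothing like it exists in ag today):

```c
#include <limits.h>
#include <stddef.h>

/* Clamp how much of the file is handed to the regex engine instead of
 * refusing the file outright. A cap of 0 means "as much as PCRE allows". */
static size_t bytes_to_search(size_t file_len, size_t cap)
{
    if (cap == 0 || cap > (size_t)INT_MAX)
        cap = (size_t)INT_MAX;               /* pcre_exec() can't take more */
    return file_len < cap ? file_len : cap;
}
```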

dimaqq avatar Apr 26 '23 06:04 dimaqq

You could add a flag to split the file into 2GB parts and then merge the results of every run.
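
Roughly, that chunking idea could look like the following sketch (hypothetical, not an existing ag feature): chunk boundaries are pulled back to a newline so no line is split, and the caller merges the per-chunk results.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical per-chunk search callback; file_offset lets the caller
 * report absolute positions when merging results. */
typedef void (*chunk_search_fn)(const char *buf, size_t len, size_t file_offset);

static void search_in_chunks(const char *buf, size_t total, chunk_search_fn search)
{
    size_t off = 0;
    while (off < total) {
        size_t len = total - off;
        if (len > (size_t)INT_MAX) {
            len = (size_t)INT_MAX;
            /* don't split a line across chunks: back up to the last newline */
            while (len > 0 && buf[off + len - 1] != '\n')
                len--;
            if (len == 0)                     /* one line longer than INT_MAX */
                len = (size_t)INT_MAX;        /* can't avoid splitting it */
        }
        search(buf + off, len, off);
        off += len;
    }
}
```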

pdelteil avatar Oct 17 '23 16:10 pdelteil