
Memory allocation

Open aquac opened this issue 5 years ago • 2 comments

The current version of the code seems to load the whole file into memory, which fails for large files. For example, in my case I get the following error: memory allocation of 5404574964 bytes failed

It would be great if the code read the file sequentially, piece by piece.

aquac, Dec 18 '20 07:12

A temporary workaround, and maybe also a nice new feature, would be an option to read only the first X bytes of a file and match the regex against those.

aquac, Dec 18 '20 07:12

Traditional grep doesn't have this exact issue because its regexes are limited to lines, so it reads one line of the file at a time. bgrep shouldn't have that restriction, because a binary pattern might contain the byte equivalent of a line break character even when it doesn't represent an actual textual line break. If I recall correctly, the regex crate has no support for matching against arbitrary buffered readers, and that's why bgrep reads the entire file into memory. I believe the only feasible alternative for very large files is memory-mapping them (I know ripgrep can do this). I'm willing to support implementing such a feature, even though it would be non-trivial and would probably require some unsafe code.
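To illustrate why piece-by-piece reading is easy for fixed byte strings but not for regexes, here is a minimal sketch using only the standard library. It is not bgrep's code: the function name find_in_stream and the chunk_size parameter are hypothetical, and it searches for a literal needle rather than a regex. The trick is to keep the last needle.len() - 1 bytes of each chunk so a match straddling a chunk boundary is still found. A regex match has no fixed maximum length, so no fixed overlap is enough there.

```rust
use std::io::{self, ErrorKind, Read};

/// Returns the offset of the first occurrence of `needle` in `reader`.
/// Hypothetical helper: it handles literal byte strings only, not regexes.
/// Memory use stays bounded by roughly `chunk_size + needle.len()` bytes.
fn find_in_stream<R: Read>(
    mut reader: R,
    needle: &[u8],
    chunk_size: usize,
) -> io::Result<Option<u64>> {
    if needle.is_empty() {
        return Ok(Some(0));
    }
    let chunk_size = chunk_size.max(1);
    // Bytes carried over between chunks so boundary-straddling matches survive.
    let overlap = needle.len() - 1;
    let mut buf: Vec<u8> = Vec::with_capacity(chunk_size + overlap);
    let mut chunk = vec![0u8; chunk_size];
    // Offset within the stream of the first byte currently held in `buf`.
    let mut buf_offset: u64 = 0;

    loop {
        let n = match reader.read(&mut chunk) {
            Ok(0) => return Ok(None), // end of input, nothing found
            Ok(n) => n,
            Err(e) if e.kind() == ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        buf.extend_from_slice(&chunk[..n]);

        if let Some(pos) = buf.windows(needle.len()).position(|w| w == needle) {
            return Ok(Some(buf_offset + pos as u64));
        }

        // Discard everything except the trailing `overlap` bytes.
        if buf.len() > overlap {
            let discard = buf.len() - overlap;
            buf.drain(..discard);
            buf_offset += discard as u64;
        }
    }
}
```

Called as find_in_stream(File::open(path)?, b"\xde\xad\xbe\xef", 1 << 20), this would scan a file of any size with about 1 MiB of buffer. Memory mapping avoids the boundary problem altogether, which is why it looks like the better fit for regex matching.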

For handling only the first X bytes of a file, one can combine the head command and bgrep with a pipe:

head -c X my-large-file.bin | bgrep ...
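If such a limit were ever built into a Rust tool, the truncation that head -c X performs could be sketched with the standard library's Read::take. The helper name read_prefix is hypothetical and this is not an existing bgrep option; it only shows that memory use can be capped at the first `limit` bytes.

```rust
use std::io::{self, Read};

/// Reads at most `limit` bytes from `reader` into memory.
/// Hypothetical helper, roughly what `head -c X` does in the pipe above.
fn read_prefix<R: Read>(reader: R, limit: u64) -> io::Result<Vec<u8>> {
    let mut prefix = Vec::new();
    // `take` wraps the reader so it reports end of input after `limit` bytes.
    reader.take(limit).read_to_end(&mut prefix)?;
    Ok(prefix)
}
```

With a file, this would be called as read_prefix(File::open(path)?, x), and the returned buffer could then be matched against the regex as usual.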

gahag, Dec 18 '20 19:12