
large file sizes causing OOMKills and timeouts

Open timcosta opened this issue 1 year ago • 4 comments

hi all! i'm investigating using matano for some log ingestion, and some of the ALB log files i'm looking at are extremely large - 100MB compressed, multiple GB decompressed. we're running into memory exhaustion even after manually raising the limits in the console to the Lambda maximum of 10240MB. this happens in multiple lambdas, most notably the transformer and writer.

the specific issue we're seeing in the writer is that it logs `INFO lake_writer: Starting 25 downloads from S3` and then ~20s later it's killed by Lambda for exceeding the 10240 MB memory limit. can this 25 number be tuned or tweaked to take file size into account?

the transformer and databatcher issues we were able to resolve by increasing the timeout and memory, which should be covered by https://github.com/matanolabs/matano/issues/85 once that lands. i may be able to contribute this depending on how our discovery goes, but i'm not sure how long it would be until that could happen.

from the investigation i've done into this problem for a custom processing solution, the "best" resolutions appear to be either loading the data and processing it as a stream rather than loading it all into memory at once, or having some sort of pre-processor that splits large files into smaller chunks before they ever reach the loader (rough sketch of both below).
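
for what it's worth, here's a rough sketch of what i mean by both options, written against plain boto3 rather than matano's internals - the bucket/prefix names, `CHUNK_LINES`, and the chunk key layout are all made up for illustration:

```python
# Sketch only: illustrates the two approaches, not matano's actual code paths.
import gzip
import io
import boto3

s3 = boto3.client("s3")
CHUNK_LINES = 500_000  # arbitrary chunk size for the splitting approach


def stream_lines(bucket: str, key: str):
    """Approach 1: decompress and iterate the object as a stream so memory use
    stays roughly constant instead of holding the multi-GB payload at once."""
    body = s3.get_object(Bucket=bucket, Key=key)["Body"]
    with gzip.GzipFile(fileobj=body) as gz:
        for line in io.TextIOWrapper(gz, encoding="utf-8"):
            yield line.rstrip("\n")


def split_object(bucket: str, key: str, dest_prefix: str):
    """Approach 2: pre-processor that rewrites one huge file as many smaller
    gzipped chunks before the normal ingestion pipeline ever sees it."""
    buf, part = [], 0
    for line in stream_lines(bucket, key):
        buf.append(line)
        if len(buf) >= CHUNK_LINES:
            _upload_chunk(bucket, dest_prefix, part, buf)
            buf, part = [], part + 1
    if buf:
        _upload_chunk(bucket, dest_prefix, part, buf)


def _upload_chunk(bucket: str, dest_prefix: str, part: int, lines: list):
    # Re-compress the chunk and write it under a hypothetical part-numbered key.
    payload = gzip.compress(("\n".join(lines) + "\n").encode("utf-8"))
    s3.put_object(Bucket=bucket, Key=f"{dest_prefix}/part-{part:05d}.gz", Body=payload)
```

the nice part of the splitting approach is that the rest of the pipeline wouldn't need to change - it just never sees a multi-GB object in the first place.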

do y'all have any thoughts on the best path forward here, or if matano would ever consider handling situations like this where the inputs/batches cannot be processed due to size?

timcosta · Jun 05 '23 19:06