Huge memory consumption when running on large repository with many codeowners rules

Open adammichalik opened this issue 4 years ago • 9 comments

I have a large repository containing ~500,000 files (in a large directory tree). Running codeowners -u on it requires allocating around 8 GB of RAM to Node.js (--max_old_space_size=8000), and I see memory usage increasing as the scan progresses. It's an issue both for running locally and in an automated pipeline.

I didn't expect filesystem traversal to require linearly increasing memory usage. Is there a reason for it?

adammichalik avatar Feb 18 '21 16:02 adammichalik
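
For anyone trying to reproduce the growth described above, a minimal sketch of how heap usage can be checked from inside a Node.js process follows. The helper name heapReport is hypothetical and not part of the codeowners package; it only uses the built-in v8.getHeapStatistics() and process.memoryUsage() calls.

```javascript
const v8 = require('v8');

// Hypothetical helper (not from the codeowners package): reports the V8 heap
// limit alongside current heap and resident-set usage, all in megabytes.
function heapReport() {
  const { heap_size_limit: heapSizeLimit } = v8.getHeapStatistics();
  const { heapUsed, rss } = process.memoryUsage();
  const toMB = (bytes) => Math.round(bytes / 1024 / 1024);
  return {
    limitMB: toMB(heapSizeLimit),
    heapUsedMB: toMB(heapUsed),
    rssMB: toMB(rss),
  };
}

console.log(heapReport());
```

Running the script with and without --max_old_space_size=8000 should show a different limitMB, and calling heapReport() periodically during a long scan would show whether heapUsedMB keeps climbing.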

should be a simple fix!

beaugunderson avatar Feb 18 '21 16:02 beaugunderson

@adammichalik This will require a quick rewrite to change the traversal from "collect, filter, map" to a streaming API, but I think that's a better match here; please ping me again if I lose track of this :)

beaugunderson avatar Feb 18 '21 16:02 beaugunderson
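
The difference between "collect, filter, map" and a streaming traversal can be sketched roughly as follows. This is an illustration under stated assumptions, not the package's actual code: collectFiles, walkFiles, and countMatching are hypothetical names, and the streaming version relies only on the built-in fs.promises.opendir and async generators.

```javascript
const fs = require('fs');
const path = require('path');

// "Collect" style: every file path is held in memory before any filtering
// happens, so memory grows with the number of files in the tree.
async function collectFiles(root) {
  const results = [];
  const stack = [root];
  while (stack.length > 0) {
    const dir = stack.pop();
    const entries = await fs.promises.readdir(dir, { withFileTypes: true });
    for (const entry of entries) {
      const full = path.join(dir, entry.name);
      if (entry.isDirectory()) stack.push(full);
      else results.push(full);
    }
  }
  return results;
}

// Streaming style: an async generator yields one path at a time, so the
// consumer can process and discard each path as it arrives.
async function* walkFiles(root) {
  const dir = await fs.promises.opendir(root);
  for await (const entry of dir) {
    const full = path.join(root, entry.name);
    if (entry.isDirectory()) yield* walkFiles(full);
    else yield full;
  }
}

// Example consumer: counts matching files without building a list of them.
async function countMatching(root, predicate) {
  let count = 0;
  for await (const file of walkFiles(root)) {
    if (predicate(file)) count += 1;
  }
  return count;
}
```

With the streaming version, a call such as countMatching('.', (f) => f.endsWith('.js')) keeps only the open directory handles and a counter alive, rather than the whole file list.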

released in 5.0.1 btw; let me know if it's an improvement :)

beaugunderson avatar Feb 24 '21 16:02 beaugunderson

I tried 5.0.1, but it blows up with an out-of-memory error just the same. I will try to create a good testing repository sometime next week to support this case.

adammichalik avatar Feb 24 '21 20:02 adammichalik

Hmm, I tested this against a directory tree of 500,000 files...

beaugunderson avatar Feb 24 '21 20:02 beaugunderson

@adammichalik OK, try 5.1.0. I tried it on a directory structure of 7.3 million files and never got above 2 GB of memory consumption. It's also quite a bit faster, as I switched to a stream approach. (That 2 GB is the high-water mark for buffered stream contents, I think; I've addressed any possible memory leaks.)

beaugunderson avatar Feb 25 '21 00:02 beaugunderson

I did more digging, and the issue seems to stem from the combination of the size of the CODEOWNERS file and the size of the repo itself. If I place an empty CODEOWNERS file in it, the -u command runs blazingly fast and with no issues. With a CODEOWNERS file containing 1571 rules, as in my real project, I get an out-of-memory error. See sample file: CODEOWNERS.zip (https://github.com/beaugunderson/codeowners/files/6044159/CODEOWNERS.zip)

adammichalik avatar Feb 25 '21 15:02 adammichalik
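
To make the "many rules times many files" interaction concrete, here is a minimal sketch of compiling the rules once and reusing them for every file. It is not a diagnosis of the package's actual code. The names compileRule, compileRules, and ownersFor are hypothetical, and the matcher is a deliberate simplification that handles only "/dir/" prefixes and "*.ext" suffixes rather than the full CODEOWNERS pattern syntax.

```javascript
// Hypothetical, simplified rule compiler: supports only "*.ext" suffix
// patterns and "/dir" or "/dir/" prefix patterns, not full CODEOWNERS syntax.
function compileRule(line) {
  const [pattern, ...owners] = line.trim().split(/\s+/);
  let test;
  if (pattern.startsWith('*.')) {
    const suffix = pattern.slice(1); // "*.js" becomes ".js"
    test = (file) => file.endsWith(suffix);
  } else {
    const prefix = pattern.replace(/^\//, '');
    const dirPrefix = prefix.endsWith('/') ? prefix : prefix + '/';
    test = (file) => file === prefix || file.startsWith(dirPrefix);
  }
  return { pattern, owners, test };
}

// Parse the file once, skipping blank lines and comments.
function compileRules(text) {
  return text
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line !== '' && !line.startsWith('#'))
    .map(compileRule);
}

// In CODEOWNERS the last matching pattern takes precedence, so scan from the
// end and stop at the first hit. No per-file state is retained.
function ownersFor(rules, file) {
  for (let i = rules.length - 1; i >= 0; i--) {
    if (rules[i].test(file)) return rules[i].owners;
  }
  return [];
}
```

The point of the sketch is that the rule list is built once and each file is checked and then dropped, so memory should depend on the number of rules rather than on rules multiplied by files. An empty owners array marks an unowned file.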

Ahhh OK, I had not tested with a file with quite so many rules in it; that gives me another avenue to explore :)

beaugunderson avatar Feb 25 '21 20:02 beaugunderson

I've run into the same issue. We have a monolith in one of our repos and we are using the CODEOWNERS file to claim ownership across 60 teams. Any progress on a fix for this? Any temporary workarounds you could suggest? Thanks!

UPDATE: Updating node to v17.0.1 solved the problem for me without adjusting memory settings.

donbeattie avatar Nov 03 '21 21:11 donbeattie