
Memory usage of ctags increases during reindex

Open gtoph opened this issue 7 years ago • 15 comments

It looks like ctags memory usage keeps growing. I thought I had seen this mentioned before, where some end tag wasn't handled correctly and the buffer kept growing as it was scanned. I wonder if something like that is back?

After about 10 min of indexing, this is what it's up to: [screenshot: ctags]

Kept it going for another 5 min and we're now up to 75k on a few, and over 110k on a couple of other instances. It'll just keep growing though.

gtoph avatar Oct 02 '18 23:10 gtoph

What version of ctags do you use? This should most probably be reported to ctags as an issue, together with the command line used (could our regexps trigger a leak? or does some internal parser have a leak?)

tarzanek avatar Oct 03 '18 05:10 tarzanek

(I see you use builds from https://github.com/universal-ctags/ctags-win32/releases/ )

tarzanek avatar Oct 03 '18 05:10 tarzanek

Ya, I probably pulled ctags a week or two ago, so it should be pretty close to their latest master branch. I can always get the very latest tomorrow.

I honestly don't know if the leak is on them or OpenGrok, but like I say, I know you fixed something like this a while back.

gtoph avatar Oct 03 '18 05:10 gtoph

I guess one could run the ctags with a tool to track memory leaks (like libumem.so + mdb on Solaris).

vladak avatar Oct 03 '18 08:10 vladak

I am only aware of unoptimised memory usage when generating history for big git/hg repos with tags, and perhaps in the XML analyzer (which we will hopefully fix soon).

But your picture shows the ctags processes growing, which is completely out of OpenGrok's control (besides the command line and the regexps for newer languages, which might have bugs or trigger the leak).

tarzanek avatar Oct 04 '18 10:10 tarzanek

It would be nice if you could run the ctags binaries under the tools described on https://stackoverflow.com/questions/4593191/memory-profiler-for-c . In particular, I'd be interested to see the results from Google's tcmalloc, as it can answer the why-is-my-process-so-big question.

vladak avatar Oct 04 '18 10:10 vladak

Also, this very likely depends on the input files sent to the ctags binaries. Is it possible to tell which files lead to the increased memory usage?

vladak avatar Oct 04 '18 10:10 vladak

Recommended allocators with heap profiling:

  • https://github.com/gperftools/gperftools
  • https://github.com/jemalloc/jemalloc

vladak avatar Oct 04 '18 13:10 vladak

Well, I'm running on Windows, so a lot of the profiling tools aren't ported to it. I tried some MS tools, but that didn't work so well on the first attempt.

To answer your question, I have a mix of files really. Standard text, python, xml, but 99% are just standard C source files.

gtoph avatar Oct 04 '18 19:10 gtoph

xmls with loooooooong lines? :) (e.g. base64 encoded images?)

tarzanek avatar Oct 05 '18 13:10 tarzanek

The projects I have are ginormous, with 100s of people working on various areas, but I highly doubt there is anything like that. Some of the files might be several 10s of KB, but still, just standard text keys, nothing overly special.

Searching around, I saw a few utilities like Dr. Memory. I'll see if I can get those working and if they give any useful info.

gtoph avatar Oct 05 '18 14:10 gtoph

I believe at least one of the allocators I mentioned also works on Windows.

vladak avatar Oct 05 '18 14:10 vladak

Not sure if this is related, but I've faced a similar problem when simply indexing Google Chrome's codebase on a 512 MB VPS.

Here's an example of a single 540 KB file that causes the ctags resident set size to spike to ~145 MB: https://github.com/hunspell/hunspell/blob/master/src/hunspell/utf_info.hxx

Here's how I measured RSS with the GNU version of the time(1) command:

export TIME="max RSS: %M"
/usr/bin/time ctags -o - utf_info.hxx >/dev/null
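
To find which input files drive memory usage across a whole source tree, the same measurement can be repeated per file and the results ranked. Below is a minimal sketch of that idea, assuming a Unix-like system (it relies on `os.wait4`, so it will not run on native Windows). The function names `peak_rss_kb` and `rank_by_peak_rss` are made up for illustration; the only ctags invocation used is the `ctags -o - FILE` form from the command above.

```python
import os
import subprocess
import sys


def peak_rss_kb(cmd):
    """Run cmd to completion; return (exit_code, peak RSS in KB) for that child."""
    proc = subprocess.Popen(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    # wait4() reaps this specific child and returns its resource usage.
    _, status, usage = os.wait4(proc.pid, 0)
    if os.WIFEXITED(status):
        code = os.WEXITSTATUS(status)
    else:
        code = -os.WTERMSIG(status)
    # Tell Popen the child is already reaped so it does not wait again.
    proc.returncode = code
    rss = usage.ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    if sys.platform == "darwin":
        rss //= 1024
    return code, rss


def rank_by_peak_rss(paths, make_cmd=None):
    """Return [(path, peak_rss_kb), ...] sorted with the largest first."""
    if make_cmd is None:
        # Assumption: a `ctags` binary is on PATH.
        make_cmd = lambda path: ["ctags", "-o", "-", path]
    results = [(path, peak_rss_kb(make_cmd(path))[1]) for path in paths]
    return sorted(results, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Usage: python rank_rss.py file1.c file2.h ...
    for path, rss in rank_by_peak_rss(sys.argv[1:]):
        print(f"{rss:>10} KB  {path}")
```

This only reports the peak for one ctags process per file, so it would not reproduce growth that accumulates in a long-lived ctags process fed many files in sequence.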

edigaryev avatar Nov 04 '18 21:11 edigaryev

With the fix applied, ctags only uses ~9 MB to parse utf_info.hxx.

But there still exist huge 14 MB header files like this one that cause ctags to use ~76 MB of RAM: https://github.com/torvalds/linux/blob/master/drivers/gpu/drm/amd/include/asic_reg/nbio/nbio_6_1_sh_mask.h

edigaryev avatar Nov 05 '18 08:11 edigaryev

> But there still exist huge 14 MB header files like this one that cause ctags to use ~76 MB of RAM: https://github.com/torvalds/linux/blob/master/drivers/gpu/drm/amd/include/asic_reg/nbio/nbio_6_1_sh_mask.h

Maybe it is related to the cork. Tag entries created by the CPreProcessor parser are kept in memory until the parser reaches the end of the current input file. When the parser reaches EOF, ctags writes the in-memory entries to the tags file. See http://docs.ctags.io/en/latest/internal.html?highlight=cork#output-tag-stream for more about the cork.
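
To illustrate why this buffering makes peak memory scale with the number of tags in a single file, here is a toy model in Python. It is not how ctags implements the cork; `CorkQueue`, `make_tag`, `flush`, and `parse` are hypothetical names, and the "parser" only recognises `#define` lines.

```python
class CorkQueue:
    """Toy model: hold tag entries in memory until the input file ends."""

    def __init__(self, sink):
        self.sink = sink          # list standing in for the tags file
        self.pending = []         # entries buffered while the file is parsed
        self.peak_pending = 0     # largest number of entries held at once

    def make_tag(self, name, line):
        self.pending.append((name, line))
        self.peak_pending = max(self.peak_pending, len(self.pending))

    def flush(self):
        # Called at EOF: write everything out, then release the buffer.
        for name, line in self.pending:
            self.sink.append(f"{name}\t{line}")
        self.pending.clear()


def parse(lines, queue):
    """Emit one tag per '#define NAME ...' line, flushing only at EOF."""
    for number, text in enumerate(lines, start=1):
        parts = text.split()
        if len(parts) >= 2 and parts[0] == "#define":
            queue.make_tag(parts[1], number)
    queue.flush()
```

In this model a header consisting almost entirely of `#define` lines, like the 14 MB register mask file above, keeps one buffered entry per line until the end of the file, so `peak_pending` grows with the file size.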

masatake avatar Nov 05 '18 13:11 masatake