opengrok
Memory usage of ctags increases during reindex
It looks like the ctags memory usage keeps growing. I thought I had seen this mentioned before, where some end tag wasn't handled correctly and the buffer kept growing when it was scanned. I wonder if something like that is back?
After about 10 min of indexing, this is what it's up to:

Kept it going for another 5 min and we're now up to 75k on a few, and over 110k on a couple of other instances. It'll just keep growing though.
What version of ctags do you use? This should most probably be reported to ctags as an issue, together with the command line used (could our regexps trigger a leak, or does some internal parser have a leak?).
(I see you use builds from https://github.com/universal-ctags/ctags-win32/releases/ )
Ya, I probably pulled ctags a week or two ago, so it should be pretty close to their latest master branch. I can always get the very latest tomorrow.
I honestly don't know if the leak is on them or on OpenGrok, but like I say, I know you fixed something like this a while back.
On Tue, Oct 2, 2018, 10:21 PM Lubos Kosco [email protected] wrote:
(I see you use builds from https://github.com/universal-ctags/ctags-win32/releases/ )
I guess one could run the ctags with a tool to track memory leaks (like libumem.so + mdb on Solaris).
I am only aware of unoptimised memory usage when generating history for big git/hg repos with tags, and perhaps in the XML analyzer (which we will hopefully fix soon),
but based on your picture it is the ctags processes that grow, which is completely out of OpenGrok's control (besides the command line and the regexps for newer languages, which might have bugs or trigger the leak).
It would be nice if you could run the ctags binaries under the tools described at https://stackoverflow.com/questions/4593191/memory-profiler-for-c . In particular, I'd be interested to see the results from Google's tcmalloc, as it can answer the why-is-my-process-so-big question.
Also, this very likely depends on the input files sent to the ctags binaries. Is it possible to tell which files lead to the increased memory usage?
Recommended allocators with heap profiling:
- https://github.com/gperftools/gperftools
- https://github.com/jemalloc/jemalloc
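On Linux, one way to try the allocators listed above is to preload tcmalloc into ctags and switch on its heap profiler through the HEAPPROFILE environment variable. The sketch below is only an illustration under assumptions: the library path, the profile prefix and both helper names are hypothetical, and LD_PRELOAD does not apply to Windows builds of ctags.

```python
import os
import subprocess


def heap_profile_env(tcmalloc_lib, profile_prefix, base_env=None):
    """Build an environment that preloads tcmalloc and enables heap dumps.

    tcmalloc_lib and profile_prefix are caller-supplied assumptions,
    e.g. a path to libtcmalloc.so and a prefix such as /tmp/ctags.hprof.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["LD_PRELOAD"] = tcmalloc_lib      # ELF/Linux mechanism only
    env["HEAPPROFILE"] = profile_prefix   # gperftools writes dumps under this prefix
    return env


def run_with_heap_profile(cmd, tcmalloc_lib, profile_prefix):
    """Run cmd (for example a ctags command line) with heap profiling on."""
    env = heap_profile_env(tcmalloc_lib, profile_prefix)
    # Tag output is discarded; only the heap dumps are of interest here.
    return subprocess.run(cmd, env=env, stdout=subprocess.DEVNULL).returncode
```

The resulting heap dumps can then be summarized with the pprof tool that ships with gperftools, which should show which call sites hold the memory.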
Well, I'm running on Windows, so a lot of the profiling tools aren't ported to that. I tried some MS tools, but that didn't work so well on the first attempt.
To answer your question, I have a mix of files really. Standard text, python, xml, but 99% are just standard C source files.
xmls with loooooooong lines? :) (e.g. base64 encoded images?)
The projects I have are ginormous, with 100s of people working on various areas, but I highly doubt there is anything like that. Some of them might be several 10s of KB, but still, just standard text keys, nothing overly special.
I saw a few utilities while searching around, like Dr. Memory. I'll see if I can get those working and if they have any useful info.
I believe at least one of the allocators I mentioned also works on Windows.
Not sure if this is related, but I've faced a similar problem when simply indexing Google Chrome's codebase on a 512 MB VPS.
Here's an example of a single 540 KB file that will cause ctags resident set size to spike to ~145 MB: https://github.com/hunspell/hunspell/blob/master/src/hunspell/utf_info.hxx
Here's how I've measured RSS with the GNU version of the time(1) command:
export TIME="max RSS: %M"
/usr/bin/time ctags -o - utf_info.hxx >/dev/null
With the fix applied, ctags only uses ~9 MB to parse utf_info.hxx.
But there still exist huge 14 MB header files like this one that cause ctags to use ~76 MB of RAM: https://github.com/torvalds/linux/blob/master/drivers/gpu/drm/amd/include/asic_reg/nbio/nbio_6_1_sh_mask.h
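The time(1) measurement above can also be scripted, so that many files are checked in one go and ranked by the peak memory they cause. This is a rough sketch, not something from OpenGrok or ctags: it assumes Linux (where ru_maxrss is in kilobytes) and Python 3.9 or later, and the helper names are made up for illustration.

```python
import os
import subprocess


def peak_rss_kb(cmd):
    """Run cmd and return the peak resident set size of that one child.

    Assumption: Linux, where ru_maxrss is reported in kilobytes (macOS
    reports bytes, and os.wait4 is not available on Windows).
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.DEVNULL)
    # os.wait4 returns the resource usage of exactly this child, unlike
    # getrusage(RUSAGE_CHILDREN), which only keeps the maximum over all children.
    _, status, usage = os.wait4(proc.pid, 0)
    # Record the exit code so the Popen object knows the child was reaped.
    proc.returncode = os.waitstatus_to_exitcode(status)
    return usage.ru_maxrss


def rank_by_peak_rss(cmd_prefix, paths):
    """Return (peak_kb, path) pairs, largest first, running cmd_prefix + [path]."""
    results = [(peak_rss_kb(cmd_prefix + [path]), path) for path in paths]
    return sorted(results, reverse=True)
```

For example, rank_by_peak_rss(["ctags", "-o", "-"], list_of_source_files) would put the most memory-hungry inputs first. Small readings may be inflated by the footprint of the Python process that launches the child, so treat them as an upper bound.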
Maybe it is related to the cork. Tag entries created by the CPreProcessor parser are kept in memory until the parser reaches the end of the current input file. When the parser reaches EOF, ctags writes the entries held in memory to the tags file. See http://docs.ctags.io/en/latest/internal.html?highlight=cork#output-tag-stream about the cork.
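To see why corking makes memory scale with the number of tags in a single file, here is a toy model. It is not ctags code: the class and function names are hypothetical, and it only mimics the behaviour described above, where entries are queued until EOF and then flushed.

```python
class CorkedTagWriter:
    """Toy model of a tag writer with a cork queue (hypothetical, not ctags code)."""

    def __init__(self):
        self.written = []      # stands in for the tags file
        self.queue = []        # entries held in memory while corked
        self.corked = False
        self.peak_queued = 0   # largest queue length seen so far

    def cork(self):
        self.corked = True

    def emit(self, entry):
        if self.corked:
            self.queue.append(entry)
            self.peak_queued = max(self.peak_queued, len(self.queue))
        else:
            self.written.append(entry)

    def uncork(self):
        # At end of input the queued entries are flushed in one go.
        self.written.extend(self.queue)
        self.queue.clear()
        self.corked = False


def parse_defines(lines, writer, use_cork):
    """Emit one entry per '#define NAME ...' line, optionally corked until EOF."""
    if use_cork:
        writer.cork()
    for line in lines:
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "#define":
            writer.emit(parts[1])
    if use_cork:
        writer.uncork()
    return writer
```

In this model a header consisting of N macro definitions holds N entries in the queue just before EOF when corked, and none when streamed, which would be consistent with very large register headers needing a lot of memory even without a leak.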