code-maat

OOM when processing really large git logs

Open rjayasinghe opened this issue 10 years ago • 8 comments

Hi!

I tried to process a pretty large git log from a private git repo and ran out of memory. I increased the max heap to 4 GB, but it still did not help. I can't go much higher because my laptop's memory is limited.

Best Regards, Robin

rjayasinghe avatar Sep 17 '15 13:09 rjayasinghe

Hi @rjayasinghe

I've analyzed fairly large Git repositories (e.g. Rails with 10 years of history, Mono with 10+ years) and Code Maat's memory usage stays around 1.3 GB on those. I think your issue has to do with some pattern in your input data combined with some inefficiency in the analysis algorithms.

What analysis did you run?

Would it be possible for you to send me the git log? That would allow me to debug it. In the meantime I'd recommend that you use a shorter analysis time span until I've addressed the real problem.

adamtornhill avatar Sep 17 '15 13:09 adamtornhill
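
A shorter time span can be obtained by regenerating the log with git's `--after` option, or by trimming a log file that already exists. The following is only a minimal sketch of the second route, not part of Code Maat. It assumes each commit entry starts with a header line of the form `--<hash>--<YYYY-MM-DD>--<author>` (the shape the git2 log format is assumed to have here); the function names are made up for illustration.

```python
import re
from datetime import date

# Assumption: each entry starts with "--<hash>--<YYYY-MM-DD>--<author>".
HEADER = re.compile(r"^--[0-9a-f]+--(\d{4})-(\d{2})-(\d{2})--")


def entries(lines):
    """Yield each commit entry as a list of non-blank lines (header first)."""
    current = []
    for line in lines:
        line = line.rstrip("\n")
        if HEADER.match(line) and current:
            yield current
            current = []
        if line.strip():
            current.append(line)
    if current:
        yield current


def entry_date(entry):
    """Return the commit date from the entry's header, or None if absent."""
    match = HEADER.match(entry[0])
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    return date(year, month, day)


def trim_log(lines, after):
    """Keep entries dated on or after `after` and return the new log text."""
    kept = []
    for entry in entries(lines):
        when = entry_date(entry)
        if when is not None and when >= after:
            kept.append("\n".join(entry))
    return "\n\n".join(kept) + ("\n" if kept else "")
```

To use it, open the existing log file, pass the file object to `trim_log` together with a cut-off such as `date(2015, 1, 1)`, write the result to a new file, and give that file to Code Maat with `-l`. Because the input is read line by line, only the kept entries are held in memory.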

Hi!

Sorry, I cannot share the git log. It's built from a 10+ GB repository with ~15 years of history.

This is how I called code-maat:

java -Xmx4g -jar code-maat-0.9.2-SNAPSHOT-standalone.jar -l 

I know it's not very helpful if I cannot share the git log, but I at least wanted to report that your analysis algorithms run into problems when analyzing really large data sets.

Best Regards, Robin

rjayasinghe avatar Sep 17 '15 13:09 rjayasinghe

Alright, no problem. I will see if I can find some even larger open-source project where I can reproduce the problem.

Did any of the analyses work? For example, try -a identity. That would help me to isolate the potential problem.

adamtornhill avatar Sep 17 '15 13:09 adamtornhill

-a identity resulted in OOM as well:

WARNING: update already refers to: #'clojure.core/update in namespace: incanter.core, being replaced by: #'incanter.core/update
Exception in thread "main" java.lang.OutOfMemoryError
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at java.util.concurrent.ForkJoinTask.getThrowableException(ForkJoinTask.java:536)
        at java.util.concurrent.ForkJoinTask.reportResult(ForkJoinTask.java:596)
        at java.util.concurrent.ForkJoinTask.join(ForkJoinTask.java:640)
        at java.util.concurrent.ForkJoinPool.invoke(ForkJoinPool.java:1521)
        at clojure.core.reducers$fjinvoke.invoke(reducers.clj:49)
        at clojure.core.reducers$foldvec.invoke(reducers.clj:341)
        at clojure.core.reducers$fn__1915.invoke(reducers.clj:362)
        at clojure.core.reducers$fn__1798$G__1793__1809.invoke(reducers.clj:81)
        at clojure.core.reducers$fold.invoke(reducers.clj:98)
        at code_maat.parsers.hiccup_based_parser$parse_from.invoke(hiccup_based_parser.clj:139)
        at code_maat.parsers.hiccup_based_parser$parse_log.invoke(hiccup_based_parser.clj:158)
        at code_maat.parsers.git2$parse_log.invoke(git2.clj:74)
        at code_maat.app.app$git2__GT_modifications$fn__9421.invoke(app.clj:133)
        at code_maat.app.app$run_parser_in_error_handling_context.invoke(app.clj:97)
        at code_maat.app.app$git2__GT_modifications.invoke(app.clj:132)
        at code_maat.app.app$parse_commits_to_dataset.invoke(app.clj:202)
        at code_maat.app.app$run.invoke(app.clj:215)
        at code_maat.cmd_line$_main.doInvoke(cmd_line.clj:66)
        at clojure.lang.RestFn.applyTo(RestFn.java:137)
        at code_maat.cmd_line.main(Unknown Source)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
        at clojure.lang.PersistentHashMap.cloneAndSet(PersistentHashMap.java:1169)
        at clojure.lang.PersistentHashMap.access$000(PersistentHashMap.java:28)
        at clojure.lang.PersistentHashMap$ArrayNode.assoc(PersistentHashMap.java:418)
        at clojure.lang.PersistentHashMap.assoc(PersistentHashMap.java:142)
        at clojure.lang.PersistentHashMap.assoc(PersistentHashMap.java:28)
        at clojure.lang.RT.assoc(RT.java:778)
        at clojure.core$assoc__4142.invoke(core.clj:191)
        at clojure.lang.Atom.swap(Atom.java:65)
        at clojure.core$swap_BANG_.invoke(core.clj:2240)
        at instaparse.gll$node_get.invoke(gll.clj:286)
        at instaparse.gll$push_listener.invoke(gll.clj:339)
        at instaparse.gll$non_terminal_parse.invoke(gll.clj:818)
        at instaparse.gll$_parse.invoke(gll.clj:119)
        at instaparse.gll$push_listener$fn__1307.invoke(gll.clj:348)
        at instaparse.gll$step.invoke(gll.clj:409)
        at instaparse.gll$run.invoke(gll.clj:427)
        at instaparse.gll$run.invoke(gll.clj:413)
        at instaparse.gll$parse.invoke(gll.clj:894)
        at instaparse.core$parse.doInvoke(core.clj:91)
        at clojure.lang.RestFn.invoke(RestFn.java:425)
        at code_maat.parsers.hiccup_based_parser$parse_with.invoke(hiccup_based_parser.clj:27)
        at clojure.core$partial$fn__4527.invoke(core.clj:2493)
        at code_maat.parsers.hiccup_based_parser$parse_entry.invoke(hiccup_based_parser.clj:40)
        at code_maat.parsers.hiccup_based_parser$parse_entry_from.invoke(hiccup_based_parser.clj:47)
        at code_maat.parsers.hiccup_based_parser$parse_from$fn__1950.invoke(hiccup_based_parser.clj:144)
        at clojure.core.protocols$iter_reduce.invoke(protocols.clj:49)
        at clojure.core.protocols$fn__6510.invoke(protocols.clj:112)
        at clojure.core.protocols$fn__6452$G__6447__6465.invoke(protocols.clj:13)
        at clojure.core.reducers$reduce.invoke(reducers.clj:79)
        at clojure.core.reducers$foldvec.invoke(reducers.clj:335)
        at clojure.core.reducers$foldvec$fc__1904$fn__1905.invoke(reducers.clj:340)
        at clojure.core.reducers$foldvec$fn__1908.invoke(reducers.clj:345)

rjayasinghe avatar Sep 17 '15 14:09 rjayasinghe
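
The "Caused by" part of the trace shows the error surfacing while instaparse parses a single entry (`parse_entry`). One possible, unconfirmed explanation is that a few unusually large commits dominate memory use. The sketch below is a hypothetical helper, not part of Code Maat, that ranks entries by the number of changed-file rows so that such commits can be spotted without sharing the log itself. It assumes header lines of the form `--<hash>--<YYYY-MM-DD>--<author>`.

```python
import re

# Assumption: each entry starts with "--<hash>--<YYYY-MM-DD>--<author>".
HEADER = re.compile(r"^--[0-9a-f]+--\d{4}-\d{2}-\d{2}--")


def entry_sizes(lines):
    """Return a (header, changed_file_rows) pair for every commit entry."""
    sizes = []
    header, count = None, 0
    for line in lines:
        line = line.rstrip("\n")
        if HEADER.match(line):
            if header is not None:
                sizes.append((header, count))
            header, count = line, 0
        elif line.strip() and header is not None:
            # Every non-blank row after a header counts as one changed file.
            count += 1
    if header is not None:
        sizes.append((header, count))
    return sizes


def largest_entries(lines, top=5):
    """Return the `top` entries with the most changed-file rows."""
    return sorted(entry_sizes(lines), key=lambda pair: pair[1], reverse=True)[:top]
```

If one or two entries turn out to be far larger than the rest, dropping them from a copy of the log and rerunning the analysis would be a cheap way to test the hypothesis.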

Thanks for the info, @rjayasinghe! I've tested the last released version of Code Maat, 0.9.1, on a large repository and it seems to be able to handle it. If possible, please try version 0.9.1 (available here) and let me know if that solves your problem. We did some parallelization in the parsing stage of 0.9.2 and it might have introduced the problem (but I'm not sure yet).

adamtornhill avatar Sep 18 '15 15:09 adamtornhill

OK. I downloaded and built 0.9.1 from GitHub. This time it ran longer. However, after ~1.5 hours the process died with:

WARNING: update already refers to: #'clojure.core/update in namespace: incanter.core, being replaced by: #'incanter.core/update
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at clojure.lang.PersistentHashMap.cloneAndSet(PersistentHashMap.java:1169)
        at clojure.lang.PersistentHashMap.access$000(PersistentHashMap.java:28)
        at clojure.lang.PersistentHashMap$ArrayNode.assoc(PersistentHashMap.java:414)
        at clojure.lang.PersistentHashMap$ArrayNode.assoc(PersistentHashMap.java:415)
        at clojure.lang.PersistentHashMap$ArrayNode.assoc(PersistentHashMap.java:415)
        at clojure.lang.PersistentHashMap.assoc(PersistentHashMap.java:142)
        at clojure.lang.PersistentHashMap.assoc(PersistentHashMap.java:28)
        at clojure.lang.RT.assoc(RT.java:778)
        at clojure.core$assoc__4142.invoke(core.clj:191)
        at clojure.lang.Atom.swap(Atom.java:65)
        at clojure.core$swap_BANG_.invoke(core.clj:2240)
        at instaparse.gll$node_get.invoke(gll.clj:286)
        at instaparse.gll$push_listener.invoke(gll.clj:339)
        at instaparse.gll$CatListener$fn__1340.invoke(gll.clj:487)
        at instaparse.gll$push_message$f__1269.invoke(gll.clj:238)
        at instaparse.gll$step.invoke(gll.clj:409)
        at instaparse.gll$run.invoke(gll.clj:427)
        at instaparse.gll$run.invoke(gll.clj:413)
        at instaparse.gll$parse.invoke(gll.clj:894)
        at instaparse.core$parse.doInvoke(core.clj:91)
        at clojure.lang.RestFn.invoke(RestFn.java:425)
        at code_maat.parsers.hiccup_based_parser$parse_with.invoke(hiccup_based_parser.clj:26)
        at clojure.core$partial$fn__4527.invoke(core.clj:2493)
        at code_maat.parsers.hiccup_based_parser$parse_entry.invoke(hiccup_based_parser.clj:47)
        at code_maat.parsers.hiccup_based_parser$parse_entry_from.invoke(hiccup_based_parser.clj:55)
        at code_maat.parsers.hiccup_based_parser$extend_when_complete.invoke(hiccup_based_parser.clj:62)
        at code_maat.parsers.hiccup_based_parser$as_entry_tokens.invoke(hiccup_based_parser.clj:82)
        at code_maat.parsers.hiccup_based_parser$parse_from.invoke(hiccup_based_parser.clj:158)
        at code_maat.parsers.hiccup_based_parser$parse_log.invoke(hiccup_based_parser.clj:172)
        at code_maat.parsers.git2$parse_log.invoke(git2.clj:74)
        at code_maat.app.app$git2__GT_modifications$fn__9279.invoke(app.clj:133)
        at code_maat.app.app$run_parser_in_error_handling_context.invoke(app.clj:97)

Best Regards, Robin

rjayasinghe avatar Sep 21 '15 14:09 rjayasinghe

@rjayasinghe @adamtornhill How about tuning the GC?

-XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:+CMSIncrementalPacing

https://groups.google.com/forum/#!topic/clojure/yPaQN7JuKFY
http://nyeggen.com/post/2012-04-16-tuning-jvm-gc-for-a-big/

janisz avatar Apr 11 '16 18:04 janisz

I run out of heap space when analyzing wikimedia/mediawiki.

The evo-log file, produced as described in the book, is 23 MB.

Setting the JVM heap size in the .bat file does not fix the problem:

java -Xmx512M -Xms64M -jar t\winmaat0.8.5\code-maat-0.8.5-standalone.jar -l ../mediawiki/maat_evo.log -c git -a summary
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at clojure.lang.PersistentVector.rangedIterator(PersistentVector.java:238)
        at clojure.lang.PersistentVector.iterator(PersistentVector.java:261)
        at clojure.lang.Murmur3.hashOrdered(Murmur3.java:105)
        at clojure.lang.APersistentVector.hasheq(APersistentVector.java:166)
        at clojure.lang.Util.dohasheq(Util.java:177)
        at clojure.lang.Util.hasheq(Util.java:168)
        at clojure.lang.PersistentHashMap.hash(PersistentHashMap.java:120)
        at clojure.lang.PersistentHashMap.valAt(PersistentHashMap.java:152)
        at clojure.lang.RT.get(RT.java:672)
        at instaparse.gll$push_message.invoke(gll.clj:172)
        at instaparse.gll$push_result.invoke(gll.clj:255)
        at instaparse.gll$NodeListener$fn__588.invoke(gll.clj:374)
        at instaparse.gll$push_message$f__524.invoke(gll.clj:173)
        at instaparse.gll$step.invoke(gll.clj:328)
        at instaparse.gll$run.invoke(gll.clj:344)
        at instaparse.gll$run.invoke(gll.clj:332)
        at instaparse.gll$parse.invoke(gll.clj:758)
        at instaparse.core$parse.doInvoke(core.clj:83)
        at clojure.lang.RestFn.invoke(RestFn.java:425)
        at code_maat.parsers.hiccup_based_parser$parse_with.invoke(hiccup_based_parser.clj:26)
        at clojure.lang.AFn.applyToHelper(AFn.java:156)
        at clojure.lang.AFn.applyTo(AFn.java:144)
        at clojure.core$apply.invoke(core.clj:626)
        at clojure.core$partial$fn__4228.doInvoke(core.clj:2468)
        at clojure.lang.RestFn.invoke(RestFn.java:408)
        at code_maat.parsers.hiccup_based_parser$parse_entry.invoke(hiccup_based_parser.clj:48)
        at code_maat.parsers.hiccup_based_parser$parse_entry_from.invoke(hiccup_based_parser.clj:54)
        at code_maat.parsers.hiccup_based_parser$extend_when_complete.invoke(hiccup_based_parser.clj:62)
        at code_maat.parsers.hiccup_based_parser$as_entry_tokens.invoke(hiccup_based_parser.clj:82)
        at code_maat.parsers.hiccup_based_parser$parse_from.invoke(hiccup_based_parser.clj:157)
        at code_maat.parsers.hiccup_based_parser$parse_log.invoke(hiccup_based_parser.clj:172)
        at code_maat.parsers.git$parse_log.invoke(git.clj:62)

This is running the version downloaded from https://www.adamtornhill.com/code/crimescenetools.htm.

Meffi42 avatar Sep 13 '18 11:09 Meffi42