java.lang.OutOfMemoryError: Java heap space I met this error

Open WuDiDaBinGe opened this issue 2 years ago • 4 comments

java.lang.OutOfMemoryError: Java heap space
	at com.carrotsearch.hppc.Internals.newArray(Internals.java:37)
	at com.carrotsearch.hppc.IntObjectOpenHashMap.allocateBuffers(IntObjectOpenHashMap.java:364)
	at com.carrotsearch.hppc.IntObjectOpenHashMap.expandAndPut(IntObjectOpenHashMap.java:318)
	at com.carrotsearch.hppc.IntObjectOpenHashMap.put(IntObjectOpenHashMap.java:194)
	at org.aksw.palmetto.corpus.lucene.WindowSupportingLuceneCorpusAdapter.requestDocumentsWithWord(WindowSupportingLuceneCorpusAdapter.java:124)
	at org.aksw.palmetto.corpus.lucene.WindowSupportingLuceneCorpusAdapter.requestWordPositionsInDocuments(WindowSupportingLuceneCorpusAdapter.java:102)
	at org.aksw.palmetto.prob.window.BooleanSlidingWindowFrequencyDeterminer.determineCounts(BooleanSlidingWindowFrequencyDeterminer.java:54)
	at org.aksw.palmetto.prob.window.BooleanSlidingWindowFrequencyDeterminer.determineCounts(BooleanSlidingWindowFrequencyDeterminer.java:45)
	at org.aksw.palmetto.prob.AbstractProbabilitySupplier.getProbabilities(AbstractProbabilitySupplier.java:37)
	at org.aksw.palmetto.DirectConfirmationBasedCoherence.calculateCoherences(DirectConfirmationBasedCoherence.java:87)
	at org.aksw.palmetto.webapp.PalmettoApplication.calculate(PalmettoApplication.java:198)
	at org.aksw.palmetto.webapp.PalmettoApplication.npmiService(PalmettoApplication.java:111)
	at sun.reflect.GeneratedMethodAccessor27.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.springframework.web.bind.annotation.support.HandlerMethodInvoker.invokeHandlerMethod(HandlerMethodInvoker.java:176)
	at org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.invokeHandlerMethod(AnnotationMethodHandlerAdapter.java:440)
	at org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.handle(AnnotationMethodHandlerAdapter.java:428)
	at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:933)
	at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:867)
	at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:951)
	at org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:842)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:621)
	at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:827)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:728)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:305)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
	at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:51)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
	at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:88)
	at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:106)

I ran into this error when using multiple threads to compute topic coherence. My machine has 16 GB of RAM and an Intel i9 CPU.

WuDiDaBinGe avatar Jul 25 '21 09:07 WuDiDaBinGe

In general, this behavior is expected if you try to use many threads that evaluate different topics in parallel.

The problem is that window-based coherence measures need to know the positions of the single words within documents. If you have words that occur often, the program has to handle many positions at the same time. If you do that in parallel with different topics that have different words, it is not very surprising that the program runs out of memory :wink:
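To make the memory argument concrete, here is a toy illustration. It is not Palmetto's actual implementation: `build_positional_index` and `positions_needed` are hypothetical helpers, and the two-document corpus is made up. It only shows that the number of positions held in memory grows with word frequency and adds up across topics that are evaluated at the same time.

```python
from collections import defaultdict


def build_positional_index(docs):
    """Map word -> {doc_id: [positions]} for a toy tokenized corpus."""
    index = defaultdict(dict)
    for doc_id, tokens in enumerate(docs):
        for pos, token in enumerate(tokens):
            index[token].setdefault(doc_id, []).append(pos)
    return index


def positions_needed(index, topic_words):
    """Count the word positions that must be loaded to evaluate one topic."""
    return sum(
        len(positions)
        for word in set(topic_words)
        for positions in index.get(word, {}).values()
    )


# Made-up corpus: "the" is frequent, "dog" and "mat" are rare.
docs = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
]
index = build_positional_index(docs)

topic_a = ["the", "cat"]  # contains a frequent word
topic_b = ["dog", "mat"]  # contains only rare words

# If both topics are evaluated in parallel, their positions are held together.
in_flight = positions_needed(index, topic_a) + positions_needed(index, topic_b)
print(positions_needed(index, topic_a), positions_needed(index, topic_b), in_flight)
```

On a real corpus a frequent word can have a very large number of positions, so a few parallel requests containing such words can dominate the heap.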

It is hard to give you a hint without more information.

  • How have you parallelized the workflow (i.e., what is the task of a single thread)?
  • How many threads do you use?
  • How many topics do you try to evaluate?
  • How many top words does one of your topics have?

MichaelRoeder avatar Jul 26 '21 11:07 MichaelRoeder

Thanks for your reply. I use three threads to calculate c_a, c_p and npmi respectively, and I send the same data to all three threads. There are 100 topics, and each topic has its top 10 words to evaluate. topic_words is a topic-word matrix; in my case, its size is (100, 10).

import threading

def calculate_coherence(word_list, ret, coherence_type):
    # Request one coherence value per topic and store the list under the measure's name.
    result = []
    for words in word_list:
        result.append(palmetto.get_coherence(words, coherence_type=coherence_type))
    ret[coherence_type] = result

# One thread per coherence measure; all three receive the same topic-word matrix.
th_ca   = threading.Thread(target=calculate_coherence, args=[topic_words, ret, 'ca'], name='th_ca')
th_cp   = threading.Thread(target=calculate_coherence, args=[topic_words, ret, 'cp'], name='th_cp')
th_npmi = threading.Thread(target=calculate_coherence, args=[topic_words, ret, 'npmi'], name='th_npmi')
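
For comparison, here is a minimal self-contained sketch of the same fan-out. It is not the code from this post: it uses the standard-library `ThreadPoolExecutor`, so the worker threads are started and joined implicitly, and `max_workers` caps how many measures run at once. `fake_get_coherence` is a hypothetical stand-in for `palmetto.get_coherence` so that the example runs without a server, and `run_measures` is an illustrative name.

```python
from concurrent.futures import ThreadPoolExecutor


def coherence_for_all(topic_words, coherence_type, get_coherence):
    """Return one coherence value per topic for a single measure."""
    return [get_coherence(words, coherence_type=coherence_type) for words in topic_words]


def run_measures(topic_words, coherence_types, get_coherence, max_workers=1):
    """Evaluate each measure in a pool with at most max_workers threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            ctype: pool.submit(coherence_for_all, topic_words, ctype, get_coherence)
            for ctype in coherence_types
        }
        # result() blocks until the corresponding measure has finished.
        return {ctype: future.result() for ctype, future in futures.items()}


def fake_get_coherence(words, coherence_type):
    # Hypothetical stand-in: a real client would query the Palmetto service here.
    return float(len(words))


topic_words = [["apple", "pear"], ["car", "bus", "train"]]
ret = run_measures(topic_words, ["ca", "cp", "npmi"], fake_get_coherence, max_workers=1)
print(ret)
```

With `max_workers=1` the measures run one after another, which keeps only one request in flight; raising it trades memory on the server side for speed.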

I have mitigated this problem by running export CATALINA_OPTS="-Xms512m -Xmx3072m -XX:-UseGCOverheadLimit" before mvn org.apache.tomcat.maven:tomcat7-maven-plugin:2.2:run -Dmaven.tomcat.port=7777. This works when the number of topics is 75, but with 100 topics I often hit the error "Aborted (core dumped)".

WuDiDaBinGe avatar Jul 28 '21 14:07 WuDiDaBinGe

Your setup looks good and should work. I am just wondering why you have -Xmx3072m in the options, since it limits the server to at most 3 GB of RAM. You may want to increase it and try again.

Another workaround would be to split up the list of documents and restart the server in-between. But that is a very bad solution :wink:
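The batching workaround could be sketched roughly as follows. This is only an illustration under assumptions: `run_in_batches` is a made-up helper, `evaluate` stands in for whatever call computes the coherence of one item, and `between_batches` is a hypothetical hook where a server restart would be triggered. The sketch does not restart anything itself.

```python
def run_in_batches(items, evaluate, batch_size, between_batches=None):
    """Evaluate items in fixed-size batches, calling a hook between batches."""
    results = []
    for start in range(0, len(items), batch_size):
        # Skip the hook before the first batch; call it before every later one.
        if start > 0 and between_batches is not None:
            between_batches()
        batch = items[start:start + batch_size]
        results.extend(evaluate(item) for item in batch)
    return results


# Stand-ins so the sketch runs on its own.
restarts = []
topics = [["a", "b"], ["c"], ["d", "e", "f"], ["g"], ["h", "i"]]
scores = run_in_batches(
    topics,
    evaluate=len,                                  # placeholder for a coherence call
    batch_size=2,
    between_batches=lambda: restarts.append("restart"),
)
print(scores, len(restarts))
```

Five items with a batch size of two give three batches, so the hook runs twice.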

We are aware of the problem that the web service sometimes has issues in budgeting its memory. Until now, it is unclear which part of the server creates the problem since the Palmetto library runs without memory issues if it is executed as a plain Java program.

MichaelRoeder avatar Jul 30 '21 10:07 MichaelRoeder

OK, I will increase -Xmx again. I use python-Palmetto, so I have not tried the Palmetto Java library. Maybe I will try it next time. Thanks.

WuDiDaBinGe avatar Jul 30 '21 11:07 WuDiDaBinGe