literature search example dies during indexing

Open PDDeane opened this issue 1 year ago • 2 comments

When I try to run the holmes extractor using this example, it reaches the end of all the parsing, spawns a lot of (?) worker threads during indexing, and then the program gets killed, presumably because it exceeded system resources.

What's going on, and how can I fix it?

PDDeane avatar Aug 29 '22 18:08 PDDeane

Hi @PDDeane, which literature example are you running, with which version of Holmes, and on which operating system?

richardpaulhudson avatar Aug 31 '22 17:08 richardpaulhudson

example_search_EN_literature.py, holmes_extractor.4.0.3, Ubuntu 20.04 LTS

PDDeane avatar Aug 31 '22 21:08 PDDeane

A couple of things you could try:

  • add number_of_workers=1 at line 24. This means only one worker thread will be created. However, on Ubuntu worker threads are forked rather than spawned, so while this should reduce the CPU load it won't have much impact on memory use.
  • define a swap file (https://linuxize.com/post/how-to-add-swap-space-on-ubuntu-20-04/)
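
As a rough sketch of the first suggestion, the snippet below shows number_of_workers=1 being passed when the manager is constructed. It assumes that line 24 of the example script is the holmes.Manager(...) call and that the model is en_core_web_trf; both are guesses, so check them against your copy of the script. manager_settings and build_manager are hypothetical helper names used only for illustration.

```python
import multiprocessing


def manager_settings(number_of_workers=1):
    """Keyword arguments for the Holmes manager (hypothetical helper)."""
    return {
        # Assumption: the model name; use whatever your copy of the example uses.
        "model": "en_core_web_trf",
        # The setting suggested above: limit indexing to a single worker.
        "number_of_workers": number_of_workers,
    }


def build_manager(number_of_workers=1):
    """Create the manager; Holmes is imported lazily so the sketch loads without it."""
    import holmes_extractor as holmes

    return holmes.Manager(**manager_settings(number_of_workers))


if __name__ == "__main__":
    # Shows how Python starts worker processes on this machine. With 'fork',
    # each worker begins as a copy of the parent process.
    print("multiprocessing start method:", multiprocessing.get_start_method())
    print("manager settings:", manager_settings())
```

Calling build_manager() in place of the original constructor call should be enough to try the setting out, assuming the rest of the script is left unchanged.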

richardpaulhudson avatar Sep 06 '22 09:09 richardpaulhudson

Hello,

I ran into the same problem today on Ubuntu 22.04.1 LTS, both with the example mentioned above and with a smaller custom example.

  • define a swap file (https://linuxize.com/post/how-to-add-swap-space-on-ubuntu-20-04/)

Now that I have activated a large enough swap partition, I can confirm that the advice above worked for me. :+1:

(On my system, peak memory usage while indexing the Harry Potter corpus was 32 GB of RAM plus about 9-10 GB of swap. Once the corpus was ready for search, total memory usage dropped to a bit more than 32 GB.)
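
To compare another machine against these figures before starting an indexing run, something like the following can report RAM plus swap. It is a minimal sketch that assumes a Linux system exposing /proc/meminfo. The helper names and the 42 threshold (roughly the peak quoted above) are illustrative only, not something Holmes provides.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integers.

    Memory fields such as MemTotal and SwapTotal are reported in kB.
    """
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def total_ram_plus_swap_gib(info):
    """Return MemTotal + SwapTotal converted from kB to GiB."""
    kib = info.get("MemTotal", 0) + info.get("SwapTotal", 0)
    return kib / (1024 * 1024)


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as handle:
            info = parse_meminfo(handle.read())
    except OSError:
        print("/proc/meminfo is not available on this system")
    else:
        total = total_ram_plus_swap_gib(info)
        print(f"RAM + swap: {total:.1f} GiB")
        # Illustrative threshold only: roughly the peak usage quoted above.
        if total < 42:
            print("This is less than the peak usage reported above.")
```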

schorfma avatar Sep 23 '22 17:09 schorfma