holmes-extractor
literature search example dies during indexing
When I try to run the Holmes extractor using this example, it gets through all of the parsing, spawns what appear to be a large number of worker threads during indexing, and then the program is killed, presumably because it exceeded the available system resources.
What's going on, and how can I fix it?
Hi @PDDeane, which literature example are you running and which version of Holmes are you running and on which operating system?
example_search_EN_literature.py, holmes_extractor 4.0.3, Ubuntu 20.04 LTS
A couple of things you could try:
- add number_of_workers=1 at line 24. This means only one worker process will be created. However, on Ubuntu, worker processes are forked rather than spawned, so while this should reduce the CPU load, it won't have much impact on memory use.
- define a swap file (https://linuxize.com/post/how-to-add-swap-space-on-ubuntu-20-04/)
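Before starting the indexing run, it can help to check how much RAM and swap the machine actually has. The following is a minimal sketch, assuming Linux, where /proc/meminfo reports these figures in kB (really KiB). The helper name memory_totals_gib is hypothetical and is not part of Holmes.

```python
# Hypothetical helper (not part of Holmes): read RAM and swap totals on Linux.
from pathlib import Path

WANTED = ("MemTotal", "MemAvailable", "SwapTotal")


def memory_totals_gib(meminfo_text: str) -> dict:
    """Return MemTotal, MemAvailable and SwapTotal in GiB from /proc/meminfo text."""
    totals = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if key in WANTED:
            kib = int(rest.split()[0])  # /proc/meminfo values are in kB (KiB)
            totals[key] = kib / (1024 * 1024)  # KiB -> GiB
    return totals


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():  # only present on Linux
        totals = memory_totals_gib(meminfo.read_text())
        for key, gib in totals.items():
            print(f"{key}: {gib:.1f} GiB")
        if totals.get("SwapTotal", 0.0) == 0.0:
            print("No swap configured; consider adding a swap file before indexing.")
```

If SwapTotal comes out as zero, the indexing run has nothing to fall back on once RAM is exhausted, which matches the behaviour described above.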
Hello,
I encountered the same problem today on Ubuntu 22.04.1 LTS, both with the example mentioned above and with a smaller custom example.
Now that I have activated a large enough swap partition, I can confirm that the advice above worked for me. :+1:
(On my system, at peak memory usage, 32 GB of RAM and about 9-10 GB of swap were used to index the Harry Potter corpus. Once the index was ready for searching, total memory usage dropped to a bit more than 32 GB.)