
97% Memory Usage and Segmentation Fault after Multiple Re-Ingests of a 300-Page Publication

Open Cutuchiqueno opened this issue 5 years ago • 8 comments

Running Manifold 3.0 on a Digital Ocean droplet with Ubuntu 19.04 LTS.

As we approached our first publication of a 300-page text, we uploaded different versions of it to the project and re-ingested them frequently.

At the same time, the server's memory usage climbed to 97% and never dropped back, even during periods of no interaction. Then today, the first Error 500 responses appeared when trying to delete a version or open a text in the reader. Another reason this did not come to my attention earlier is that I previously had to shut down the server whenever I was not working with it, due to unresolved GDPR issues. This is the first time it has run for more than a (working) week.

When I tried to find out from the shell what was causing the memory load and the 500 errors, and ran manifold-ctl tail, I received a segmentation fault:

/usr/bin/manifold-ctl: line 21: 26951 Segmentation fault      (core dumped) /opt/manifold/embedded/bin/omnibus-ctl manifold /opt/manifold/embedded/service/omnibus-ctl $@

After restarting the server and Manifold, memory usage sits at around 49%. However, every action I carry out in the UI consumes more memory until memory runs full again; at no point is memory freed. I verified this with htop, but below is also a one-day excerpt from Digital Ocean's usage statistics.

[image: one-day Digital Ocean memory usage graph]
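For reference, a quick way to get a similar per-process view from the shell (an illustrative command, not part of the original report) is to sort processes by resident memory:

ps -eo pid,rss,comm --sort=-rss | head -n 15   # top 15 processes by RSS (KiB)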

I basically have to restart the server every 1-2 days. This is with a workload of two people doing the aforementioned tasks. However, we plan to go public within the next 2-3 weeks, which will significantly increase the workload.

Cutuchiqueno avatar Aug 01 '19 11:08 Cutuchiqueno

How much memory do you have on the VM?

zdavis avatar Aug 01 '19 12:08 zdavis

8GB

Cutuchiqueno avatar Aug 01 '19 18:08 Cutuchiqueno

@Cutuchiqueno Does this continue to be a problem?

zdavis avatar Feb 12 '20 21:02 zdavis

No, it doesn't. Thx for the support.

Cutuchiqueno avatar Feb 13 '20 08:02 Cutuchiqueno

I have to correct myself. Memory consumption has remained an ongoing problem until today. Besides Ruby-related processes being killed due to lack of memory (see Fig. 1 below), we have now also experienced a complete freeze of the server due to memory issues.

[Fig. 1: screenshot of a Ruby process being killed due to lack of memory]

All of these issues occur in sync with the jobs scheduled by Manifold's job scheduler. Figure 2, for instance, shows the moment the server killed one of those Ruby processes; it coincides exactly with the workload created by the scheduled jobs.

[Fig. 2: memory graph at the moment the server killed a Ruby process, coinciding with scheduled-job activity]

All of this happens on our production Manifold instance. Although I have not found concrete evidence, we also suspect that some data inconsistency issues around annotation data are connected to this problem.

While the original problem was triggered by a 300-page upload, the last year has shown that it is simply a function of time. Normally these problems reappear 2-3 weeks after rebooting the server, regardless of workload intensity.

Cutuchiqueno avatar Dec 21 '20 10:12 Cutuchiqueno

Could you please add swap to the host and see if it mitigates the problem?
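A minimal sketch of that suggestion, assuming a standard Ubuntu host; the 4 GB size is an assumption, not part of the original advice:

sudo fallocate -l 4G /swapfile        # create the swap file (size is an assumption)
sudo chmod 600 /swapfile              # restrict permissions
sudo mkswap /swapfile                 # format it as swap
sudo swapon /swapfile                 # enable it immediately
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab   # persist across reboots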


zdavis avatar Dec 21 '20 16:12 zdavis

We've spent quite a bit of time looking for memory leaks in Manifold over the last few months, and have yet to find the cause. We recently put some application performance monitoring in place on instances we control, and are waiting to see what we catch there. When we're back from the break, next week, I'll follow up with you and we'll put some monitoring on your instance so we can figure out what leads to the memory ballooning. I understand that you think the problem is related to scheduled jobs—it could be, but that's not our sense, since when we do see memory usage increasing, it's always tied to the main puma process and not to the zhong or sidekiq processes that handle scheduled and background jobs.
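One way to check this on an affected host, sketched here as an illustrative command rather than an official Manifold diagnostic, is to watch the resident memory of the puma, zhong, and sidekiq processes side by side:

watch -n 60 "ps -eo pid,rss,args | grep -E '[p]uma|[z]hong|[s]idekiq'"   # RSS in KiB, refreshed every minute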

zdavis avatar Dec 21 '20 16:12 zdavis

I can confirm that this issue is tied to puma. In the end, I figured out that it is always one of the puma cluster worker processes that grows without limit over time. See the htop output:

[image: htop output showing a puma cluster worker with high memory usage]

Thus, while there are many puma cluster worker processes, it always seems to be the one serving the Manifold API that accumulates RAM over time.
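A hypothetical follow-up check would be to log the resident memory of the suspect worker over time to confirm it never shrinks; <PID> stands for the worker's process id taken from htop:

while true; do date; ps -o rss= -p <PID>; sleep 60; done   # RSS in KiB, sampled every minute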

Cutuchiqueno avatar Feb 09 '21 14:02 Cutuchiqueno

Stale issue. Closing. Please feel free to reopen if it continues to be a problem.

zdavis avatar Nov 17 '22 21:11 zdavis