
Memory and performance issues

Open fbanados opened this issue 1 year ago • 3 comments

This is really an organization-wide issue, but for now I'm keeping it in morphodict.

I've just checked out the RAM consumption of our machines, and it seems that it goes about as follows:

  • speech-db 4GB
  • itwewina.app 11.9 GB
  • non-itwewina sssttt.altlab.dev 4.9GB
  • backend+frontend refactorings 3.6GB
  • semantic-explorer 1.2GB
  • korp portions 0.2GB
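For reference, the figures above add up to roughly 25.8 GB. A small sketch that tallies them and shows each service's share (the dictionary name `usage_gb` is just illustrative; the numbers are the ones listed above):

```python
# RAM consumption per service in GB, copied from the list above.
usage_gb = {
    "speech-db": 4.0,
    "itwewina.app": 11.9,
    "non-itwewina sssttt.altlab.dev": 4.9,
    "backend+frontend refactorings": 3.6,
    "semantic-explorer": 1.2,
    "korp portions": 0.2,
}

total_gb = sum(usage_gb.values())

# Print services from largest to smallest consumer, with their share of the total.
for name, gb in sorted(usage_gb.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:32s} {gb:5.1f} GB  {gb / total_gb:6.1%}")
print(f"{'total':32s} {total_gb:5.1f} GB")
```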

This doesn't count itwewina.dev. But when we ramp up a parallel itwewina.dev version with a full dictionary, we run close to the memory limits of our server, which could eventually lead to unexpected service drops. The itwewina.dev service stopped processing requests on its own, likely because of memory constraints, and I've decided to take it down at least until a reboot of the production itwewina (at a time less likely to disrupt others), perhaps even until after CILLDI ends. But most importantly, we need to address the memory issue with urgency. Some suggestions (first one is the most straightforward):

  • [ ] Requesting a machine with more memory for the server from Compute Canada (say, 48GB at minimum, ideally 64GB RAM or more)
  • [ ] Limiting the maximum amount of memory per Docker container and the memory available to uWSGI (the Python server), perhaps even looking at alternatives to uWSGI, and ensuring that hitting those limits triggers container restarts that keep services going instead of having them randomly stop processing requests. Note that memory restrictions may not be feasible if all the memory is required for computation rather than caching.
  • [ ] Optimizing the apps' use of memory (based on profiling, etc.), and documenting where most of the memory consumption currently goes.
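As a starting point for the profiling suggestion, here is a minimal sketch using Python's standard `tracemalloc` module to list the largest allocation sites while some workload runs. The `load_fake_dictionary` function is a hypothetical stand-in, not morphodict code; in practice it would be replaced by whatever loads the dictionary or serves a request.

```python
import tracemalloc


def top_allocations(workload, limit=5):
    """Run workload() and return the `limit` largest allocation sites as (location, bytes)."""
    tracemalloc.start()
    try:
        result = workload()  # keep a reference so the memory is still live at snapshot time
        snapshot = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    del result
    # statistics("lineno") groups allocations by source line, largest first.
    stats = snapshot.statistics("lineno")[:limit]
    return [(str(stat.traceback[0]), stat.size) for stat in stats]


def load_fake_dictionary():
    # Hypothetical stand-in for loading a full dictionary into memory.
    return {f"entry-{i}": str(i) * 20 for i in range(50_000)}


if __name__ == "__main__":
    for location, size in top_allocations(load_fake_dictionary):
        print(f"{size / 1024:10.1f} KiB  {location}")
```

`tracemalloc` only sees allocations made through Python's allocator, so memory held by native libraries would need a different tool.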

In parallel, it seems that the generation of sound recordings for paradigms can impose a considerable load on speech-db CPU usage (bursts per-request, which are not an issue currently but would scale to a problem if many people are using them). We should consider separating the service and functionality of providing recordings for dictionary purposes from the validation and recording services provided by speech-db, to avoid the former taking down the latter.

fbanados avatar Jul 17 '24 22:07 fbanados

What we can receive from the Digital Research Alliance of Canada is under the persistent option, cf. https://docs.alliancecan.ca/wiki/Cloud_RAS_Allocations

The available resources have historically increased gradually, though slowly. In any case, we should have up to 50 GB of RAM available to us.

aarppe avatar Jul 24 '24 22:07 aarppe

I've reactivated https://itwewina.altlab.dev/, with the latest dictionary and with the updates I worked on so far.

fbanados avatar Jul 29 '24 20:07 fbanados

Restarting the VMs as part of upgrading seems to reduce memory consumption considerably. Better limits should be introduced in uWSGI.
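uWSGI has its own options for this (for example `reload-on-rss`, which recycles a worker once its resident memory passes a threshold), and those are probably the right place for such limits. Purely to illustrate the idea, here is a Python sketch of the same check using the standard `resource` module. The `LIMIT_MIB` value is a made-up budget, not a measured one, and `ru_maxrss` reports peak rather than current usage.

```python
import resource
import sys


def peak_rss_mib():
    """Peak resident set size of this process in MiB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in KiB on Linux but in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def should_recycle(rss_mib, limit_mib):
    """True when a worker has grown past its memory budget."""
    return rss_mib > limit_mib


LIMIT_MIB = 2048  # hypothetical per-worker budget, not a measured value

if __name__ == "__main__":
    rss = peak_rss_mib()
    print(f"peak RSS {rss:.1f} MiB, recycle: {should_recycle(rss, LIMIT_MIB)}")
```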

fbanados avatar Aug 16 '24 21:08 fbanados