Discourse forum scanning causing high memory usage
Over the weekend, I noticed high memory alerts on the securedrop.org webserver: over 80% of RAM utilized. After a bit of poking around, it appears that the thrice-daily scans of the Discourse forum and the documentation are hanging. I cleaned them up, but subsequent runs via cron triggered the memory alerts as well.
As a temporary measure, I disabled the cron job to avoid performance problems over the weekend. Now that I'm back at a keyboard, I'll re-enable the cron job to run once daily and watch for problems.
If even a once-daily scan generates problems again, we'll need to take a hard look at the update-indices management command and try to refactor it, e.g. to use generators rather than lists (a sketch of that shape follows). Better logging of its performance would be grand as well.
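For concreteness, here's a minimal sketch of the generator approach. None of this is the actual command: fetch_topic_ids, fetch_topic, and index_document are hypothetical stand-ins for whatever really talks to Discourse and the search backend, and the post_stream/cooked keys are the shape of Discourse's topic JSON.

# Hypothetical sketch: stream one document at a time instead of
# accumulating every topic in a list before indexing.
def iter_topic_documents(fetch_topic_ids, fetch_topic):
    """Yield one search document per topic."""
    for topic_id in fetch_topic_ids():
        topic = fetch_topic(topic_id)
        posts = topic.get('post_stream', {}).get('posts', [])
        yield {
            'topic_id': topic_id,
            'title': topic.get('title', ''),
            'search_content': '\n'.join(p.get('cooked') or '' for p in posts),
        }

def index_all_topics(fetch_topic_ids, fetch_topic, index_document):
    # Peak memory stays at one topic's worth of data, not the whole forum.
    for document in iter_topic_documents(fetch_topic_ids, fetch_topic):
        index_document(document)

Nothing here is held in bulk, so memory should stay flat regardless of forum size.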
I've been unable to get Discourse scanning to complete on my local machine, and Conor reports that, last time he checked, it was sometimes taking days for that job to complete. We might want to bump up the priority of finding an alternative means of indexing Discourse. Here's my summary of where we're at and some possible solutions:
We currently use Discourse’s REST-ish API, which is… tricky. It’s designed in a way that requires multiple requests just to get the list of topics, and then one more request per topic to index it (sketched below). We also rebuild the index from scratch every time, since we have no way to diff which topics were modified, deleted, or added since the last sync.
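For context, here's a simplified sketch of that request pattern. This is not our actual code: /latest.json and /t/{id}.json are Discourse's standard JSON endpoints, but the base URL is a placeholder.

import requests

BASE_URL = 'https://forum.example.org'  # placeholder, not the real forum URL

def fetch_all_topics():
    all_topics = []
    page = 0
    while True:
        # One request per page just to learn which topics exist...
        resp = requests.get(BASE_URL + '/latest.json', params={'page': page})
        topics = resp.json()['topic_list']['topics']
        if not topics:
            break
        for topic in topics:
            # ...then one more round-trip per topic for its posts,
            # and everything accumulates in RAM.
            detail = requests.get('{}/t/{}.json'.format(BASE_URL, topic['id']))
            all_topics.append(detail.json())
        page += 1
    return all_topics

Here are some things we could do instead: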
- Directly pull post content out of the database. This could be either a remote connection or a periodic export, which doesn't necessarily have to be a complete export: we could write a script to export just the data we need, or export only specific tables.
- Discourse supports webhooks for events like topic modification and new posts. We could use these to implement granular index updates (see the sketch after this list). The catch is that we'd need an up-to-date initial import before relying on webhooks alone, which we could get via the direct-database approach above or the existing API logic. We'd also have to contend with webhooks being lost en route, so we should keep doing a periodic full import anyway.
- If Discourse has decent plugin capability, we could write our own API endpoint that provides the data we need in a more efficient format.
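To make the webhook option concrete, here's a rough sketch of a receiving endpoint on our side. The X-Discourse-Event headers and the HMAC-SHA256 signature are Discourse's standard webhook behavior; the DISCOURSE_WEBHOOK_SECRET setting and the index_topic/delete_topic helpers are hypothetical wiring we'd still have to build.

import hashlib
import hmac
import json

from django.conf import settings
from django.http import HttpResponse, HttpResponseForbidden
from django.views.decorators.csrf import csrf_exempt

def index_topic(topic_id):
    """Hypothetical: re-fetch one topic from the API and reindex it."""

def delete_topic(topic_id):
    """Hypothetical: remove one topic from the search index."""

@csrf_exempt
def discourse_webhook(request):
    # Discourse signs the raw body with HMAC-SHA256 and a shared secret.
    signature = request.META.get('HTTP_X_DISCOURSE_EVENT_SIGNATURE', '')
    expected = 'sha256=' + hmac.new(
        settings.DISCOURSE_WEBHOOK_SECRET.encode(),  # hypothetical setting
        request.body,
        hashlib.sha256,
    ).hexdigest()
    if not hmac.compare_digest(signature, expected):
        return HttpResponseForbidden()

    event = request.META.get('HTTP_X_DISCOURSE_EVENT', '')  # e.g. 'topic_edited'
    payload = json.loads(request.body.decode('utf-8'))
    data = payload.get('topic') or payload.get('post') or {}
    topic_id = data.get('topic_id') or data.get('id')

    if topic_id is not None:
        if event.endswith('_destroyed'):
            delete_topic(topic_id)
        else:
            index_topic(topic_id)
    return HttpResponse(status=200)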
These are currently failing under cron as far as I can tell. When I try to run the command manually, I get the following error:
(securedrop-alpha)gcorn@www-sd:~$ /var/www/django-alpha/manage.py update_discourse_index --rebuild
Traceback (most recent call last):
  File "/var/www/django-alpha/manage.py", line 12, in <module>
    execute_from_command_line(sys.argv)
  File "/home/gcorn/securedrop-alpha/lib/python3.4/site-packages/django/core/management/__init__.py", line 364, in execute_from_command_line
    utility.execute()
  File "/home/gcorn/securedrop-alpha/lib/python3.4/site-packages/django/core/management/__init__.py", line 356, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/home/gcorn/securedrop-alpha/lib/python3.4/site-packages/django/core/management/base.py", line 283, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/home/gcorn/securedrop-alpha/lib/python3.4/site-packages/django/core/management/base.py", line 330, in execute
    output = self.handle(*args, **options)
  File "/usr/lib/python3.4/contextlib.py", line 30, in inner
    return func(*args, **kwds)
  File "/var/www/django-alpha/search/management/commands/update_discourse_index.py", line 25, in handle
    all_results = index_all_topics()
  File "/usr/lib/python3.4/contextlib.py", line 30, in inner
    return func(*args, **kwds)
  File "/var/www/django-alpha/search/utils/discourse/__init__.py", line 64, in index_all_topics
    'search_content': '\n'.join(searchable_content),
TypeError: sequence item 1: expected str instance, NoneType found
cooooooooooooooooool
also that's weird
I wonder if it's because we didn't update the Discourse URL, so it's still pinging the old server?
We definitely did update the URL, but I agree the behavior looks like the network call failed.
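Whatever the upstream cause, the proximate crash is the '\n'.join(searchable_content) call hitting a None entry, so a single topic with a missing field aborts the whole rebuild. A minimal defensive fix might look like this (a sketch built around the variable name in the traceback; the function and its wiring are hypothetical):

import logging

logger = logging.getLogger(__name__)

def join_searchable_content(searchable_content, topic_id=None):
    """Join content fields, skipping None values instead of crashing."""
    cleaned = [item for item in searchable_content if item is not None]
    dropped = len(searchable_content) - len(cleaned)
    if dropped:
        # Surface the bad topic in the logs rather than dying silently.
        logger.warning('Dropped %d empty field(s) for topic %s', dropped, topic_id)
    return '\n'.join(cleaned)

That wouldn't explain why the field came back None, but it would keep one bad topic from killing the rebuild and would tell us which topic to look at.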
Forum is no longer in use and this can be closed :)