cht-core
Remove need for couchdb db per user
Describe the performance issue
CHT creates one db per user in PouchDB, replicating to CouchDB, to store metadata like telemetry, feedback, and read status. This is problematic for national-scale projects, as it will create on the order of 100k databases. Telemetry and feedback docs are replicated and then periodically copied into the medic-users-meta db and then wiped. The read docs are replicated and kept as a record of which docs the user has seen across devices.
Describe the improvement you'd like
Remove the requirement for one db per user to reduce server load.
For telemetry and feedback, this could mean changing the way the docs are replicated: instead of native replication, a bespoke API would write them directly into the medic-users-meta db, eliminating the need for the periodic cleanup.
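To make the bespoke API idea concrete, here is a minimal sketch of the server-side step that turns a device's submitted docs into one write against the shared db. The helper name `build_meta_bulk_payload`, the `type` values, and the `user` field are assumptions for illustration, not existing CHT code; only the `{"docs": [...]}` body shape of CouchDB's `_bulk_docs` endpoint comes from CouchDB itself.

```python
from typing import Any

# Assumption: telemetry and feedback docs carry a `type` field with these values.
META_TYPES = {"telemetry", "feedback"}


def build_meta_bulk_payload(username: str, docs: list[dict[str, Any]]) -> dict[str, Any]:
    """Build a `_bulk_docs` request body for the shared meta db.

    Hypothetical helper: keeps only telemetry/feedback docs, drops the
    device-local `_rev`, and records which user submitted each doc.
    """
    out = []
    for doc in docs:
        if doc.get("type") not in META_TYPES:
            continue  # read docs are out of scope for this endpoint
        clean = {k: v for k, v in doc.items() if k != "_rev"}
        clean["user"] = username  # illustrative field name
        out.append(clean)
    return {"docs": out}
```

The resulting body would be POSTed to `/medic-users-meta/_bulk_docs`, so the docs never pass through a per-user db and there is nothing to clean up afterwards.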
For read docs this is more difficult but we should investigate storing these in a shared db. Alternatively we could just store them on the local device which means if you got a new device the data would be lost but I don't think anyone would even notice. We could also consider removing read docs altogether as they only apply to reports and messages which aren't widely displayed but this would need wider consultation.
Describe alternatives you've considered
It might be possible to do something clever with db partitions, especially for the read docs, where there is one db organised into per-user partitions.
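As a rough illustration of the partition idea: in a CouchDB partitioned database, every doc ID has the form `partition:id`, and a single partition can be listed via `/{db}/_partition/{partition}/_all_docs`. The sketch below maps per-user read docs onto that scheme. The function names, the shared db name, and the `read:report:<uuid>` ID shape are assumptions for illustration.

```python
from urllib.parse import quote


def to_partitioned_id(username: str, doc_id: str) -> str:
    """Prefix a per-user doc ID with the user's partition key."""
    if not username or ":" in username:
        # The partition key is everything before the first colon,
        # so a colon in the username would split in the wrong place.
        raise ValueError(f"unusable partition key: {username!r}")
    return f"{username}:{doc_id}"


def from_partitioned_id(partitioned_id: str) -> tuple[str, str]:
    """Split a partitioned ID back into (username, original doc ID)."""
    username, sep, doc_id = partitioned_id.partition(":")
    if not sep or not doc_id:
        raise ValueError(f"not a partitioned id: {partitioned_id!r}")
    return username, doc_id


def partition_all_docs_path(db: str, username: str) -> str:
    """Path that lists one user's docs in a partitioned db."""
    return f"/{quote(db, safe='')}/_partition/{quote(username, safe='')}/_all_docs"
```

With this mapping, a read doc `read:report:abc` for user `alice` would be stored as `alice:read:report:abc` in one shared db instead of in a db of its own.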
When asked if this will be an issue, the official update from CouchDB is...
No concern generally. The main issue is how many of these are being accessed at any one time, which is presumably way lower than the total. That is to say, make max_dbs_open larger than your max. You also want to keep things that multiply per db and cost resources to a minimum. In particular, unless a single DB is bigger than 1-10 GB, set q=1. And also keep only one design doc per db (modulo a second one for transparent index updates).
So this may not be the limit I worried it was.
We need to keep these settings in mind: https://docs.couchdb.org/en/stable/maintenance/performance.html#system-resource-limits
We probably need to update this: https://github.com/medic/cht-core/blob/1bfc16c07a3aa63839d972dd05fe23537c3901ae/couchdb/10-docker-default.ini#L11
And also document setting the ulimit on the host system.
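The q=1 advice is easier to weigh with numbers. Each database is split into q shards, and each shard copy is a separate file on disk, so per-db costs multiply quickly at 100k users. A back-of-the-envelope sketch, where the q values other than 1 and the single-copy assumption (n=1) are illustrative only:

```python
def shard_file_count(databases: int, q: int, n: int = 1) -> int:
    """Total shard files across the cluster: one per shard per replica."""
    return databases * q * n


USERS = 100_000  # "in the order of 100k databases" from this issue

for q in (8, 2, 1):
    print(f"q={q}: {shard_file_count(USERS, q):,} shard files")
```

Only the shards of databases that are actually open count against max_dbs_open and the host ulimit at any one moment, which is why the number of concurrently active users matters more than the total.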
I'm doubtful anyone realistically uses read docs. I suppose we could check training material to see if we ever indicate the red bubbles are important.
Alternatively we could just store them on the local device which means if you got a new device the data would be lost but I don't think anyone would even notice.
I think this is the best choice. I think when the user logs in for the first time, we should assume that all docs are read (there's no point in marking them as unread!), and only start counting new documents.
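A minimal sketch of that rule, assuming read state is kept only on the device and that each doc has a creation timestamp in milliseconds (similar to `reported_date` on reports). The class and field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LocalReadState:
    """Device-local read tracking; nothing here is replicated."""

    first_login_ms: int  # when this user first logged in on this device
    read_ids: set[str] = field(default_factory=set)

    def mark_read(self, doc_id: str) -> None:
        self.read_ids.add(doc_id)

    def is_unread(self, doc_id: str, reported_date_ms: int) -> bool:
        # Anything that existed at first login is assumed to be read already.
        if reported_date_ms <= self.first_login_ms:
            return False
        return doc_id not in self.read_ids
```

On a new device, `first_login_ms` resets to the new login time, so the historical backlog never shows up as unread and only docs created afterwards are counted.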
I'm doubtful anyone realistically uses read docs.
There are a couple of projects that I know of (CHV-NEO and a few I-TECH projects) that probably use the "Read" state of Messages on the Messages Page. Also, NSSD might be using them on the Reports Page (they have a role that is supposed to review submitted reports from the Reports Page and the "Read" state was the main way of knowing which ones they reviewed already).