
Don't hard code COUCHDB_SECRET in docker compose files


Describe the issue
We hard code the secret for CouchDB in the Arch v3 docker setup as shown here:

"COUCHDB_SECRET=${COUCHDB_SECRET:-6c1953b6-e64d-4b0c-9268-2528396f2f58}"

This is insecure as it is public and will be used by default unless users override it.

Describe the improvement you'd like
We should dynamically generate this at install time, or mandate that users specify a unique one per install.

Describe alternatives you've considered
NA
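The "generate at install time" option could look roughly like the sketch below. This is a hypothetical installer helper, not the actual fix: the file name (`.env`) and the `COUCHDB_UUID` variable are assumptions; only `COUCHDB_SECRET` is named in this issue.

```shell
#!/bin/sh
# Hypothetical install-time helper: write a unique CouchDB secret (and UUID)
# into a .env file instead of shipping a hard-coded default in the compose file.

gen_hex() {
  # 16 random bytes rendered as 32 hex characters
  head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

ENV_FILE="${1:-.env}"

# Only generate once: re-running the installer must not rotate the secret,
# or existing sessions and replication checkpoints would be invalidated.
if ! grep -q '^COUCHDB_SECRET=' "$ENV_FILE" 2>/dev/null; then
  echo "COUCHDB_SECRET=$(gen_hex)" >> "$ENV_FILE"
fi
if ! grep -q '^COUCHDB_UUID=' "$ENV_FILE" 2>/dev/null; then
  echo "COUCHDB_UUID=$(gen_hex)" >> "$ENV_FILE"
fi
```

Docker compose reads `.env` automatically, so the compose file could then reference `${COUCHDB_SECRET}` with no default value and fail loudly if it is missing.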

mrjones-plip avatar Sep 15 '22 21:09 mrjones-plip

cc @garethbowen per our call today

mrjones-plip avatar Sep 15 '22 21:09 mrjones-plip

I think we should not default the UUID either.

garethbowen avatar Sep 15 '22 22:09 garethbowen

I had a look at this in the 7812-require-password branch but ran out of time to get the build to pass. I think there's some issue with making the entire cluster use the same secret and UUID...

garethbowen avatar Sep 23 '22 02:09 garethbowen

This is ready for AT on 7800-no-couch-secret.

Please make sure that:

  • it works when using single node CouchDb
  • it works when using clustered CouchDb
  • for both, your sessions are persistent on container restart (not removed, just restarted).
  • for both, your checkpointers are persistent on container restart (check that offline users don't download all docs again if you restart CouchDb).
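One possible way to spot-check those restart items from the command line (container names, host, and credentials below are assumptions; adjust to your compose project):

```shell
# Restart, don't remove, the CouchDB container:
docker restart couchdb

# Session persistence: an AuthSession cookie issued before the restart should
# still be accepted afterwards, because the signing secret is unchanged.
curl -s -b "AuthSession=$COOKIE" 'http://localhost:5984/_session'

# Checkpoint persistence: replication checkpoints live in _local docs and
# should survive the restart, so offline users don't re-download everything.
curl -s 'http://medic:password@localhost:5984/medic/_local_docs' | head
```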

Compose files:

dianabarsan avatar Oct 04 '22 03:10 dianabarsan

Thanks @dianabarsan for the steps and the files to test.

Here are the testing results using the files provided in the previous comment and the branch 7800-no-couch-secret

Using single node CouchDB

  • The instance was up and running with no issues.
  • The instance was persistent when the couchdb container was restarted.
Video attached

video

  • I had problems when I tried to log in using an offline user. Not sure if I am missing something.
Online user

image

Offline user

image

Video attached

video

Using clustered CouchDB

  • Had an error when I tried to run docker-compose up
Error attached
cht-api         | RequestError: Error: getaddrinfo ENOTFOUND haproxy
cht-api         |     at new RequestError (/api/node_modules/request-promise-core/lib/errors.js:14:15)
cht-api         |     at Request.plumbing.callback (/api/node_modules/request-promise-core/lib/plumbing.js:87:29)
cht-api         |     at Request.RP$callback [as _callback] (/api/node_modules/request-promise-core/lib/plumbing.js:46:31)
cht-api         |     at self.callback (/api/node_modules/request/request.js:185:22)
cht-api         |     at Request.emit (node:events:527:28)
cht-api         |     at Request.onRequestError (/api/node_modules/request/request.js:877:8)
cht-api         |     at ClientRequest.emit (node:events:527:28)
cht-api         |     at Socket.socketErrorListener (node:_http_client:454:9)
cht-api         |     at Socket.emit (node:events:527:28)
cht-api         |     at emitErrorNT (node:internal/streams/destroy:157:8) {
cht-api         |   cause: Error: getaddrinfo ENOTFOUND haproxy
cht-api         |       at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:71:26) {
cht-api         |     errno: -3008,
cht-api         |     code: 'ENOTFOUND',
cht-api         |     syscall: 'getaddrinfo',
cht-api         |     hostname: 'haproxy'
cht-api         |   },
cht-api         |   error: Error: getaddrinfo ENOTFOUND haproxy
cht-api         |       at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:71:26) {
cht-api         |     errno: -3008,
cht-api         |     code: 'ENOTFOUND',
cht-api         |     syscall: 'getaddrinfo',
cht-api         |     hostname: 'haproxy'
cht-api         |   }
cht-api         | }
cht-sentinel    | RequestError: Error: getaddrinfo ENOTFOUND haproxy
cht-sentinel    |     at new RequestError (/sentinel/node_modules/request-promise-core/lib/errors.js:14:15)
cht-sentinel    |     at Request.plumbing.callback (/sentinel/node_modules/request-promise-core/lib/plumbing.js:87:29)
cht-sentinel    |     at Request.RP$callback [as _callback] (/sentinel/node_modules/request-promise-core/lib/plumbing.js:46:31)
cht-sentinel    |     at self.callback (/sentinel/node_modules/request/request.js:185:22)
cht-sentinel    |     at Request.emit (node:events:527:28)
cht-sentinel    |     at Request.onRequestError (/sentinel/node_modules/request/request.js:877:8)
cht-sentinel    |     at ClientRequest.emit (node:events:527:28)
cht-sentinel    |     at Socket.socketErrorListener (node:_http_client:454:9)
cht-sentinel    |     at Socket.emit (node:events:527:28)
cht-sentinel    |     at emitErrorNT (node:internal/streams/destroy:157:8) {
cht-sentinel    |   cause: Error: getaddrinfo ENOTFOUND haproxy
cht-sentinel    |       at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:71:26) {
cht-sentinel    |     errno: -3008,
cht-sentinel    |     code: 'ENOTFOUND',
cht-sentinel    |     syscall: 'getaddrinfo',
cht-sentinel    |     hostname: 'haproxy'
cht-sentinel    |   },
cht-sentinel    |   error: Error: getaddrinfo ENOTFOUND haproxy
cht-sentinel    |       at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:71:26) {
cht-sentinel    |     errno: -3008,
cht-sentinel    |     code: 'ENOTFOUND',
cht-sentinel    |     syscall: 'getaddrinfo',
cht-sentinel    |     hostname: 'haproxy'
cht-sentinel    |   }
cht-sentinel    | }

tatilepizs avatar Oct 04 '22 16:10 tatilepizs

On the error, it looks like the haproxy container failed to come up. Can you check the logs? It could be something as simple as a port clash, or something else.
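For reference, a couple of commands that can help with that triage (service names assumed from the compose files; the ports are examples):

```shell
# Did the haproxy container come up, and if not, why?
docker-compose ps              # look for cht-haproxy in an Exit state
docker-compose logs haproxy    # startup errors, e.g. a config parse failure

# A port clash would show up as another process already bound to the port:
ss -tlnp | grep -E ':5984|:443'
```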

dianabarsan avatar Oct 04 '22 16:10 dianabarsan

I had problems when I tried to log in using an offline user. Not sure if I am missing something

From the video, it looks like your browser doesn't accept the self signed certificate and doesn't download the service worker, which is required for offline users. How do you usually handle self signed certificates?

dianabarsan avatar Oct 04 '22 16:10 dianabarsan

About the certificate problems: I have never had this issue before. I was reading about it, so I tried using Firefox, exported the certificates, and added them to the keychain access to be trusted, but that did not work for Chrome. I don't understand why, because it is working fine in Firefox, so I will need to investigate a little bit more. Meanwhile, I was testing that the offline users didn't download all docs again when I restarted the CouchDb container.

Video attached

video

tatilepizs avatar Oct 04 '22 20:10 tatilepizs

offline users didn't download

Since your user only has 37 docs, you would not notice them downloading during a sync unless you inspected the network requests and checked how many docs the server sends back.

dianabarsan avatar Oct 04 '22 21:10 dianabarsan

About the error when I try to use the clustered CouchDB...

This is the error that the `cht-haproxy` container is showing:
backend couchdb-servers
  balance leastconn
  retry-on all-retryable-errors
  log global
  retries 5
  # servers are added at runtime, in entrypoint.sh, based on couchdb
  server couchdb couchdb:5984 check agent-check agent-inter 5s agent-addr healthcheck agent-port 5555
[alert] 276/204913 (1) : parseBasic loaded
[alert] 276/204913 (1) : parseCookie loaded
[alert] 276/204913 (1) : replacePassword loaded
[NOTICE] 276/204913 (1) : haproxy version is 2.3.19-0647791
[NOTICE] 276/204913 (1) : path to executable is /usr/local/sbin/haproxy
[ALERT] 276/204913 (1) : parsing [/usr/local/etc/haproxy/backend.cfg:7] : 'server couchdb' : could not resolve address 'couchdb'.
[ALERT] 276/204913 (1) : Failed to initialize server(s) addr.

I don't have a lot of knowledge of docker, so I just tried changing the name of the COUCHDB_SERVERS in the docker-compose_cht-core.yml from couchdb to couchdb.1/couchdb.2/couchdb.3 just to see what happened.

Using couchdb.1, the result was that the container that failed this time was cht-api, with the error:

Error
2022-10-04 20:55:13 INFO: Translations loaded successfully 
2022-10-04 20:55:14 INFO: Running installation checks… 
2022-10-04 20:55:14 INFO: Medic API listening on port 5988 
2022-10-04 20:55:14 ERROR: Fatal error initialising medic-api 
2022-10-04 20:55:14 ERROR: FetchError: invalid json response body at http://haproxy:5984/medic/_all_docs?include_docs=true&startkey=%22_design%2F%22&endkey=%22_design%2F%EF%BF%B0%22 reason: Unexpected token < in JSON at position 0
    at /api/node_modules/node-fetch/lib/index.js:272:32
    at processTicksAndRejections (node:internal/process/task_queues:96:5) {
  message: 'invalid json response body at http://haproxy:5984/medic/_all_docs?include_docs=true&startkey=%22_design%2F%22&endkey=%22_design%2F%EF%BF%B0%22 reason: Unexpected token < in JSON at position 0',
  type: 'invalid-json',
  [stack]: 'FetchError: invalid json response body at http://haproxy:5984/medic/_all_docs?include_docs=true&startkey=%22_design%2F%22&endkey=%22_design%2F%EF%BF%B0%22 reason: Unexpected token < in JSON at position 0\n' +
    '    at /api/node_modules/node-fetch/lib/index.js:272:32\n' +
    '    at processTicksAndRejections (node:internal/process/task_queues:96:5)',
  name: 'FetchError'
} 

Using couchdb.2 or couchdb.3, all the containers came up successfully, but I am seeing this error:

Error
cht-sentinel    | StatusCodeError: 503 - "<html><body><h1>503 Service Unavailable</h1>\nNo server is available to handle this request.\n</body></html>\n"
cht-sentinel    |     at new StatusCodeError (/sentinel/node_modules/request-promise-core/lib/errors.js:32:15)
cht-sentinel    |     at Request.plumbing.callback (/sentinel/node_modules/request-promise-core/lib/plumbing.js:104:33)
cht-sentinel    |     at Request.RP$callback [as _callback] (/sentinel/node_modules/request-promise-core/lib/plumbing.js:46:31)
cht-sentinel    |     at Request.self.callback (/sentinel/node_modules/request/request.js:185:22)
cht-sentinel    |     at Request.emit (node:events:527:28)
cht-sentinel    |     at Request.<anonymous> (/sentinel/node_modules/request/request.js:1154:10)
cht-sentinel    |     at Request.emit (node:events:527:28)
cht-sentinel    |     at IncomingMessage.<anonymous> (/sentinel/node_modules/request/request.js:1076:12)
cht-sentinel    |     at Object.onceWrapper (node:events:641:28)
cht-sentinel    |     at IncomingMessage.emit (node:events:539:35) {
cht-sentinel    |   statusCode: 503,
cht-sentinel    |   error: '<html><body><h1>503 Service Unavailable</h1>\n' +
cht-sentinel    |     'No server is available to handle this request.\n' +
cht-sentinel    |     '</body></html>\n'
cht-sentinel    | }
cht-haproxy     | <150>Oct  4 21:37:00 haproxy[27]: 172.21.0.8,<NOSRV>,503,0,1,0,GET,/,-,admin,'-',222,-1,-,'-'
cht-api         | StatusCodeError: 503 - "<html><body><h1>503 Service Unavailable</h1>\nNo server is available to handle this request.\n</body></html>\n"
cht-api         |     at new StatusCodeError (/api/node_modules/request-promise-core/lib/errors.js:32:15)
cht-api         |     at Request.plumbing.callback (/api/node_modules/request-promise-core/lib/plumbing.js:104:33)
cht-api         |     at Request.RP$callback [as _callback] (/api/node_modules/request-promise-core/lib/plumbing.js:46:31)
cht-api         |     at Request.self.callback (/api/node_modules/request/request.js:185:22)
cht-api         |     at Request.emit (node:events:527:28)
cht-api         |     at Request.<anonymous> (/api/node_modules/request/request.js:1154:10)
cht-api         |     at Request.emit (node:events:527:28)
cht-api         |     at IncomingMessage.<anonymous> (/api/node_modules/request/request.js:1076:12)
cht-api         |     at Object.onceWrapper (node:events:641:28)
cht-api         |     at IncomingMessage.emit (node:events:539:35) {
cht-api         |   statusCode: 503,
cht-api         |   error: '<html><body><h1>503 Service Unavailable</h1>\n' +
cht-api         |     'No server is available to handle this request.\n' +
cht-api         |     '</body></html>\n'
cht-api         | }

Not sure if this helps you or not, I just wanted to try different things 🙂

tatilepizs avatar Oct 04 '22 21:10 tatilepizs

...you would not notice them downloading in a sync unless you inspected the network requests, and check how many docs the server sends back.

Thanks for pointing that out @dianabarsan. I think this video is better, isn't it?

Video

video

tatilepizs avatar Oct 04 '22 22:10 tatilepizs

Unfortunately no :( Pouch <-> Couch replication is optimized to not download a document if it already exists locally (this check is made via the `_revs_diff` call). In your case, you should inspect the response of the changes requests after you restart the container (the one that doesn't fail). There should be no changes there at all (or 1-2 docs that were updated in the meantime). Another option is to check that the since parameter is never rolled back: look at every /medic/_changes request and check the since parameter, which should never go back to 0.

When checking, please be aware that there will be a _changes request for the users meta database. Checking that can also be used to verify, but then please be sure you manually sync once before restarting the container - the meta database doesn't automatically sync on startup.
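A rough curl sketch of the since-parameter check (host, credentials, and the `<seq-from-above>` placeholder are mine; the real verification is done against the `/medic/_changes` requests the app itself makes, in the browser's network tab):

```shell
# Record the current sequence number before the restart; since=now with
# limit=0 makes CouchDB return just the latest seq with no change rows.
curl -s 'http://medic:password@localhost:5984/medic/_changes?since=now&limit=0'

docker restart couchdb

# After the restart, a changes feed from that seq should be empty (or contain
# only 1-2 docs updated in the meantime), and the app's own /medic/_changes
# requests should never drop back to since=0.
curl -s 'http://medic:password@localhost:5984/medic/_changes?since=<seq-from-above>&limit=10'
```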

dianabarsan avatar Oct 05 '22 05:10 dianabarsan

I don't have a lot of knowledge with docker so I just try changing the name of the COUCHDB_SERVERS in the docker-compose_cht-core.yml from couchdb to couchdb.1/couchdb.2/couchdb.3 just to see what happened.

Looking into how core-eng/sre architected this, the readme specifies you were right! You do need to set them, but separate them with `,`, not `/` ;) (Note: the readme is wrong! We want to use a single COUCHDB_SERVERS, not discrete COUCHDB1_SERVER etc. - I'll open another PR to fix this tomorrow.)

I was able to use these steps to test with clustered couch on this branch:

  1. Download cht-core and clustered couch from this branch
  2. Call compose up with: `COUCHDB_SERVERS="couchdb.1,couchdb.2,couchdb.3" COUCHDB_PASSWORD=password COUCHDB_USER=medic docker-compose -f docker-compose_cht-couchdb-clustered.yml -f docker-compose_cht-core.yml up`

mrjones-plip avatar Oct 06 '22 05:10 mrjones-plip

Thank you @dianabarsan and @mrjones-plip for your help.

I think that I have tested everything correctly this time, here are the results:

Using single node CouchDB

  • The instance was up and running with no issues.
  • The instance was persistent when the couchdb container was restarted.
  • Using the offline user, the session persisted after the couchdb container was restarted, and the since parameter never went back to 0.
Video attached

video

Using clustered CouchDB

  • Using the instructions from @mrjones-plip in the previous comment I was able to get the instance up and running with no issues.
  • The instance was persistent when the couchdb container was restarted.
  • And the same thing with the offline user: the session persisted after the couchdb.1, couchdb.2 and couchdb.3 containers were restarted, and the since parameter never went back to 0.
Video attached

video

@dianabarsan please let me know if there is anything else that I am missing and should test, and thanks again, I learned a lot from this ticket.

tatilepizs avatar Oct 06 '22 16:10 tatilepizs

Excellent testing, thank you so much @tatilepizs !

dianabarsan avatar Oct 06 '22 18:10 dianabarsan

Merged to master

dianabarsan avatar Oct 11 '22 04:10 dianabarsan