huly-selfhost

account service unable to connect to database

Open mcquenji opened this issue 3 months ago • 23 comments

Hi, when trying out Huly for me and my team, I followed the steps in the tutorial and did not really change anything (except for adding the print & love services and setting up OIDC). However, I am unable to log in or sign up, as I get a bad gateway error from the /_account service. After inspecting some logs, I found that the account container is not able to connect to cockroach.

Error while initializing postgres account db PostgresError: password authentication failed for user selfhost
    at ErrorResponse (/usr/src/app/bundle.js:67331:31)
    at handle (/usr/src/app/bundle.js:67088:11)
    at Socket.data (/usr/src/app/bundle.js:66897:13)
    at Socket.emit (node:events:518:28)
    at addChunk (node:internal/streams/readable:561:12)
    at readableAddChunkPushByteMode (node:internal/streams/readable:512:3)
    at Readable.push (node:internal/streams/readable:392:5)
    at TCP.onStreamRead (node:internal/stream_base_commons:189:23) {
  severity_local: 'ERROR',
  severity: 'ERROR',
  code: '28P01',
  file: 'auth.go',
  line: '395',
  routine: 'NewErrPasswordUserAuthFailed'
}

mcquenji avatar Sep 25 '25 20:09 mcquenji

Hello, friend. Try running the db container separately from the other containers. Check the db container logs and make sure the selfhost user was created automatically. Then run all the other containers. If that doesn't work, try checking the db credentials: hostname, password, database name.

ivan19911502 avatar Sep 26 '25 04:09 ivan19911502

Hello @mcquenji, you may observe this error while cockroach is still initializing after start, or if it had issues during initialization, which is when this user should be created. Please check the cockroachdb container logs; if there's no error, try waiting longer after restarting everything. You can also try increasing the amount of resources available to Docker.

lexiv0re avatar Sep 26 '25 12:09 lexiv0re

Thanks for your response!

Okay, after inspecting the cockroachdb logs I got the following error:

server startup failed: failed to start server: problem using security settings: validating node cert: key file certs/node.key has permissions -rwxrwxrwx, exceeds -rwxr-----
Failed running "start-single-node"

I'm using default Docker named volumes. Is there a way to change these permissions, or maybe just tell cockroachdb to allow insecure certs?

mcquenji avatar Sep 27 '25 15:09 mcquenji

Were your certificates created automatically by cockroachdb, or did you set them up manually? When created automatically, they are created with the correct permissions. If your permissions do not match, try adjusting them manually.
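To illustrate the fix (a sketch; the certs path is illustrative, inside the container the files live under your volume's mount point): the error says the key file must not exceed -rwxr----- (0740), and 0600 is the usual choice for private keys.

```shell
# Reproduce and fix the permission problem CockroachDB complains about.
# The certs/ path here is illustrative, not the real volume location.
mkdir -p certs
touch certs/node.key
chmod 777 certs/node.key      # the broken state from the error (-rwxrwxrwx)
chmod 600 certs/node.key      # at or below the -rwxr----- limit
stat -c '%a' certs/node.key   # prints 600
```

In a real deployment you would run the chmod against the key files inside the named volume, e.g. via docker exec into the cockroach container.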


lexiv0re avatar Sep 27 '25 17:09 lexiv0re

Thanks for your help!

Turns out I had a crontab running that changed permissions on some Docker volumes (required for another stack running on this VM). After narrowing that crontab down to only touch the volumes it requires, the account service works!

However, now I am unable to log in via SSO. I just get redirected to the login page, with the account service giving the following logs:

INF try auth via | provider=openid timestamp=2025-09-28T04:56:20.534Z 

No errors, nothing at all...

mcquenji avatar Sep 28 '25 04:09 mcquenji

I'd recommend looking into the browser's log of network requests during the auth. Here's how to do it in Chrome:

  1. Open the dev tools
  2. On the Network tab tick Preserve log
  3. Do the auth flow
  4. Inspect the requests carefully

lexiv0re avatar Sep 28 '25 17:09 lexiv0re

I've inspected the network logs and even curled my IdP manually, and everything seems fine. I tried swapping the IdP from Zitadel to a self-hosted GitLab instance, with no results. I've also tried setting the DISABLE_SIGNUP env variable explicitly to false.

What kind of response does Huly expect from an IdP? Does it require a JWT with the user info inside, or does it expect a bearer token which it then uses to hit the userinfo endpoint?

mcquenji avatar Sep 28 '25 19:09 mcquenji

Is there a way to get more verbose logs?

mcquenji avatar Sep 28 '25 19:09 mcquenji

It expects the user information when redirected back to the app. From the logs you are seeing, it looks like you are hitting failureRedirect here: https://github.com/hcengineering/platform/blob/develop/pods/authProviders/src/openid.ts#L84. Otherwise, I'd expect to see logs from https://github.com/hcengineering/platform/blob/develop/pods/authProviders/src/openid.ts#L93, which start with Provider auth handler, right away.

lexiv0re avatar Sep 29 '25 05:09 lexiv0re

After reading the README again, I noticed that the redirect URL is supposed to hit the accounts service:

Use {huly_account_svc}/auth/openid/callback as the sign-in redirect URI. The huly_account_svc is the hostname for the account service of the deployment, which should be accessible externally from the client/browser side. In the provided example setup, the account service runs on port 3000.

After that I modified the compose.yml file and added _accounts to the ACCOUNTS_URL environment variable for the accounts service:

ACCOUNTS_URL=http${SECURE:+s}://${HOST_ADDRESS}/_accounts
                                               ^^^^^^^^^^
                                               This was missing
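As an aside, the ${SECURE:+s} interpolation in that line expands to "s" only when SECURE is set, switching http to https (the host name below is hypothetical):

```shell
# ${VAR:+word} expands to "word" if VAR is set and non-empty, else to "".
HOST_ADDRESS=huly.example.com   # hypothetical host
unset SECURE
echo "http${SECURE:+s}://${HOST_ADDRESS}/_accounts"   # http://huly.example.com/_accounts
SECURE=true
echo "http${SECURE:+s}://${HOST_ADDRESS}/_accounts"   # https://huly.example.com/_accounts
```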

This would also explain why I did not get any logs whatsoever after being redirected.

Now login works. Thanks for your quick responses and for this amazing project!

mcquenji avatar Sep 29 '25 21:09 mcquenji

After logging in and entering a workspace name, I'm stuck on a loading screen. The logs of the workspace service indicate success, though:

2025-09-29T22:36:46.030059899Z send force close job="b8f56113" workspace={"uuid":"b7285aef-39d4-4277-add4-84157e3ccf97","url":"<workspace name>","dataId":null} transactorUrl="ws://transactor:3333"
2025-09-29T22:36:46.080572409Z send transactor event force-close to https://huly.<domain>/_transactor
2025-09-29T22:36:46.198519353Z upgrade workspace job="b8f56113" event="upgrade-done" value=100
2025-09-29T22:36:47.070495379Z ---CREATE-DONE--------- job="b8f56113" workspace="b7285aef-39d4-4277-add4-84157e3ccf97" version={"major":0,"minor":7,"patch":235} region="" time=436981

I've been stuck on this screen for close to 30 minutes now.

mcquenji avatar Sep 29 '25 23:09 mcquenji

Could this error in the transactor service be related?

2025-09-29T23:10:54.379585077Z ERR unexpected error in websocket | err={"message":"Connection timeout","name":"KafkaJSNumberOfRetriesExceeded","retriable":false,"retryCount":5,"retryTime":6042,"stack":"KafkaJSNonRetriableError\n  Caused by: KafkaJSConnectionError: Connection timeout\n    at Timeout.onTimeout [as _onTimeout] (/usr/src/app/bundle.js:128550:27)\n    at listOnTimeout (node:internal/timers:594:17)\n    at process.processTimers (node:internal/timers:529:7)"} timestamp=2025-09-29T23:10:54.379Z 
2025-09-29T23:10:54.379688027Z {
2025-09-29T23:10:54.379694716Z   message: 'Failed to process session operation',
2025-09-29T23:10:54.379698306Z   err: KafkaJSNonRetriableError
2025-09-29T23:10:54.379701326Z     Caused by: KafkaJSConnectionError: Connection timeout
2025-09-29T23:10:54.379704346Z       at Timeout.onTimeout [as _onTimeout] (/usr/src/app/bundle.js:128550:27)
2025-09-29T23:10:54.379707056Z       at listOnTimeout (node:internal/timers:594:17)
2025-09-29T23:10:54.379709726Z       at process.processTimers (node:internal/timers:529:7) {
2025-09-29T23:10:54.379712376Z     name: 'KafkaJSNumberOfRetriesExceeded',
2025-09-29T23:10:54.379714956Z     retriable: false,
2025-09-29T23:10:54.379717535Z     helpUrl: undefined,
2025-09-29T23:10:54.379720095Z     retryCount: 5,
2025-09-29T23:10:54.379722635Z     retryTime: 6042,
2025-09-29T23:10:54.379725206Z     [cause]: KafkaJSConnectionError: Connection timeout
2025-09-29T23:10:54.379727846Z         at Timeout.onTimeout [as _onTimeout] (/usr/src/app/bundle.js:128550:27)
2025-09-29T23:10:54.379730526Z         at listOnTimeout (node:internal/timers:594:17)
2025-09-29T23:10:54.379733435Z         at process.processTimers (node:internal/timers:529:7) {
2025-09-29T23:10:54.379736115Z       retriable: true,
2025-09-29T23:10:54.379738685Z       helpUrl: undefined,
2025-09-29T23:10:54.379741285Z       broker: 'redpanda:9092',
2025-09-29T23:10:54.379754316Z       code: undefined,
2025-09-29T23:10:54.379760235Z       [cause]: undefined
2025-09-29T23:10:54.379763615Z     }
2025-09-29T23:10:54.379766366Z   }
2025-09-29T23:10:54.379768975Z }

If so, can you point me in the right direction for fixing it?

mcquenji avatar Sep 29 '25 23:09 mcquenji

After restarting the stack multiple times and removing the redpanda healthcheck, everything works now. Not sure why, though...

mcquenji avatar Sep 30 '25 03:09 mcquenji

@mcquenji yes, when Redpanda is not operational the transactor won't respond, so making sure Redpanda is healthy and then restarting the transactor usually helps with these kinds of issues. Please note that in the provided configuration Redpanda and CockroachDB are not really production-ready; please refer to their production deployment recommendations. You can find the links in the last paragraph of this section: https://github.com/hcengineering/huly-selfhost?tab=readme-ov-file#clone-the-huly-selfhost-repository-and-configure-nginx
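If you want Compose itself to enforce that startup ordering, something like this could work (a sketch; service names are assumed from the default huly-selfhost compose file, and it requires the redpanda service to define a healthcheck):

```yaml
  transactor:
    depends_on:
      redpanda:
        # Start the transactor only once the broker reports healthy
        condition: service_healthy
```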

lexiv0re avatar Sep 30 '25 05:09 lexiv0re

A few last things I need help with:

  1. The print service returns 500 Internal Server Error with the following logs:
Printing http://localhost:8087/guest/<workspace>?token=<token> to pdf with viewport {"width":1440,"height":900}
net::ERR_CONNECTION_REFUSED http://localhost:8087/guest/<workspace>?token=<token>
Error: net::ERR_CONNECTION_REFUSED at http://localhost:8087/guest/<workspace>?token=<token>
    at navigate (/usr/src/app/bundle.js:530067:39)
    at async Deferred.race (/usr/src/app/bundle.js:134491:18)
    at async CdpFrame.goto (/usr/src/app/bundle.js:530037:19)
    at async _CdpPage.goto (/usr/src/app/bundle.js:140461:18)
    at async print (/usr/src/app/bundle.js:540023:3)
    at async /usr/src/app/bundle.js:540201:24
    at async handleRequest (/usr/src/app/bundle.js:540147:5)

I'm not sure why it tries to connect to localhost instead of the FQDN. I also noticed that some generated invite links contain localhost instead of the FQDN.

  2. I am unable to start a meeting. It just keeps loading, and there are no logs in the love container. Could this be linked to MONGO_URL, as there is no such container?

Compose file

  love:
    image: hardcoreeng/love:${HULY_VERSION}
    container_name: love
    ports:
      - 8096:8096
    environment:
      - STORAGE_CONFIG=minio|minio?accessKey=minioadmin&secretKey=minioadmin
      - SECRET=${SECRET}
      - DB_URL=${CR_DB_URL}
      - ACCOUNTS_URL=http://account:3000
      - MONGO_URL=mongodb://mongodb:27017
      - STORAGE_PROVIDER_NAME=minio
      - PORT=8096
      - LIVEKIT_HOST=wss://<project>.livekit.cloud
      - LIVEKIT_API_KEY=<apikey>
      - LIVEKIT_API_SECRET=<apisecret>
    restart: unless-stopped
    networks:
      - huly_net

  print:
    image: hardcoreeng/print:${HULY_VERSION}
    container_name: print
    ports:
      - 4005:4005
    environment:
      - STORAGE_CONFIG=minio|minio?accessKey=minioadmin&secretKey=minioadmin
      - STATS_URL=http://stats:4900
      - SECRET=${SECRET}
      - ACCOUNTS_URL=http://account:3000
    restart: unless-stopped
    networks:
      - huly_net

mcquenji avatar Sep 30 '25 11:09 mcquenji

  1. It's coming from - FRONT_URL=http://localhost:8087 in the transactor service; it needs to be updated there.

lexiv0re avatar Sep 30 '25 18:09 lexiv0re

  2. No, Mongo should no longer be needed for love.

lexiv0re avatar Sep 30 '25 19:09 lexiv0re

  1. It's coming from - FRONT_URL=http://localhost:8087 in the transactor service; it needs to be updated there.

Should I set the URL to the internal URL or to the publicly accessible URL?

mcquenji avatar Sep 30 '25 19:09 mcquenji

It should be the publicly accessible URL.
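A sketch of that change in compose.yml (variable names assumed from the huly-selfhost .env template, matching the ACCOUNTS_URL fix earlier in this thread):

```yaml
  transactor:
    environment:
      # Must be the URL the browser (and headless Chrome in the print
      # service) can reach, not localhost
      - FRONT_URL=http${SECURE:+s}://${HOST_ADDRESS}
```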

lexiv0re avatar Sep 30 '25 19:09 lexiv0re

Now I'm getting a PDF containing this error message:

ChunkLoadError: Loading chunk 60921 failed.
(error: https://huly.<domain>/60921.3af569f92afe4ffdef6e.js)
ChunkLoadError
    at En.f.j (https://huly.<domain>/bundle.6d20f7f6392e3fb055d1.js:10778:99662)
    at https://huly.<domain>/bundle.6d20f7f6392e3fb055d1.js:10778:73067
    at Array.reduce (<anonymous>)
    at En.e (https://huly.<domain>/bundle.6d20f7f6392e3fb055d1.js:10778:73032)
    at https://huly.<domain>/bundle.6d20f7f6392e3fb055d1.js:10724:37336
    at https://huly.<domain>/bundle.6d20f7f6392e3fb055d1.js:10724:23698
    at loadPlugin (https://huly.<domain>/bundle.6d20f7f6392e3fb055d1.js:10775:2722)
    at getResource (https://huly.<domain>/bundle.6d20f7f6392e3fb055d1.js:10775:3032)
    at https://huly.<domain>/bundle.6d20f7f6392e3fb055d1.js:10729:78983
    at Array.map (<anonymous>)

mcquenji avatar Sep 30 '25 20:09 mcquenji

  2. No, Mongo should no longer be needed for love.

Any other config I might be missing?

mcquenji avatar Sep 30 '25 21:09 mcquenji

ChunkLoadError: Loading chunk XXX failed. is usually seen when there's a caching issue, but I don't see how that might be happening here. Is there anything in the print pod logs? Could you check the network tab in the dev tools when you try to print: what requests are there, and what are their statuses?

lexiv0re avatar Oct 01 '25 20:10 lexiv0re

@BykhovDenis any ideas on what might be wrong with the love service, or what else can be checked?

lexiv0re avatar Oct 01 '25 20:10 lexiv0re