webknossos icon indicating copy to clipboard operation
webknossos copied to clipboard

Local docker setup fails to properly load

Open aaronkanzer opened this issue 9 months ago • 4 comments

Context

When cloning the WebKNOSSOS repository locally, and then running ./start_docker.sh, errors persist in which the system is not usable from a local setting.

This occurred while following the local dev instructions here: https://github.com/scalableminds/webknossos/blob/master/DEV_INSTALL.md#docker

Expected Behavior

The UI would render and allow me to create a dummy user, and proceed to the /dashboard page to begin to view a remote asset.

Current Behavior

I get the initial startup error in my docker logs:

 2024-04-25 15:33:24,222 [ERROR] com.scalableminds.webknossos.datastore.rpc.RPCRequest - Error sending WS request to http://localhost:9000/api/datastores/localhost/datasources (ID: 0): Connection refused: localhost/127.0.0.1:9000
webknossos-1  | play.shaded.ahc.org.asynchttpclient.netty.channel.NettyConnectListener.onFailure(NettyConnectListener.java:179)
webknossos-1  |     play.shaded.ahc.org.asynchttpclient.netty.channel.NettyChannelConnector$1.onFailure(NettyChannelConnector.java:108)
webknossos-1  |     play.shaded.ahc.org.asynchttpclient.netty.SimpleChannelFutureListener.operationComplete(SimpleChannelFutureListener.java:28)
webknossos-1  |     play.shaded.ahc.org.asynchttpclient.netty.SimpleChannelFutureListener.operationComplete(SimpleChannelFutureListener.java:20)
webknossos-1  |     play.shaded.ahc.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:578)
webknossos-1  |     play.shaded.ahc.io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:571)
webknossos-1  |     play.shaded.ahc.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:550)
webknossos-1  |     play.shaded.ahc.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:491)
webknossos-1  |     play.shaded.ahc.io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:616)
webknossos-1  |     play.shaded.ahc.io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:609)
webknossos-1  |     play.shaded.ahc.io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:117)
webknossos-1  |     play.shaded.ahc.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:321)
webknossos-1  |     play.shaded.ahc.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:337)
webknossos-1  |     play.shaded.ahc.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:707)
webknossos-1  |     play.shaded.ahc.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:655)
webknossos-1  |     play.shaded.ahc.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:581)
webknossos-1  |     play.shaded.ahc.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
webknossos-1  |     play.shaded.ahc.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
webknossos-1  |     play.shaded.ahc.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
webknossos-1  |     play.shaded.ahc.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
webknossos-1  |     java.base/java.lang.Thread.run(Thread.java:1583)

In my case, it seems that the UI is able to load; however, I am unable to 1. register a user, and then 2. proceed as that user to log in to a local dev session

Screenshot 2024-04-26 at 2 51 15 PM

For other users, like @kabilar, the UI would not even load (@kabilar saw the similar error in the logs)

Steps to Reproduce the bug

  • [ ] Cannot reproduce the bug anymore / needs deeper investigation.
  1. Clone WebKnossos and make sure the branch is up to date
  2. For clarity, clean up any docker containers, volumes that could interfere
  3. Run ./start_docker.sh locally
  4. Visit localhost:9000
  5. Try to sign up a user
  6. See that the user cannot continue as logged in

Your Environment for bug

  • Browser name and version: e.g. Chrome 39

Version 124.0.6367.61 (Official Build) (arm64)

  • Operating System and version: e.g. Windows 10

Ventura 13.0 (22A8380) on a Apple M2 Pro machine

  • Version of WEBKNOSSOS (Release or Commit):

Latest commit: 6e1c0e95a3af6ab46cf9f384a758d63d431b7ba4

  • Docker Version: Engine: 24.0.7, Compose: v2.23.3-desktop.2

  • [ ] Specific to long-running jobs (set jobsEnabled=true in application.conf)

  • [ ] Specific to webknossos.org (set isDemoInstance=true in application.conf)

aaronkanzer avatar Apr 26 '24 18:04 aaronkanzer

Hi and thanks for your detailed report! I just tried executing the steps in the readme on my machine again, and it worked for me. localhost:9000 redirects me to http://localhost:9000/onboarding, which is the expected result for the local docker setup.

Here are some things to try:

  • Ensure the necessary ports (9000, 5434) are free: lsof -i:5434,9000 should print nothing (I hope the Mac OS lsof works just as mine)
  • Ensure you pulled the latest webknossos docker image with docker compose pull webknossos
  • Post the full logging output here, unfortunately the stacktrace excerpt you posted above does not hint at the root cause as to why you cannot log in.

Hope this helps!

fm3 avatar Apr 29 '24 08:04 fm3

@fm3 Thanks for the response here -- I'm realizing a potential root of my error -- for some reason I can't fully reset my dev environment, specifically in regards to data that seems to perhaps be blocking my ability to onboard a new org/log in with a pre-existing user.

Within the startup logs:

webknossos-1  | 2024-04-29 15:51:54,164 [INFO] com.scalableminds.webknossos.datastore.services.DataSourceService - Scanning inbox (binaryData)...
webknossos-1  | 2024-04-29 15:51:54,511 [INFO] com.scalableminds.webknossos.datastore.services.DataSourceService - Finished scanning inbox (binaryData): 3 active, 0 inactive. Aaron: [active: [test-dataset test-123 mar-7-test]]

Specifically in those logs: Aaron: [active: [test-dataset test-123 mar-7-test]]

However, I removed/pruned all volumes associated with Docker locally / cleaned out all sources of data (e.g. dropped my Postgres local DB, looked at FossilDB, Redis, etc), also ran --no-cache on build processes, etc. to make sure that it was a fresh build

Do you know where else WebKnossos is storing data locally perhaps? Is the data from my development environment being exported somewhere remotely perhaps?

Please let me know -- thanks again -- Cc @kabilar

aaronkanzer avatar Apr 29 '24 15:04 aaronkanzer

@fm3 Just another follow up here -- tried once again to remove data via docker system prune -af --volumes -- logs still showed populated data:

webknossos-1  | 2024-04-29 15:58:10,922 [INFO] com.scalableminds.webknossos.datastore.services.DataSourceService - Scanning inbox (binaryData)...
webknossos-1  | 2024-04-29 15:58:11,311 [INFO] com.scalableminds.webknossos.datastore.services.DataSourceService - Finished scanning inbox (binaryData): 3 active, 0 inactive. Aaron: [active: [test-dataset test-123 mar-7-test]]
webknossos-1  | 2024-04-29 15:58:11,505 [INFO] com.scalableminds.webknossos.tracingstore.tracings.FossilDBClient - Successfully tested FossilDB health at fossildb:7155. Reply: SERVING
webknossos-1  | 2024-04-29 15:58:11,838 [ERROR] com.scalableminds.webknossos.datastore.rpc.RPCRequest - Error sending WS request to http://localhost:9000/api/datastores/localhost/datasources (ID: 0): Connection refused: localhost/127.0.0.1:9000
webknossos-1  | play.shaded.ahc.org.asynchttpclient.netty.channel.NettyConnectListener.onFailure(NettyConnectListener.java:179)
webknossos-1  |     play.shaded.ahc.org.asynchttpclient.netty.channel.NettyChannelConnector$1.onFailure(NettyChannelConnector.java:108)
webknossos-1  |     play.shaded.ahc.org.asynchttpclient.netty.SimpleChannelFutureListener.operationComplete(SimpleChannelFutureListener.java:28)
webknossos-1  |     play.shaded.ahc.org.asynchttpclient.netty.SimpleChannelFutureListener.operationComplete(SimpleChannelFutureListener.java:20)
webknossos-1  |     play.shaded.ahc.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:578)
webknossos-1  |     play.shaded.ahc.io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:571)
webknossos-1  |     play.shaded.ahc.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:550)
webknossos-1  |     play.shaded.ahc.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:491)
webknossos-1  |     play.shaded.ahc.io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:616)
webknossos-1  |     play.shaded.ahc.io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:609)
webknossos-1  |     play.shaded.ahc.io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:117)
webknossos-1  |     play.shaded.ahc.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:321)
webknossos-1  |     play.shaded.ahc.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:337)
webknossos-1  |     play.shaded.ahc.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:707)
webknossos-1  |     play.shaded.ahc.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:655)
webknossos-1  |     play.shaded.ahc.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:581)
webknossos-1  |     play.shaded.ahc.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
webknossos-1  |     play.shaded.ahc.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
webknossos-1  |     play.shaded.ahc.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
webknossos-1  |     play.shaded.ahc.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
webknossos-1  |     java.base/java.lang.Thread.run(Thread.java:1583)
webknossos-1  | 2024-04-29 15:58:11,877 [INFO] Startup - Database schema is up to date.
webknossos-1  | 2024-04-29 15:58:12,018 [INFO] com.zaxxer.hikari.HikariDataSource - slick.db - Starting...
webknossos-1  | 2024-04-29 15:58:12,065 [INFO] com.zaxxer.hikari.HikariDataSource - slick.db - Start completed.
webknossos-1  | 2024-04-29 15:58:13,685 [INFO] models.annotation.AnnotationMutexService - Cleaned up 0 expired annotation mutexes.
webknossos-1  | 2024-04-29 15:58:13,789 [INFO] Startup - No initial data inserted: initialData.notEnabled
webknossos-1  | 2024-04-29 15:58:13,790 [INFO] Startup - Webknossos startup took 7832 ms.
webknossos-1  | 2024-04-29 15:58:15,982 [DEBUG] com.scalableminds.webknossos.datastore.rpc.RPCRequest - Sending WS request to http://localhost:9000/api/datastores/localhost/status (ID: 1). RequestBody: '{"ok":true,"url":"http://localhost:9000","reportUsedStorageEnabled":false}'
webknossos-1  | 2024-04-29 15:58:16,640 [DEBUG] controllers.WKRemoteDataStoreController - Status update from data store 'localhost'. Status: true
webknossos-1  | 2024-04-29 15:59:03,819 [INFO] models.storage.UsedStorageService - Scanning used storage for 0 organizations (List()) in 0 datastores (List())...
webknossos-1  | 2024-04-29 15:59:10,989 [INFO] com.scalableminds.webknossos.datastore.services.DataSourceService - Finished scanning inbox (binaryData): 3 active, 0 inactive
webknossos-1  | 2024-04-29 15:59:11,128 [INFO] controllers.WKRemoteDataStoreController - Received dataset list from datastore 'localhost': 3 active, 0 inactive datasets

Screenshot 2024-04-29 at 12 03 26 PM

Nevertheless, I went to localhost:9000/onboarding directly in the browser -- attempted to make a new org to see if I could make a new user under that org and then proceed -- I get an HTTP 400 Bad Request, with no outputted logs via docker logs -f <docker-container-id>

aaronkanzer avatar Apr 29 '24 16:04 aaronkanzer

The logs showing 3 active dataset come from the datastore module scanning the local filesystem, specifically the (terribly named) binaryData directory, which is mounted in the docker setup, compare https://github.com/scalableminds/webknossos/blob/master/docker-compose.yml#L36

So I would expect that there is still some content in there?

However, as to why you’d get the »You are not allowed to create a new organization« error, I’m a bit stumped right now. The existing subdirectory in the binaryData directory should not be able to interfere with that. I’ll have a closer look at the code in the coming days (I’ll be out of office for a bit, though).

The rest of the logs look healthy (even the RPCRequest Error is presumably harmless, and just an effect of concurrent startup of the modules).

Just to make sure, is there any git diff / changed files?

fm3 avatar Apr 29 '24 16:04 fm3

@fm3 an update! I also cleared out the pg/db volume as well, and was able to get it back in a working state!

My assumption is that between my manual alteration of the data + volumes being mounted that perhaps the data went out of sync somehow, causing the error

I think I am OK to close this Issue, but will let you do the honors in case there is any area of improvement/addition to the dev setup that could be useful? Thanks for the help nonetheless!

aaronkanzer avatar Apr 29 '24 23:04 aaronkanzer

Good to hear! Ok, so I guess a previous attempt might have resulted in a state where an organization was present in the postgres database, but somehow half-complete. So it makes sense that it worked when starting from a clean state.

fm3 avatar May 02 '24 14:05 fm3