[BUG] Repair page not accessible

Open TheZoker opened this issue 2 years ago • 38 comments

The bug

When I try to access the /admin/repair endpoint, I get this error:

image

In the logs it looks like this:

2023/10/17 09:18:54 [error] 45#45: *708 upstream timed out (110: Operation timed out) while reading response header from upstream, client: 192.168.3.102, server: , request: "GET /admin/repair/__data.json?x-sveltekit-invalidated=01 HTTP/1.1", upstream: "http://172.18.0.3:3000/admin/repair/__data.json?x-sveltekit-invalidated=01", host: "<removed>", referrer: "https://<removed>/admin/jobs-status"
2023/10/17 09:18:58 [error] 46#46: *723 upstream timed out (110: Operation timed out) while reading response header from upstream, client: 192.168.3.102, server: , request: "GET /admin/repair HTTP/1.1", upstream: "http://172.18.0.3:3000/admin/repair", host: "photos.zkr.io", referrer: "https://<removed>/admin/jobs-status"

The OS that Immich Server is running on

Proxmox Alpine Based LXC with docker

Version of Immich Server

v1.82.0

Version of Immich Mobile App

v1.82.0

Platform with the issue

  • [ ] Server
  • [X] Web
  • [ ] Mobile

Your docker-compose.yml content

version: '3.8'
services:
  immich-server:
    image: ghcr.io/immich-app/immich-server:release
    container_name: immich_server
    restart: always
    command: [ "start.sh", "immich" ]
    volumes:
      - /shared/nas/immich:/usr/src/app/upload
      - /shared/nas/photos:/mnt/media/nas:ro
    env_file:
      - .env
    depends_on:
      - redis
      - database
      - typesense

  immich-microservices:
    image: ghcr.io/immich-app/immich-server:release
    container_name: immich_microservices
    restart: always
    command: [ "start.sh", "microservices" ]
    volumes:
      - /shared/nas/immich:/usr/src/app/upload
      - /shared/nas/photos:/mnt/media/nas:ro
    env_file:
      - .env
    depends_on:
      - redis
      - database
      - typesense
    environment:
#      - LOG_LEVEL=verbose
      - TZ=Europe/Berlin

  immich-machine-learning:
    image: ghcr.io/immich-app/immich-machine-learning:release
    container_name: immich_machine_learning
    restart: always
    volumes:
      - model-cache:/cache
    env_file:
      - .env

  immich-web:
    container_name: immich_web
    image: ghcr.io/immich-app/immich-web:release
    restart: always
    env_file:
      - .env

  typesense:
    image: typesense/typesense:0.24.1
    container_name: immich_typesense
    restart: always
    environment:
      - TYPESENSE_API_KEY=${TYPESENSE_API_KEY}
      - TYPESENSE_DATA_DIR=/data
      # remove this to get debug messages
      - GLOG_minloglevel=1
    volumes:
      - tsdata:/data

  redis:
    container_name: immich_redis
    image: redis:6.2-alpine
    restart: always

  database:
    image: postgres:14-alpine
    container_name: immich_postgres
    restart: always
    volumes:
      - ~/files/postgres:/var/lib/postgresql/data
    env_file:
      - .env
    environment:
      POSTGRES_PASSWORD: ${DB_PASSWORD}
      POSTGRES_USER: ${DB_USERNAME}
      POSTGRES_DB: ${DB_DATABASE_NAME}

  immich-proxy:
    image: ghcr.io/immich-app/immich-proxy:release
    container_name: immich_proxy
    restart: always
    environment:
      # Make sure these values get passed through from the env file
      - IMMICH_SERVER_URL
      - IMMICH_WEB_URL
    ports:
      - 80:8080
    depends_on:
      - immich-server
      - immich-web

volumes:
  model-cache:
  tsdata:

Your .env content

# You can find documentation for all the supported env variables at https://immich.app/docs/install/environment-variables

# The location where your uploaded files are stored
UPLOAD_LOCATION=./library

# The Immich version to use. You can pin this to a specific version like "v1.71.0"
IMMICH_VERSION=release

# Connection secrets for postgres and typesense. You should change these to random passwords
TYPESENSE_API_KEY=<removed>
DB_PASSWORD=<removed>

# The values below this line do not need to be changed
###################################################################################
DB_HOSTNAME=immich_postgres
DB_USERNAME=postgres
DB_DATABASE_NAME=immich

REDIS_HOSTNAME=immich_redis

Reproduction steps

1. Update to v1.82.0
2. Go to repair page

Additional information

No response

TheZoker avatar Oct 17 '23 09:10 TheZoker

I also have the exact same issue

traktuner avatar Oct 17 '23 09:10 traktuner

Same here. I smell some hotfix in the air 🚀

Pheggas avatar Oct 17 '23 15:10 Pheggas

me too

nodis avatar Oct 17 '23 16:10 nodis

Can confirm getting the same issue

locqust avatar Oct 17 '23 21:10 locqust

Just for the devs to consider - @alextran1502 mentioned that there will be a patch and some UI improvements. What I observed is that when I navigate to the repair tab, Immich starts to compare the filesystem and database, which makes the system unresponsive for some time. I get that error, but the system does not stop its operations even when I navigate to another tab. Maybe it could be optimized so that the compare operations stop when somebody navigates away from the repair tab.

traktuner avatar Oct 18 '23 05:10 traktuner
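For readers unfamiliar with what a filesystem-versus-database comparison involves, here is a minimal sketch in Python. It is not Immich's implementation (Immich is not written in Python, and this thread does not show its code); `find_discrepancies`, `tracked_paths` and the report keys are hypothetical names chosen for illustration. It only shows the general idea: walk the upload directory, then take set differences against the paths the database knows about.

```python
import os
import tempfile
from pathlib import Path


def find_discrepancies(root: Path, tracked_paths: set[str]) -> dict[str, list[str]]:
    """Compare files under `root` with the paths a database claims to track.

    Hypothetical helper for illustration only; not taken from Immich.
    """
    # Collect every file on disk as a path relative to the root directory.
    on_disk = {
        str(Path(dirpath, name).relative_to(root))
        for dirpath, _dirs, files in os.walk(root)
        for name in files
    }
    return {
        # Present on disk, but no database row points at them.
        "untracked": sorted(on_disk - tracked_paths),
        # Referenced by the database, but missing from disk.
        "missing": sorted(tracked_paths - on_disk),
    }


# Tiny demonstration with a throwaway directory instead of a real library.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.jpg").write_bytes(b"")
    (root / "thumb.webp").write_bytes(b"")
    report = find_discrepancies(root, tracked_paths={"a.jpg", "b.jpg"})
```

With hundreds of thousands of files, both the directory walk and the database query take time, which would be consistent with the slow page loads reported in this thread.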

Could the repair report be similar to other jobs where it needs to be run before looking at the results?

andrewdunndev avatar Oct 18 '23 14:10 andrewdunndev

@andrewgdunn Yeah we are planning that for the fix/enhancement

alextran1502 avatar Oct 18 '23 14:10 alextran1502

@alextran1502 Even after 1.82.1 it's still not fixed. It's even worse, as the whole of Immich just crashes and a stack restart is needed.

Pheggas avatar Oct 19 '23 14:10 Pheggas

@Pheggas we haven't implemented the fix for this yet. When it is, it will be mentioned in the release note

alextran1502 avatar Oct 19 '23 14:10 alextran1502

@alextran1502 Even after 1.82.1 it's still not fixed. It's even worse, as the whole of Immich just crashes and a stack restart is needed.

That particular issue was not part of 1.82.1 according to the release notes.

traktuner avatar Oct 19 '23 14:10 traktuner

Ah, sorry for that.

Pheggas avatar Oct 19 '23 14:10 Pheggas

I just updated to v1.82.1 and I have the same issue when accessing the repair page.

SitramSoft avatar Oct 26 '23 06:10 SitramSoft

I just updated to v1.82.1 and I have the same issue when accessing the repair page.

Again (as already stated two comments above yours) a fix for this issue is not included in the update 1.82.1

TheZoker avatar Oct 26 '23 08:10 TheZoker

I'm on version 1.83.0 and still have the issue as well.

Please read my comment 4 posts above yours (and some others also mention this). The fix is still not mentioned in the release notes, so it's not part of the 1.83.0 release.

traktuner avatar Oct 29 '23 05:10 traktuner

Still present in v1.85.0, but that's normal since it's not fixed yet ;)

yodatak avatar Nov 08 '23 16:11 yodatak

The issue will be closed once it is fixed

alextran1502 avatar Nov 08 '23 17:11 alextran1502

With the new container structure in v1.88.0 the repair page is loaded successfully, you just have to wait for it (for several minutes) to load :)

LinhyCZ avatar Nov 22 '23 09:11 LinhyCZ

With the new container structure in v1.88.0 the repair page is loaded successfully, you just have to wait for it (for several minutes) to load :)

I still have this issue on 1.88.2 as well. The loading bar at the top is there and there is no timeout, but even after 30 minutes there is no repair page. The stats of the docker container show activity for many minutes but then go idle.

traktuner avatar Nov 22 '23 10:11 traktuner

I was able to open the "Repair" page, but only using the local LAN IP. I just waited a couple of minutes (circa 5-10 min). I saw some errors in the server container, but the page loaded.

image

[Nest] 7  - 11/22/2023, 10:28:36 AM   ERROR [ExceptionsHandler] Connection terminated due to connection timeout
Error: Connection terminated due to connection timeout
    at Connection.<anonymous> (/usr/src/app/node_modules/pg/lib/client.js:132:73)
    at Object.onceWrapper (node:events:628:28)
    at Connection.emit (node:events:514:28)
    at Socket.<anonymous> (/usr/src/app/node_modules/pg/lib/connection.js:63:12)
    at Socket.emit (node:events:514:28)
    at TCP.<anonymous> (node:net:337:12)
[Nest] 7  - 11/22/2023, 10:28:36 AM   ERROR [ExceptionsHandler] Connection terminated due to connection timeout
Error: Connection terminated due to connection timeout
    at Connection.<anonymous> (/usr/src/app/node_modules/pg/lib/client.js:132:73)
    at Object.onceWrapper (node:events:628:28)
    at Connection.emit (node:events:514:28)
    at Socket.<anonymous> (/usr/src/app/node_modules/pg/lib/connection.js:63:12)
    at Socket.emit (node:events:514:28)
    at TCP.<anonymous> (node:net:337:12)

Using the Repair page through the Nginx Proxy Manager redirection returns the same error as before.

paszczaq avatar Nov 22 '23 10:11 paszczaq

Nginx and other proxies will still enforce a timeout, but directly hitting the server won't since it is not configured with any.

jrasm91 avatar Nov 23 '23 04:11 jrasm91

Nginx and other proxies will still enforce a timeout, but directly hitting the server won't since it is not configured with any.

I saw that too. Of course there should not be a timeout anymore but I still can't view the page even after multiple tries and waiting 30 minutes. I saw that the different containers "do" something cpu wise for many minutes but then they stop and go to cpu idle. 250k pictures on the server all on enterprise SSDs so that should not be a big problem performance wise. I hope this can be fixed with a job that pre-generates the report in the background and then displays it on the repair page.

traktuner avatar Nov 23 '23 09:11 traktuner

I think we will eventually move to a (background) report, yes. But for now at least some people can view it 😅

jrasm91 avatar Nov 23 '23 16:11 jrasm91
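The background-report idea discussed above can be sketched roughly as follows. This is an illustrative Python example only, not Immich code and not the maintainers' planned design; `ReportJob`, `slow_report` and the status dictionary are hypothetical. The point is that the slow scan runs in a worker thread, while the page only makes a cheap status call that returns immediately, so no proxy or browser request has to stay open for minutes.

```python
import threading
import time
from typing import Callable, Optional


class ReportJob:
    """Runs a slow report in a background thread; callers poll for the result."""

    def __init__(self, build_report: Callable[[], list[str]]) -> None:
        self._build_report = build_report
        self._lock = threading.Lock()
        self._thread: Optional[threading.Thread] = None
        self._result: Optional[list[str]] = None

    def start(self) -> bool:
        """Start the job unless one is already running. Returns True if started."""
        with self._lock:
            if self._thread is not None and self._thread.is_alive():
                return False
            self._result = None
            self._thread = threading.Thread(target=self._run, daemon=True)
            self._thread.start()
            return True

    def _run(self) -> None:
        result = self._build_report()
        with self._lock:
            self._result = result

    def status(self) -> dict:
        """Cheap call the page can poll; it never waits for the report itself."""
        with self._lock:
            if self._result is not None:
                return {"state": "done", "items": self._result}
            running = self._thread is not None and self._thread.is_alive()
            return {"state": "running" if running else "idle"}


def slow_report() -> list[str]:
    time.sleep(0.2)  # stand-in for the multi-minute filesystem/database scan
    return ["thumbs/1.webp"]


job = ReportJob(slow_report)
first_status = job.status()   # nothing has been started yet
job.start()
second_start = job.start()    # refused: a run is already in progress
while job.status()["state"] != "done":
    time.sleep(0.05)
final_status = job.status()
```

Refusing a second `start()` while a run is in progress also addresses the earlier observation that repeated visits to the page keep triggering new comparisons.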

Nginx and other proxies will still enforce a timeout, but directly hitting the server won't since it is not configured with any.

you can add the following to nginx config to increase the timeouts

  keepalive_timeout 1d;
  send_timeout 1d;
  client_body_timeout 1d;
  client_header_timeout 1d;
  proxy_connect_timeout 1d;
  proxy_read_timeout 1d;
  proxy_send_timeout 1d;

alex-007 avatar Nov 23 '23 20:11 alex-007

For me, repair page crashes Chrome tab after 15 minutes of waiting. I tried to use --max_old_space_size=16000 --js-flags="--max-old-space-size=16000" to start chrome, but that didn't help.

INovozhilov avatar Nov 25 '23 21:11 INovozhilov

I am running v1.89 as an Unraid Docker container. I have the same error, and I cannot get back into Immich afterwards. Do I need to restart the container?

canedje avatar Dec 10 '23 09:12 canedje

I also see a timeout error when accessing the repair page:

immich_server            | [Nest] 7  - 12/14/2023, 6:39:32 PM   ERROR [ExceptionsHandler] Connection terminated due to connection timeout
immich_server            | Error: Connection terminated due to connection timeout
immich_server            |     at Connection.<anonymous> (/usr/src/app/node_modules/pg/lib/client.js:132:73)
immich_server            |     at Object.onceWrapper (node:events:628:28)
immich_server            |     at Connection.emit (node:events:514:28)
immich_server            |     at Socket.<anonymous> (/usr/src/app/node_modules/pg/lib/connection.js:63:12)
immich_server            |     at Socket.emit (node:events:514:28)
immich_server            |     at TCP.<anonymous> (node:net:337:12)

Reloading the page does not help. I can go back to the Immich homepage and open the repair page from the admin panel again, but then I have the same issue.

Sometimes it works. Then I see that there are a lot of untracked files, mostly thumbnails and encoded videos, which might be untracked because I moved some files around in my external library. I would like to clean this up anyhow, but the "repair all" button is greyed out.

Strubbl avatar Dec 14 '23 18:12 Strubbl

Is there a way to run this report via the command line?

andrewdunndev avatar Jan 08 '24 11:01 andrewdunndev

No, but maybe I'll work on fixing it soon (tm), just for you :smile:

jrasm91 avatar Jan 08 '24 16:01 jrasm91

I have around 333k photos and for me the repair takes 5 minutes to run server-side and then the browser crashes. At some point once the server-side is done loading, memory use on the browser explodes from almost nothing to several GB within seconds and then the page gets killed. I've tried Safari, Edge and Firefox. Because of this, I have no way of executing the "repair all" function.

hachre avatar Jan 11 '24 20:01 hachre
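If the browser crash comes from receiving and rendering the entire report at once (an assumption; this thread does not confirm the cause), one common mitigation is to serve the report in pages. The Python sketch below is illustrative only: `paginate`, the page size of 1000, and the sample entries are hypothetical and not taken from Immich.

```python
from typing import Iterator, Sequence


def paginate(items: Sequence[str], page_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items`, each at most `page_size` long."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    for start in range(0, len(items), page_size):
        yield items[start:start + page_size]


# Stand-in for a large list of untracked files; the real report may be far larger.
entries = [f"thumbs/{i}.webp" for i in range(2500)]
pages = list(paginate(entries, page_size=1000))
page_lengths = [len(page) for page in pages]
```

The client would then request one page at a time, so memory use stays bounded by the page size rather than by the size of the whole report.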

Same issue here. Can't access the page and it crashes the server.

jocxfin avatar Mar 02 '24 00:03 jocxfin