Spurious face recognition (random photos of random things and people recognized as one face)

Open eacunha opened this issue 1 year ago • 15 comments

The bug

Immich's face recognition grouped about 200 completely unrelated photos (different people, objects, landscapes, food, game screenshots) as one person. The attached images below make the issue obvious.

image

And here are some of the photos that were associated with this "person":

image

As can be easily seen, the first is a dog on a black background, the second is a screenshot from Genshin Impact, the third is a group of people on grass, the fourth is a completely unrelated group of people in the snow, the fifth is some birds in the rain, and the last is some food in a pan. These photos are 100% unrelated and should not have been bundled together as one person.

Is there any way I can "delete" this person?

The OS that Immich Server is running on

Raspberry Pi OS 64 bit latest version

Version of Immich Server

v1.113.1

Version of Immich Mobile App

N/A

Platform with the issue

  • [X] Server
  • [X] Web
  • [X] Mobile

Your docker-compose.yml content

#
# WARNING: Make sure to use the docker-compose.yml of the current release:
#
# https://github.com/immich-app/immich/releases/latest/download/docker-compose.yml
#
# The compose file on main may not be compatible with the latest release.
#

name: immich

services:
  immich-server:
    container_name: immich_server
    image: ghcr.io/immich-app/immich-server:${IMMICH_VERSION:-release}
    # extends:
    #   file: hwaccel.transcoding.yml
    #   service: cpu # set to one of [nvenc, quicksync, rkmpp, vaapi, vaapi-wsl] for accelerated transcoding
    volumes:
      # Do not edit the next line. If you want to change the media storage location on your system, edit the value of UPLOAD_LOCATION in the .env file
      - ${UPLOAD_LOCATION}:/usr/src/app/upload
      - /home/edu_adm/EGNAS2_shared_folder:/home/edu_adm/EGNAS2_shared_folder
      - /etc/localtime:/etc/localtime:ro
    env_file:
      - .env
    ports:
      - 2283:3001
    depends_on:
      - redis
      - database
    restart: always
    healthcheck:
      disable: false

  immich-machine-learning:
    container_name: immich_machine_learning
    # For hardware acceleration, add one of -[armnn, cuda, openvino] to the image tag.
    # Example tag: ${IMMICH_VERSION:-release}-cuda
    image: ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION:-release}
    # extends: # uncomment this section for hardware acceleration - see https://immich.app/docs/features/ml-hardware-acceleration
    #   file: hwaccel.ml.yml
    #   service: cpu # set to one of [armnn, cuda, openvino, openvino-wsl] for accelerated inference - use the `-wsl` version for WSL2 where applicable
    volumes:
      - model-cache:/cache
    env_file:
      - .env
    restart: always
    healthcheck:
      disable: false

  redis:
    container_name: immich_redis
    image: docker.io/redis:6.2-alpine@sha256:e3b17ba9479deec4b7d1eeec1548a253acc5374d68d3b27937fcfe4df8d18c7e
    healthcheck:
      test: redis-cli ping || exit 1
    restart: always

  database:
    container_name: immich_postgres
    image: docker.io/tensorchord/pgvecto-rs:pg14-v0.2.0@sha256:90724186f0a3517cf6914295b5ab410db9ce23190a2d9d0b9dd6463e3fa298f0
    environment:
      POSTGRES_PASSWORD: ${DB_PASSWORD}
      POSTGRES_USER: ${DB_USERNAME}
      POSTGRES_DB: ${DB_DATABASE_NAME}
      POSTGRES_INITDB_ARGS: '--data-checksums'
    volumes:
      # Do not edit the next line. If you want to change the database storage location on your system, edit the value of DB_DATA_LOCATION in the .env file
      - ${DB_DATA_LOCATION}:/var/lib/postgresql/data
    healthcheck:
      test: pg_isready --dbname='${DB_DATABASE_NAME}' --username='${DB_USERNAME}' || exit 1; Chksum="$$(psql --dbname='${DB_DATABASE_NAME}' --username='${DB_USERNAME}' --tuples-only --no-align --command='SELECT COALESCE(SUM(checksum_failures), 0) FROM pg_stat_database')"; echo "checksum failure count is $$Chksum";
      interval: 5m
      start_interval: 30s
      start_period: 5m
    command: ["postgres", "-c", "shared_preload_libraries=vectors.so", "-c", 'search_path="$$user", public, vectors', "-c", "logging_collector=on", "-c", "max_wal_size=2GB", "-c", "shared_buffers=512MB", "-c", "wal_compression=on"]
    restart: always

volumes:
  model-cache:

Your .env content

# You can find documentation for all the supported env variables at https://immich.app/docs/install/environment-variables

# The location where your uploaded files are stored
UPLOAD_LOCATION=./library
# The location where your database files are stored
DB_DATA_LOCATION=./postgres

# To set a timezone, uncomment the next line and change Etc/UTC to a TZ identifier from this list: https://en.wikipedia.org/wiki/List_of_tz_database_time_zones#List
# TZ=Etc/UTC

# The Immich version to use. You can pin this to a specific version like "v1.71.0"
IMMICH_VERSION=release

# Connection secret for postgres. You should change it to a random password
# Please use only the characters `A-Za-z0-9`, without special characters or spaces
DB_PASSWORD=<my_pass_redacted_to_post_here_on_github!>

# The values below this line do not need to be changed
###################################################################################
DB_USERNAME=postgres
DB_DATABASE_NAME=immich

Reproduction steps

I think it might be difficult to reproduce, but once it happens it can be observed by:

  1. Go to "explore"
  2. Click on "view all" of "People"
  3. Scroll to find the affected "Person" (which is a spurious collection of Photos) ...

Relevant log output

No response

Additional information

If you need help to debug this I am available to jump on discord to share the screen/do what is needed to help : )

eacunha avatar Sep 07 '24 19:09 eacunha

Have you changed any of your machine learning settings at all?

bo0tzz avatar Sep 07 '24 19:09 bo0tzz

no, at least not that I know of... how can this be changed?

eacunha avatar Sep 07 '24 19:09 eacunha

These are the settings there, I don't recall changing anything: image

eacunha avatar Sep 07 '24 19:09 eacunha

Those seem like just the defaults, so I really have no idea why it would have done this. @mertalev any ideas?

bo0tzz avatar Sep 07 '24 19:09 bo0tzz

That's super weird. What do the bounding boxes for these faces (or "faces") look like? You can open an image's info panel and hover over the person to see it.

mertalev avatar Sep 07 '24 20:09 mertalev

Did you change the thumbnail settings? Can you post it?

image

alextran1502 avatar Sep 07 '24 21:09 alextran1502

When I mouse over this "person", it detects the face of the genshin character: image

eacunha avatar Sep 07 '24 21:09 eacunha

.. or the "face" of the duck: image

eacunha avatar Sep 07 '24 21:09 eacunha

or a potato: image

eacunha avatar Sep 07 '24 21:09 eacunha

I don't think I have changed the thumbnail settings either. Here they are: image

eacunha avatar Sep 07 '24 21:09 eacunha

Could you run a few SQL queries for me and share the output for each?

select * from pg_vector_index_stat;
select count(*) from asset_faces;
select count(*) from face_search;
with
  embeddings as (
    select "originalFileName", embedding
    from
      assets
        inner join asset_faces
          on assets.id = asset_faces."assetId"
        inner join face_search
          on asset_faces.id = face_search."faceId"
    where
        assets."originalFileName" in ('20220408091020.png', '20211030_115938.jpg', '20230306_123821.jpg')
  )
select this."originalFileName" image1, other."originalFileName" image2, this.embedding <=> other.embedding distance
from embeddings this, embeddings other;
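
For context on the last query: with a cosine index (the kind pgvecto.rs reports as "Cos"), the `<=>` operator returns the cosine distance, i.e. 1 minus the cosine similarity of the two embeddings. A distance of 0 means the vectors point the same way, and larger values mean less similar faces. Because the query joins `embeddings` with itself, every pair shows up twice and each image is also compared with itself. The following is only a minimal Python sketch of that computation, using made-up 3-dimensional vectors in place of the real 512-dimensional face embeddings; the function name is illustrative and not part of Immich.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity, the quantity a cosine `<=>` comparison reports."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)

# Toy vectors (hypothetical), standing in for face embeddings.
same = cosine_distance([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])        # parallel vectors
orthogonal = cosine_distance([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])  # unrelated directions
opposite = cosine_distance([1.0, 0.0, 0.0], [-1.0, 0.0, 0.0])   # opposite directions
```

In this sketch the three toy results sit at roughly 0, 1 and 2, which is the full range a cosine distance can take.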

mertalev avatar Sep 08 '24 01:09 mertalev

Can you help me with how and where exactly I can run those queries?

eacunha avatar Sep 08 '24 08:09 eacunha

You can run docker exec -it immich_postgres psql --dbname=immich --username=<DB_USERNAME> to connect to the database via the container directly, where <DB_USERNAME> is the value from your .env file. Then, you can just paste in a query and hit enter.
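
If pasting into an interactive prompt is inconvenient, the same connection could probably be scripted instead. The sketch below only builds and runs the `docker exec ... psql` command described above, passing a single query through psql's `--command` option. The container name `immich_postgres` and the database name `immich` come from the compose and .env files in this issue; the helper names and the default `postgres` user are assumptions to adapt to your own setup.

```python
import subprocess

def psql_command(query, container="immich_postgres", dbname="immich", username="postgres"):
    """Build the argv list for running one SQL query through psql inside the container."""
    return [
        "docker", "exec", container,
        "psql",
        f"--dbname={dbname}",
        f"--username={username}",
        f"--command={query}",
    ]

def run_query(query, **kwargs):
    """Run the query and return psql's text output; raises CalledProcessError on failure."""
    result = subprocess.run(
        psql_command(query, **kwargs),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

Calling `run_query("select count(*) from asset_faces;")` on the Docker host would then return the same text that the interactive session prints for that query.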

mertalev avatar Sep 08 '24 23:09 mertalev

I have the same issue, unfortunately. At least it only happens a couple of times in the People section, so I just hide the 'fake' person that was detected.

SerAlbi avatar Sep 12 '24 13:09 SerAlbi

immich=# select * from pg_vector_index_stat;
 tablerelid | indexrelid | tablename    | indexname  | idx_status | idx_indexing | idx_tuples | idx_sealed | idx_growing | idx_write | idx_size  | idx_options
------------+------------+--------------+------------+------------+--------------+------------+------------+-------------+-----------+-----------+-------------
      17319 |      17331 | smart_search | clip_index | NORMAL     | t            |     130483 | {130413}   | {70}        |         0 | 285602760 | {"vector":{"dimensions":512,"distance":"Cos","kind":"F32"},"segment":{"max_growing_segment_size":20000,"max_sealed_segment_size":1000000},"optimizing":{"sealing_secs":60,"sealing_size":1,"optimizing_threads":2},"indexing":{"hnsw":{"m":16,"ef_construction":300,"quantization":{"trivial":{}}}}}
      17551 |      17575 | face_search  | face_index | NORMAL     | f            |     116850 | {116752}   | {}          |        98 | 257665816 | {"vector":{"dimensions":512,"distance":"Cos","kind":"F32"},"segment":{"max_growing_segment_size":20000,"max_sealed_segment_size":1000000},"optimizing":{"sealing_secs":60,"sealing_size":1,"optimizing_threads":2},"indexing":{"hnsw":{"m":16,"ef_construction":300,"quantization":{"trivial":{}}}}}
(2 rows)

immich=# select count(*) from asset_faces;
 count
-------
 90480
(1 row)

immich=# select count(*) from face_search;
 count
-------
 90480
(1 row)

immich=# with
  embeddings as (
    select "originalFileName", embedding
    from
      assets
        inner join asset_faces
          on assets.id = asset_faces."assetId"
        inner join face_search
          on asset_faces.id = face_search."faceId"
    where
        assets."originalFileName" in ('20220408091020.png', '20211030_115938.jpg', '20230306_123821.jpg')
  )
select this."originalFileName" image1, other."originalFileName" image2, this.embedding <=> other.embedding distance
from embeddings this, embeddings other;
       image1        |       image2        |  distance
---------------------+---------------------+------------
 20211030_115938.jpg | 20211030_115938.jpg |          0
 20211030_115938.jpg | 20220408091020.png  |  0.7072995
 20211030_115938.jpg | 20230306_123821.jpg | 0.68505836
 20220408091020.png  | 20211030_115938.jpg |  0.7072995
 20220408091020.png  | 20220408091020.png  |          0
 20220408091020.png  | 20230306_123821.jpg |  0.7904459
 20230306_123821.jpg | 20211030_115938.jpg | 0.68505836
 20230306_123821.jpg | 20220408091020.png  |  0.7904459
 20230306_123821.jpg | 20230306_123821.jpg |          0
(9 rows)
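
One possible way to read the pairwise distances in this output is to compare each distinct pair against a distance cutoff. The sketch below copies the three off-diagonal values from the query result. The cutoff of 0.5 is purely an assumption for illustration (the value that matters is whatever the facial recognition settings of the instance show), and the helper name is hypothetical.

```python
# Pairwise distances copied from the query output above (each unordered pair once).
distances = {
    ("20211030_115938.jpg", "20220408091020.png"): 0.7072995,
    ("20211030_115938.jpg", "20230306_123821.jpg"): 0.68505836,
    ("20220408091020.png", "20230306_123821.jpg"): 0.7904459,
}

# ASSUMPTION: illustrative cutoff only; substitute the value from your own settings.
MAX_DISTANCE = 0.5

def pairs_within(pair_distances, max_distance):
    """Return the pairs whose distance is at or below the cutoff, closest first."""
    close = [(d, pair) for pair, d in pair_distances.items() if d <= max_distance]
    return [pair for d, pair in sorted(close)]

matches = pairs_within(distances, MAX_DISTANCE)
closest_pair = min(distances, key=distances.get)
```

Under the assumed cutoff, `matches` comes out empty, meaning none of these three faces would count as a direct match for another. The thread itself does not establish why they were nonetheless grouped into one person.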

eacunha avatar Oct 10 '24 22:10 eacunha

@mertalev could you check the result of the queries ^?

danieldietzler avatar Apr 01 '25 18:04 danieldietzler

I have the same issue, resulting in a single "person" with 800+ assets linked to it. I don't mind that it happens, no detection is flawless, but this feature request ( https://github.com/immich-app/immich/discussions/6559 ) would be nice so I can just delete it.

//edit: decided to create a simple Python script to do the cleanup.

PW999 avatar May 10 '25 10:05 PW999

This isn't really a bug as much as it is just a fundamental limitation of how machine learning works and the accuracy of the models that are available to us. The things we can do are make it easier to manage people, hide them, delete and recluster them, etc. These are feature requests and are tracked as GitHub Discussions.

jrasm91 avatar Jun 24 '25 20:06 jrasm91