Possible behavior mismatch: Immich resets vector length in smart_search database when model is changed via UI, but does not perform this check if model is specified via config file
The bug
When updating the CLIP model, Immich Server is expected to update the database table "smart_search" to set the data type of the "embedding" column to a vector width which matches the CLIP model selected.
I believe that currently this behavior is only being performed when that change is made IN THE WEB UI, and is not performed at system startup or during the config-file parsing, even if the CLIP model was changed.
It is possible that this behavioral issue is restricted to environments where there is no persistence (e.g. deployed as a pod in Kubernetes without an underlying persistent volume), and does not appear if the immich-server has a stable data source beneath it. I believe that performing this check on startup when using a config file would still be a more robust behavior.
The OS that Immich Server is running on
k8s on Ubuntu 22.04
Version of Immich Server
v1.103.1
Version of Immich Mobile App
N/A
Platform with the issue
- [X] Server
- [ ] Web
- [ ] Mobile
Your docker-compose.yml content
Deployed from k8s yaml files
Your .env content
Deployed from k8s yaml files
Reproduction steps
1. Deploy immich, using a configuration file pointed to by the "IMMICH_CONFIG_FILE" environment variable
2. Change the "ModelName" configuration parameter in the config file to one which uses a different vector width (e.g. 1024 instead of the default 512)
3. Run a "Smart Search" job on All assets
Relevant log output
immich-microservices-67fbc95c97-9nzg7 microservices [Nest] 1 - 05/11/2024, 1:26:32 PM ERROR [JobService] Unable to run job handler (smartSearch/smart-search): QueryFailedError: pgvecto.rs: The given vector is invalid for input.
immich-microservices-67fbc95c97-9nzg7 microservices ADVICE: Check if dimensions and scalar type of the vector is matched with the index.
immich-microservices-67fbc95c97-9nzg7 microservices [Nest] 1 - 05/11/2024, 1:26:32 PM ERROR [JobService] QueryFailedError: pgvecto.rs: The given vector is invalid for input.
immich-microservices-67fbc95c97-9nzg7 microservices ADVICE: Check if dimensions and scalar type of the vector is matched with the index.
immich-microservices-67fbc95c97-9nzg7 microservices at PostgresQueryRunner.query (/usr/src/app/node_modules/typeorm/driver/postgres/PostgresQueryRunner.js:219:19)
immich-microservices-67fbc95c97-9nzg7 microservices at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
immich-microservices-67fbc95c97-9nzg7 microservices at async InsertQueryBuilder.execute (/usr/src/app/node_modules/typeorm/query-builder/InsertQueryBuilder.js:106:33)
immich-microservices-67fbc95c97-9nzg7 microservices at async SearchRepository.upsert (/usr/src/app/dist/repositories/search.repository.js:188:9)
immich-microservices-67fbc95c97-9nzg7 microservices at async SmartInfoService.handleEncodeClip (/usr/src/app/dist/services/smart-info.service.js:90:9)
immich-microservices-67fbc95c97-9nzg7 microservices at async /usr/src/app/dist/services/job.service.js:149:36
immich-microservices-67fbc95c97-9nzg7 microservices at async Worker.processJob (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:394:28)
immich-microservices-67fbc95c97-9nzg7 microservices at async Worker.retryIfFailed (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:581:24)
immich-microservices-67fbc95c97-9nzg7 microservices [Nest] 1 - 05/11/2024, 1:26:32 PM ERROR [JobService] Object:
immich-microservices-67fbc95c97-9nzg7 microservices {
immich-microservices-67fbc95c97-9nzg7 microservices "id": "303f6f4b-da13-49fb-8b14-d9effc7a8506"
immich-microservices-67fbc95c97-9nzg7 microservices }
Additional information
Background
I stood up a new immich instance using Kubernetes.
In doing so, I initially brought the server pod up without a config file specified.
I changed several config variables and exported the JSON. While doing this, I did NOT change the "Clip Model" under Machine Learning Settings.
I added the exported JSON to a K8s ConfigMap and updated the Pod to mount that ConfigMap as a directory.
I updated the pod environment (for Server and Microservices) to include the "IMMICH_CONFIG_FILE" variable pointing to the mounted config file, and restarted the pods so they would use that config file. At this point the server came up and behaved normally, and exhibited no errors in the logs of any component.
I verified that the Server was indeed using the config file, as the "Settings" page of Administration stated that and locked any changes from being made.
I then edited the config file definition to change the Clip model to "ViT-g-14__laion2b-s12b-b42k".
I applied this change to the k8s configmap and restarted the server instance. it came up correctly, and showed the new model name in the "Settings" of the web administration tool. No errors or unusual messages were present in the log at this time.
Error detected
I then ran the "smart search" job for All assets (since I had changed the clip model). The "waiting" count did not go down, and error logs from the microservices pod indicated "the given vector is invalid for input" (see logs section).
Research
I found this discussion indicating a similar issue: https://github.com/immich-app/immich/discussions/7425
The error seems to be directly related to a mismatch between the vector size of the "embedding" column in the smart_search table of the database vs. the vector size the CLIP model requires.
Resolving the issue requires making a change to the data type of the column (setting the vector size correctly), and rebuilding some indexes on the table.
A commenter in that discussion stated that the immich server should be performing these database changes when it detects that the size of the model has changed, and included two server log lines indicating such:
immich_server | [Nest] 7 - 04/27/2024, 4:47:02 PM LOG [SearchRepository] Dimension size of model ViT-L-14-336__openai is 768, but database expects 512.
immich_server | [Nest] 7 - 04/27/2024, 4:47:02 PM LOG [SearchRepository] Updating database CLIP dimension size to 768.
These lines never appeared in my server logs.
I then disabled the IMMICH_CONFIG_FILE environment variable and restarted the immich server pod. This reset the settings to defaults, including the CLIP model.
I then changed the model IN THE ADMIN WEB UI to the desired one and clicked save. The server immediately performed the expected database update.
I re-enabled the IMMICH_CONFIG_FILE environment variable and restarted the pod again. I was then able to perform a smart-search reindex using the desired CLIP model. successfully.
It's possible that this behavior/issue is limited to cases where a config file is being mounted into a pod via a Kubernetes ConfigMap, or cases where there is no persistence for immich-server except the config file.
That sounds correct, and would explain some issues we've seen. @mertalev thoughts?
It does (or should I guess) check if it should update the smart search table at startup. Specifically, it checks if the dimensions of the current model match the dimensions in the table and updates the table if not. It's possible there's a bug somewhere in that logic, though.
Apologies if I'm completely off on the wrong path, I don't know typescript at all or Javascript basically at all, but in the codebase I'm seeing the following control flow:
(inside the server section of the repo, and converting from the JS files on my drive to the TS files in the repo)
start.sh runs node on main -> src/main.js runs nestFactory.create() on apiModule -> src/app.module.ts constructor of ApiModule calls the init() of an ApiService -> src/services/api.service.ts calls init() on a databaseService -> src/services/database.service.ts
At this point, the following seems to occur:
- if the postgres version is below the minimum major version, the code throws an error (14 is hardcoded as the minimum earlier in the file)
- Attempts to lock the database with a lock of 'Databaselock.Migrations`
- if this fails for some reason, nothing happens
- If the lock succeeds:
- Runs
createVectorExtension()->src/repositories/database.repository.tsruns a query ofCREATE EXTENSION IF NOT EXISTS ${extension}where ${extension} is the preferred extension of vectors and vector. if this fails, an error is thrown - Runs
updateVectorExtension()->src/repositories/database.repository.ts- Compares the vector extension version to the expected one
- If the version is outdated and the extension is vectors
- Runs
updateVectorSchemaif the version is outdated
- Runs
- Runs a query setting the version of the extension to the expected one
- if the version was not outdated, at this point it returns (note: it does not return an explicit value and I do not know what the default is in typescript?)
- if the version WAS outdated
- if the extension is vectors, run
SELECT pgvectors_upgrade()and set "restartRequired" to TRUE - if the extension IS NOT vectors, run a reindex of FACE and CLIP indexes
- At this point in the codepath, it seems to be updating the 'embeddings' column with a new dimSize
- to get here, the extension would have to be "vector" (not "vectors") and an update had to happen
- if the extension is vectors, run
- Runs
assertVectorExtension()which just throws an error if the extension version is null, presumably as a final check that the above at least ~somewhat happened correctly.
- Runs
Taking a brief break from bullet points to refocus: at this point we're at Line 36 of the database.service.ts file. At this point, the following happens:
-
shouldReindexis checked for each of FACE and CLIP ->src/repositories/database.repository.ts- if the extension isn't "vectors" it returns false immediately
- if the extension IS "vectors", it runs
SELECT idx_status FROM pg_vector_index_stat WHERE indexnamefor either FACE or CLIP- If the idx_status is "UPGRADE", it returns TRUE
- if the idx_status is anything else, it returns FALSE
- on error, if the response includes the words 'index is not existing', it returns TRUE
- on error if the response includes the words 'relation "pg_vector_index_stat" does not exist', it returns FALSE
- otherwise it throws the error
- NOTE: this query returns
NORMALin my database for both items, so it would return FALSE
I am probably missing some sort of invalid state detail here. Maybe it's impossible to have it get to the end without it running this reindex at least once? But as I'm looking at it, it only gets to the point of running the reindex code (which is the code that does the dimSize column update) when an update to the vector extension was needed, which presumably never occurred here because this was a new install of an (idempotently-deployed) current version of the pgvecto.rs tensorchord container image.
The check happens in the search repository's init, referenced in the smart info service.
Oh, I guess the init in the smart info service isn't getting called anywhere haha.
Ah, so it’s just not calling that unless you change the model while running?
Yup, so it only updates on a config change event in practice. Making it run at startup should fix it.
This has been fixed.