os error 22 when indexing
On macOS 15.2, running Postgres 16 from Homebrew and building Lantern from master, the following inscrutable errors are thrown when using an external index:
[+] [Lantern External Index] New connection: 127.0.0.1:63501
[*] [Lantern External Index] Number of available CPU cores: 20
[*] [Lantern External Index] Index Params - pq: false, metric_kind: Cos, quantization: I8, dim: 768, m: 34, ef_construction: 256, ef: 256, num_subvectors: 0, num_centroids: 0, element_bits: 32
[*] [Lantern External Index] Creating index with parameters dimensions=768 m=34 ef=256 ef_construction=256, hardware_acceleration=serial
[*] [Lantern External Index] Estimated capcity is 38979200
[+] [Lantern External Index] Indexed 3897920 tuples [speed 1110 tuples/s]...
[+] [Lantern External Index] Indexing took 6948s, indexed 7764322 items
[*] [Lantern External Index] Start streaming index
[+] [Lantern External Index] Writing index to file took 12s722ms
[+] [Lantern External Index] Reading index file took 1s662ms
[X] [Lantern External Index] Indexing error: Invalid argument (os error 22)
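When the server log runs for hours, it can help to filter it down to the failing lines. The following is a minimal sketch, not part of Lantern: it assumes every line has the `[marker] [component] message` shape seen in the log above, and that the `X` marker denotes an error, which is inferred from this one log rather than from any documentation. The names `parse_line` and `error_lines` are hypothetical.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Assumed line shape, inferred from the log above: "[X] [Component] message"
LINE_RE = re.compile(r"^\[(?P<marker>.)\] \[(?P<component>[^\]]+)\] (?P<message>.*)$")


class LogLine(NamedTuple):
    marker: str
    component: str
    message: str


def parse_line(line: str) -> Optional[LogLine]:
    """Split one log line into marker, component and message, or return None."""
    match = LINE_RE.match(line.strip())
    return LogLine(**match.groupdict()) if match else None


def error_lines(lines: Iterable[str]) -> List[LogLine]:
    """Keep only lines whose marker is 'X' (assumed to mean error)."""
    parsed = (parse_line(line) for line in lines)
    return [p for p in parsed if p is not None and p.marker == "X"]


sample = [
    "[+] [Lantern External Index] Reading index file took 1s662ms",
    "[X] [Lantern External Index] Indexing error: Invalid argument (os error 22)",
]
for entry in error_lines(sample):
    print(entry.component, "->", entry.message)
```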
The corresponding command on the postgres side:
create index on embeddings_denormalized using lantern_hnsw (cast(vec as vector(768)) dist_vec_cos_ops) with (m = 34, ef_construction = 256, ef = 256, dim = 768, quant_bits = 8, external=true) where abs(ulid_hash(page_id) % 100) < 20;
INFO: done init usearch index
INFO: connecting to external indexing server on 127.0.0.1:8998
INFO: successfully connected to external indexing server
ERROR: external index error: Invalid argument (os error 22)
Time: 6963287.590 ms (01:56:03.288)
The partial index (the where clause) has no impact; I only added it because the table is large and debugging this problem is a PITA.
Lantern was invoked using:
lantern-cli start-indexing-server --tmp-dir /opt/homebrew/var/
Thinking that this might be Gatekeeper-related, I tried changing tmp-dir, but it had no impact.
Hi @mmisiewicz, thanks for reporting the issue. Can you try the following cases and tell us which ones succeed, so we can understand where the issue is coming from?
- Indexing on less data (e.g. 10k items) with the same parameters. You can create a table with 10k items from your original table and then run the indexing on the embeddings_test table:
CREATE TABLE embeddings_test AS SELECT * FROM embeddings_denormalized LIMIT 10000;
- If the above fails again, try indexing without scalar quantization on the embeddings_test table:
create index on embeddings_test using lantern_hnsw (cast(vec as vector(768)) dist_vec_cos_ops) with (m = 34, ef_construction = 256, ef = 256, dim = 768, external=true);
Also, can you share the data type of the vec column? If it is REAL[], you can avoid the cast and use dist_cos_ops directly.
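The two test cases above can be generated for several subset sizes instead of being typed by hand. This is a minimal sketch that only builds SQL strings; it reuses the statements and index options quoted in this thread and does not connect to a database. The helper names `subset_table_sql` and `index_sql` are hypothetical, and identifiers are interpolated without escaping, so it is only suitable for trusted table names.

```python
def subset_table_sql(source: str, target: str, limit: int) -> str:
    """Build the CREATE TABLE ... LIMIT statement for a smaller test table."""
    return f"CREATE TABLE {target} AS SELECT * FROM {source} LIMIT {limit};"


def index_sql(table: str, quantized: bool) -> str:
    """Build the create index statement, with or without quant_bits = 8."""
    options = ["m = 34", "ef_construction = 256", "ef = 256", "dim = 768"]
    if quantized:
        options.append("quant_bits = 8")
    options.append("external=true")
    return (
        f"create index on {table} using lantern_hnsw "
        f"(cast(vec as vector(768)) dist_vec_cos_ops) "
        f"with ({', '.join(options)});"
    )


# Print the statements for the 10k-row case, first quantized, then not.
print(subset_table_sql("embeddings_denormalized", "embeddings_test", 10000))
for quantized in (True, False):
    print(index_sql("embeddings_test", quantized))
```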
Hey @var77 - vec is a halfvec from the latest pgvector.
Indexing a subset of 10,000 or 100,000 rows worked OK, and a subset of 1MM rows worked at least once. Beyond that, I get OS error 22 when running the indexing server on my M1 Ultra.
I observed a similar-seeming error when running the index request against an indexing server on x86 Linux: OS error 11, resource temporarily unavailable. Could that be a clue?
Is there a way to increase verbosity to find out where these errors are coming from?
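For reference, the two error numbers above can be looked up in the errno table of whichever machine reported them. The sketch below uses only the Python standard library and is not part of Lantern; `describe_errno` is a hypothetical helper name. Error numbers are platform-specific: 22 is EINVAL on both Linux and macOS, while 11 is EAGAIN ("resource temporarily unavailable") on Linux but maps to a different name on macOS, so it should be run on the host where the indexing server actually failed.

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return 'NAME: message' for an OS error number on the current platform."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


# The two codes reported in this thread; output depends on the local platform.
for code in (22, 11):
    print(code, describe_errno(code))
```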
The issue also occurs when running Lantern as a Postgres background worker.
(39242) [local]:5432 mike@mike=# create index on bigtable using lantern_hnsw (cast(vec as vector(768)) dist_vec_cos_ops) with (m = 34, ef_construction = 256, ef = 256, dim = 768, quant_bits = 8, external=true);
INFO: done init usearch index
INFO: connecting to external indexing server on 127.0.0.1:8998
INFO: successfully connected to external indexing server
ERROR: external index error: Invalid argument (os error 22)
Time: 33025415.155 ms (09:10:25.415)
One more clue: I am able to reproduce the issue using the autotune tool on a table with the embedding stored in a float4[] column, which I think rules out the index parameters and halfvec as causes.
➜ lantern-cli autotune-index --uri 'postgresql://localhost/mike' --table "test_emb_tune" --column "vec" --metric-kind cos --recall 99 --test-data-size 100000 --k 500
[+] [Lantern Index Autotune] Progress 5%
[+] [Lantern Index Autotune] Progress 15%
[+] [Lantern Index Autotune] Progress 25%
[+] [Lantern Index Autotune] Progress 35%
[+] [Lantern Index Autotune] Progress 45%
[+] [Lantern Index Autotune] Progress 55%
[+] [Lantern Index Autotune] Progress 65%
[+] [Lantern Index Autotune] Progress 70%
[*] [Lantern Index Autotune] ========== Results for job 36978 ==========
[*] [Lantern Index Autotune] result(recall=99.42%, latency=190.9ms, indexing_duration=5s) index_params(m=6, ef=64, efc=32)
[*] [Lantern Index Autotune] result(recall=99.42%, latency=197.3ms, indexing_duration=5s) index_params(m=8, ef=64, efc=40)
[*] [Lantern Index Autotune] result(recall=99.42%, latency=190.7ms, indexing_duration=6s) index_params(m=12, ef=64, efc=48)
[*] [Lantern Index Autotune] result(recall=99.42%, latency=190.8ms, indexing_duration=8s) index_params(m=16, ef=76, efc=60)
[*] [Lantern Index Autotune] result(recall=99.42%, latency=193.2ms, indexing_duration=18s) index_params(m=32, ef=96, efc=96)
[*] [Lantern Index Autotune] result(recall=99.42%, latency=188.1ms, indexing_duration=36s) index_params(m=48, ef=128, efc=128)
[+] [Lantern Index Autotune] Progress 100%
➜ lantern-cli autotune-index --uri 'postgresql://localhost/mike' --table "test_emb_tune" --column "vec" --metric-kind cos --recall 99 --test-data-size 1000000 --k 10
[+] [Lantern Index Autotune] Progress 5%
[X] [Lantern Index Autotune] db error: ERROR: external index error: Invalid argument (os error 22)
Note that when the table contains 1MM rows, the first create index command tested by the autotuner failed (m = 6).
The test table has a very minimal schema:
(39242) [local]:5432 mike@mike=# \d test_emb_tune
Table "test_emb_tune"
Column | Type | Collation | Nullable | Default
---------+-----------------------+-----------+----------+---------
page_id | character varying(26) | | |
idx | integer | | |
vec | real[] | | |
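Since indexing succeeds at 100,000 rows and fails somewhere around 1MM, the failing row count could be narrowed down by bisection rather than by trial and error. The following is a minimal sketch of that search. It assumes failures are monotone in the row count, which this thread does not confirm (the 1MM case worked at least once). The `succeeds` callback is hypothetical and would wrap "create a LIMIT n table and try to index it", and the stand-in threshold in the demo is made up purely for illustration.

```python
from typing import Callable


def first_failing_size(lo: int, hi: int, succeeds: Callable[[int], bool]) -> int:
    """Return the smallest n in (lo, hi] for which succeeds(n) is False.

    Assumes succeeds(lo) is True, succeeds(hi) is False, and that once
    indexing fails at some size it also fails at every larger size.
    """
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if succeeds(mid):
            lo = mid  # mid rows still index fine; search above it
        else:
            hi = mid  # mid rows already fail; search below it
    return hi


def fake_succeeds(rows: int) -> bool:
    """Stand-in for a real indexing attempt; the threshold is invented."""
    return rows < 640_000


print(first_failing_size(100_000, 1_000_000, fake_succeeds))
```

Each probe with a real table costs a full index build, so the number of probes (about 20 for this range) matters more than the arithmetic.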
Thanks for sharing the details, can you try one more thing as well:
- Generate an SSL certificate:
openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout /tmp/lantern-key.pem -out /tmp/lantern-cert.pem -subj '/C=US/ST=California/L=San Francisco/O=Lantern/CN=lantern.dev'
- Run the indexing server using that certificate:
lantern-cli start-indexing-server --cert /tmp/lantern-cert.pem --key /tmp/lantern-key.pem
- Set lantern_extras.external_index_secure=true and retry indexing.
Meanwhile I will try to think of a way to get more verbose output.
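Before retrying with the certificate steps above, it may be worth confirming that the generated certificate and key load as a matching pair, so that a TLS setup problem is not mistaken for the original error. This is a minimal sketch using Python's standard `ssl` module. It is independent of Lantern and says nothing about how lantern-cli itself loads the files. `cert_pair_loads` is a hypothetical helper name, and the paths are the ones used in the openssl command.

```python
import ssl


def cert_pair_loads(cert_path: str, key_path: str) -> bool:
    """Return True if the certificate and private key load as a matching pair."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    try:
        context.load_cert_chain(certfile=cert_path, keyfile=key_path)
    except OSError:
        # Covers missing files as well as ssl.SSLError (a subclass of OSError),
        # which is raised for malformed files or a key that does not match.
        return False
    return True


print(cert_pair_loads("/tmp/lantern-cert.pem", "/tmp/lantern-key.pem"))
```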