BUG: Can't preview images or PDFs, and can't convert to PDF
Hello,
I get this message when I try to preview an image or a PDF:
Can someone help me? Thank you.
Hi, thanks for using Aleph. Please make sure to provide the information specified in the issue template when opening issues in this repository.
Based on the error message, I think your ingest-file process might not use the correct database. How are you running Aleph? Did you specify a custom database URI (using the ALEPH_DATABASE_URI configuration option)?
Hello,
I use the docker-compose file from the production deployment method and I didn't touch anything:
Which Aleph version are you using?
I use 3.17.0:
@PhamPham92 Hey! I also faced the same issue with ingest_cache. It occurred after switching to 3.17. Have you tried running aleph upgrade? It helped me.
UPD: To be honest, it only helped for one particular document. Unfortunately, I reproduced the bug with other documents...
Hello, thanks for your answer. I have already tried, but it didn't help.
I have the same issue with 3.17.0. Mostly "OperationalError('(sqlite3.OperationalError) no such table: ingest_cache')", plus a few "Could not extract PDF file: RuntimeError('Set changed size during iteration')" and:
Failed to open image: (sqlite3.OperationalError) no such table: ingest_cache [SQL: SELECT ingest_cache.value FROM ingest_cache WHERE ingest_cache."key" = ?] [parameters: ('ocr:4e272afede8878a8d943ac3d854b97d768613274',)] (Background on this error at: https://sqlalche.me/e/20/e3q8)
In docker-compose.yml I tried commenting out "~:/host", and in aleph.tmpl I tried uncommenting ARCHIVE_TYPE=file and ARCHIVE_PATH=/data.
With a dataset of 20k files (2.2 GB), files containing text are indexed fine, small images often are, and larger images and PDFs almost never are.
When I tried to crawl 2 .png and 2 .pdf files, 1 PDF was successfully indexed.
With a dataset of 6k files (860.9 MB), the success rate for PDFs was about 50%, but larger images still failed.
Hello,
Exactly the same issue for me!
One of my instances is also getting this bug:
No preview is available for this document
Failed to open image: (sqlite3.OperationalError) no such table: ingest_cache [SQL: SELECT ingest_cache.value FROM ingest_cache WHERE ingest_cache."key" = ?] [parameters: ('ocr:[redacted-string]:deu',)] (Background on this error at: https://sqlalche.me/e/20/e3q8)
I am using Aleph version 3.17.0 and ingest-file version 3.22.0. This bug did not occur with previous versions (that I know of).
~~Even though ALEPH_DATABASE_URI is commented out in the aleph.env, there is a postgres deployed via docker-compose.yml, in which ingest-file depends on postgres.~~ [Edit: Bad thinking at the end of the day…]
Please let me know if I can help track down the bug using my setup.
Could you all open a shell inside a running ingest-file container and print the result of echo $FTM_STORE_URI? (It should point to the default Postgres URI if you didn't touch anything and are using the official Docker builds.)
$ docker-compose run --rm ingest-file bash
WARN[0000] /opt/aleph/docker-compose.yml: `version` is obsolete
[+] Creating 2/0
✔ Container aleph-redis-1 Running 0.0s
✔ Container aleph-postgres-1 Running 0.0s
root@ingest:/ingestors# echo $FTM_STORE_URI
postgresql://aleph:aleph@postgres/aleph
root@ingest:/ingestors# echo $ALEPH_DATABASE_URI
Thanks, @simonwoerpel. I also echoed ALEPH_DATABASE_URI which is empty. Maybe that's why it falls back to SQLite?
No, ingest-file doesn't even know about aleph in that sense, it only knows about the ftm store :)
It looks like the servicelayer (from ingest-file) uses the TAGS_DATABASE_URI environment variable to set the location of the tags database. By default, the variable is not set.
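Here is a minimal sketch of why an unset variable could surface as "no such table: ingest_cache". It assumes the fallback is an in-memory SQLite database, which is my reading of the error and not something confirmed in this thread. The helper name table_visible_in_new_connection is hypothetical, and it uses Python's stdlib sqlite3 directly, not servicelayer. With an in-memory database, every new connection gets its own empty database, so a table created on one connection is invisible to the next one:

```python
import sqlite3


def table_visible_in_new_connection(database: str) -> bool:
    """Create ingest_cache on one connection, then query it from a second one.

    `database` is a sqlite3 database name: ":memory:" or a file path.
    """
    writer = sqlite3.connect(database)
    reader = None
    try:
        writer.execute(
            'CREATE TABLE IF NOT EXISTS ingest_cache ("key" TEXT PRIMARY KEY, value TEXT)'
        )
        writer.commit()
        # A second connection, as a different worker or thread would open.
        reader = sqlite3.connect(database)
        reader.execute(
            'SELECT ingest_cache.value FROM ingest_cache WHERE ingest_cache."key" = ?',
            ("ocr:example",),
        )
        return True
    except sqlite3.OperationalError:
        # Raised as "no such table: ingest_cache" when the reader sees an empty database.
        return False
    finally:
        if reader is not None:
            reader.close()
        writer.close()
```

Calling it with ":memory:" should hit the OperationalError branch, while a file path shares the table between connections. That would also fit the intermittent failures reported above, if only some workers happen to see the table.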
I was able to fix the issue by adding
TAGS_DATABASE_URI=sqlite:///data/tags.sqlite
to aleph.env (OK, adding it to the environment of ingest-file would probably be enough. Oh well…) and restarting ingest-file: docker compose up -d ingest-file --force-recreate
I then had to reingest all documents, but at least it works now.
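For anyone double-checking their own value, here is a small sketch that classifies a SQLite URI by SQLAlchemy's URL rules: sqlite:// is in-memory, three slashes give a path relative to the working directory, and four slashes give an absolute path. The function name classify_sqlite_uri is hypothetical, it only handles the plain sqlite:// scheme (not variants like sqlite+pysqlite://), and it does plain string parsing without importing SQLAlchemy:

```python
import os


def classify_sqlite_uri(uri):
    """Classify a database URI as unset, in-memory, relative-file, or absolute-file."""
    prefix = "sqlite://"
    if not uri:
        return "unset"
    if not uri.startswith(prefix):
        return "not-sqlite"
    rest = uri[len(prefix):]
    if rest == "":
        return "in-memory"  # bare sqlite://
    if not rest.startswith("/"):
        return "unrecognized"  # SQLite URLs have no host part
    path = rest[1:]  # drop the slash that separates the empty host from the path
    if path in ("", ":memory:"):
        return "in-memory"
    return "absolute-file" if path.startswith("/") else "relative-file"


if __name__ == "__main__":
    # Run inside the ingest-file container to inspect the effective setting.
    print(classify_sqlite_uri(os.environ.get("TAGS_DATABASE_URI")))
```

By these rules, sqlite:///data/tags.sqlite is a relative path, so the file is created under the process's working directory. If you want it at /data on a mounted volume, the four-slash form sqlite:////data/tags.sqlite may be what you need. I have not verified which location the official images expect.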
I explained how to fix this in https://github.com/alephdata/aleph/issues/4002#issuecomment-2612973386. Proper documentation fix in https://github.com/alephdata/aleph/pull/4108.