BUG: Can't preview images or PDFs, and can't convert to PDF

Open PhamPham92 opened this issue 1 year ago • 13 comments

Hello,

I get this message when I try to preview an image or a PDF: image image image

Can someone help me? Thank you.

PhamPham92 avatar Jul 18 '24 07:07 PhamPham92

Hi, thanks for using Aleph. Please make sure to provide the information specified in the issue template when opening issues in this repository.

Based on the error message, I think your ingest-file process might not use the correct database. How are you running Aleph? Did you specify a custom database URI (using the ALEPH_DATABASE_URI configuration option)?

tillprochaska avatar Jul 29 '24 09:07 tillprochaska

Hello, I use the docker-compose file from the production deployment instructions and I didn't change anything: image

PhamPham92 avatar Jul 31 '24 18:07 PhamPham92

Which Aleph version are you using?

tillprochaska avatar Aug 06 '24 11:08 tillprochaska

I'm using 3.17.0: image

PhamPham92 avatar Aug 06 '24 16:08 PhamPham92

@PhamPham92 Hey! I also faced the same issue with ingest_cache. It occurred after switching to 3.17. Have you tried running aleph upgrade? It helped me.

UPD: To be honest, it only helped for one particular document. Unfortunately, I reproduced the bug with other documents...

Amerousful avatar Sep 03 '24 08:09 Amerousful

Hello, thanks for your answer. I have already tried, but it didn't help.

PhamPham92 avatar Sep 06 '24 17:09 PhamPham92

I have the same issue with 3.17.0. Mostly I see "OperationalError('(sqlite3.OperationalError) no such table: ingest_cache')", plus a few "Could not extract PDF file: RuntimeError('Set changed size during iteration')" and: Failed to open image: (sqlite3.OperationalError) no such table: ingest_cache [SQL: SELECT ingest_cache.value FROM ingest_cache WHERE ingest_cache."key" = ?] [parameters: ('ocr:4e272afede8878a8d943ac3d854b97d768613274',)] (Background on this error at: https://sqlalche.me/e/20/e3q8)

In docker-compose.yml I tried commenting out "~:/host", and in aleph.tmpl I tried uncommenting ARCHIVE_TYPE=file and ARCHIVE_PATH=/data.

With a dataset of 20k files (2.2 GB), text files are indexed fine, small images often are, and larger images and PDFs almost never are.

When I tried to crawl 2 .pngs and 2 .pdfs, 1 pdf was successfully indexed.

With a dataset of 6k files (860.9 MB) the success rate for pdfs was about 50 %, but it still failed with larger images.

gethert avatar Sep 11 '24 17:09 gethert

Hello,

Exactly the same issue for me!

PhamPham92 avatar Sep 11 '24 18:09 PhamPham92

One of my instances is also getting this bug:

No preview is available for this document. Failed to open image: (sqlite3.OperationalError) no such table: ingest_cache [SQL: SELECT ingest_cache.value FROM ingest_cache WHERE ingest_cache."key" = ?] [parameters: ('ocr:[redacted-string]:deu',)] (Background on this error at: https://sqlalche.me/e/20/e3q8)

I am using Aleph version 3.17.0 and ingest-file version 3.22.0. This bug did not occur with previous versions (that I know of).

~~Even though ALEPH_DATABASE_URI is commented out in the aleph.env, there is a postgres deployed via docker-compose.yml, in which ingest-file depends on postgres.~~ [Edit: Bad thinking at the end of the day…]

Please let me know if my setup can help track down the cause of the bug.

riotbib avatar Sep 17 '24 17:09 riotbib

Could you all open a shell in a running ingest-file container and print the result of echo $FTM_STORE_URI? It should point to the default Postgres URI if you didn't change anything and are using the official Docker builds.

simonwoerpel avatar Sep 17 '24 17:09 simonwoerpel

$ docker-compose run --rm ingest-file bash
WARN[0000] /opt/aleph/docker-compose.yml: `version` is obsolete
[+] Creating 2/0
 ✔ Container aleph-redis-1     Running  0.0s
 ✔ Container aleph-postgres-1  Running  0.0s
root@ingest:/ingestors# echo $FTM_STORE_URI
postgresql://aleph:aleph@postgres/aleph
root@ingest:/ingestors# echo $ALEPH_DATABASE_URI


Thanks, @simonwoerpel. I also echoed ALEPH_DATABASE_URI which is empty. Maybe that's why it falls back to SQLite?

riotbib avatar Sep 17 '24 17:09 riotbib

No, ingest-file doesn't even know about aleph in that sense, it only knows about the ftm store :)

simonwoerpel avatar Sep 17 '24 18:09 simonwoerpel

image

PhamPham92 avatar Sep 17 '24 18:09 PhamPham92

It looks like the servicelayer (from ingest-file) uses the TAGS_DATABASE_URI environment variable to set the location of the tags database. By default, the variable is not set.
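
A minimal sketch of why an unset URI could produce the "no such table: ingest_cache" error, assuming the unset variable falls back to an in-memory SQLite database (an assumption; this thread does not show servicelayer's default). It uses only Python's standard sqlite3 module, not servicelayer itself, and the table layout and the demo() helper are hypothetical. Each in-memory connection gets its own private database, so a table created through one connection is invisible to another:

```python
import sqlite3


def demo():
    """Show that two in-memory SQLite connections do not share tables."""
    # First connection: create the cache table and store one entry.
    # The table layout is a hypothetical stand-in for the real cache schema.
    writer = sqlite3.connect(":memory:")
    writer.execute('CREATE TABLE ingest_cache ("key" TEXT PRIMARY KEY, value TEXT)')
    writer.execute("INSERT INTO ingest_cache VALUES (?, ?)", ("ocr:abc", "cached text"))
    writer.commit()

    query = 'SELECT value FROM ingest_cache WHERE "key" = ?'
    found = writer.execute(query, ("ocr:abc",)).fetchone()[0]

    # Second connection: a brand-new, empty in-memory database.
    reader = sqlite3.connect(":memory:")
    try:
        reader.execute(query, ("ocr:abc",))
        error = None
    except sqlite3.OperationalError as exc:
        error = str(exc)
    finally:
        reader.close()
        writer.close()
    return found, error


if __name__ == "__main__":
    found, error = demo()
    print("writer sees:", found)
    print("reader fails with:", error)
```

If that assumption holds, a file-backed URI avoids the problem because every connection opens the same file.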

I was able to fix the issue by adding

TAGS_DATABASE_URI=sqlite:///data/tags.sqlite

to aleph.env (OK, adding it to the environment of ingest-file would probably be enough. Oh well…) and restarting ingest-file: docker compose up -d ingest-file --force-recreate

I then had to reingest all documents, but at least it works now.
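
One detail worth double-checking in the URI above: under SQLAlchemy's URL convention, three slashes after sqlite: give a path relative to the process's working directory, and four slashes give an absolute path. A small sketch of that rule in plain Python (sqlite_location is a hypothetical helper, not part of Aleph, servicelayer, or SQLAlchemy, and it ignores query strings and driver suffixes such as sqlite+pysqlite):

```python
def sqlite_location(uri):
    """Return where a SQLite URI stores its data, or None for other databases."""
    prefix = "sqlite://"
    if not uri.startswith(prefix):
        return None  # e.g. a postgresql:// URI
    rest = uri[len(prefix):]
    if rest in ("", "/:memory:"):
        return ":memory:"  # nothing is written to disk
    # The first slash only separates the (empty) host part from the path,
    # so "/data/x" is relative and "//data/x" is absolute.
    return rest[1:]


if __name__ == "__main__":
    for uri in ("sqlite://", "sqlite:///data/tags.sqlite", "sqlite:////data/tags.sqlite"):
        print(uri, "->", sqlite_location(uri))
```

If the file is meant to live in the /data volume regardless of the working directory, sqlite:////data/tags.sqlite would be the absolute form. Whether that matters depends on where ingest-file runs, which this thread does not confirm.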

sjinks avatar Nov 17 '24 22:11 sjinks

I explained how to fix this in https://github.com/alephdata/aleph/issues/4002#issuecomment-2612973386. Proper documentation fix in https://github.com/alephdata/aleph/pull/4108.

stchris avatar Jan 24 '25 16:01 stchris