sist2
Getting error 'PDF stream Length incorrect' while trying to index a PDF file
Device Information: Docker
services:
  elasticsearch:
    image: elasticsearch:7.17.9
    restart: unless-stopped
    environment:
      - "discovery.type=single-node"
      - "ES_JAVA_OPTS=-Xms1g -Xmx1g"
      - "ingest.geoip.downloader.enabled=false"
  sist2-admin:
    image: simon987/sist2:3.1.4-x64-linux
    restart: unless-stopped
    volumes:
      - ./sist2-admin-data/:/sist2-admin/
      - /:/host
    ports:
      - 4090:4090 # sist2
      - 8080:8080 # sist2-admin
    working_dir: /root/sist2-admin/
    entrypoint: python3 /root/sist2-admin/sist2_admin/app.py
Describe the bug: Getting the error 'PDF stream Length incorrect' while trying to index a PDF file.
Steps To Reproduce
- Run indexing with PDF files included (Spanish and English)
Additional context: relevant logs follow.
2023-08-23 23:50:58 [DEBUG /sist2-admin/data/gls/Titulos/114 SANTA MARTHA AUMENTO B.pdf] Starting parse job {c20e9c898c6fb55a03a132ef684b2a69}
2023-08-23 23:51:00 [DEBUG /sist2-admin/data/gls/Titulos/114 SANTA MARTHA AUMENTO B.pdf] FZ: PDF stream Length incorrect
2023-08-23 23:51:03 [DEBUG /sist2-admin/data/gls/Titulos/114 SANTA MARTHA AUMENTO B.pdf] FZ: ... repeated 2 times...
2023-08-23 23:51:03 [WARNING /sist2-admin/data/gls/Titulos/114 SANTA MARTHA AUMENTO B.pdf] FZ: OCR Disabled in this build
2023-08-23 23:51:03 [WARNING /sist2-admin/data/gls/Titulos/114 SANTA MARTHA AUMENTO B.pdf] FZ: aborting process from uncaught error!
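
To help narrow down whether the file itself is malformed, here is a rough diagnostic sketch in Python. It is not taken from sist2 or from the PDF library (the `FZ:` prefix suggests the messages come from MuPDF, but that is an assumption). It scans the raw bytes of a PDF, and for every stream whose dictionary declares a direct integer `/Length`, compares that value with the number of bytes actually found between `stream` and `endstream`. The function name `find_length_mismatches` is hypothetical, and the scan is a heuristic: it skips indirect lengths such as `/Length 12 0 R` and can be fooled by binary data that happens to contain the word `endstream`.

```python
import re

# Direct integer /Length only; "/Length 12 0 R" (indirect reference) is skipped.
LENGTH_RE = re.compile(rb"/Length\s+(\d+)(?!\d)(?!\s+\d+\s+R)")
# The "stream" keyword followed by its end-of-line, excluding "endstream".
STREAM_RE = re.compile(rb"(?<!end)stream\r?\n")


def find_length_mismatches(data: bytes):
    """Return (offset, declared, actual) for streams whose /Length looks wrong."""
    mismatches = []
    pos = 0
    while True:
        m = STREAM_RE.search(data, pos)
        if m is None:
            break
        start = m.end()
        end = data.find(b"endstream", start)
        if end == -1:
            break  # truncated file: no closing keyword for this stream
        body = data[start:end]
        # An end-of-line marker before "endstream" is not part of the data.
        if body.endswith(b"\r\n"):
            body = body[:-2]
        elif body.endswith((b"\n", b"\r")):
            body = body[:-1]
        # Heuristic: the stream dictionary sits between the nearest preceding
        # "obj" keyword and the "stream" keyword.
        obj_start = max(data.rfind(b"obj", 0, m.start()), 0)
        declared = LENGTH_RE.search(data[obj_start:m.start()])
        if declared is not None and int(declared.group(1)) != len(body):
            mismatches.append((m.start(), int(declared.group(1)), len(body)))
        pos = end + len(b"endstream")
    return mismatches
```

To try it on the file from the logs, read it in binary mode and pass the bytes in, for example `find_length_mismatches(open(path, "rb").read())`, where `path` points at the PDF. An empty list does not prove the file is healthy, since indirect lengths are not checked, but a non-empty list would suggest the PDF was written with inconsistent stream lengths.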