sist2

Getting error 'PDF stream Length incorrect' while trying to index a PDF file

Open mviejo33 opened this issue 10 months ago • 2 comments

Device Information: Docker

services:
  elasticsearch:
    image: elasticsearch:7.17.9
    restart: unless-stopped
    environment:
      - "discovery.type=single-node"
      - "ES_JAVA_OPTS=-Xms1g -Xmx1g"
      - "ingest.geoip.downloader.enabled=false"
  sist2-admin:
    image: simon987/sist2:3.1.4-x64-linux
    restart: unless-stopped
    volumes:
      - ./sist2-admin-data/:/sist2-admin/
      - /:/host
    ports:
      - 4090:4090 # sist2
      - 8080:8080 # sist2-admin
    working_dir: /root/sist2-admin/
    entrypoint: python3 /root/sist2-admin/sist2_admin/app.py

Command with arguments

Describe the bug: Indexing a PDF file fails with the error 'PDF stream Length incorrect'.

Steps To Reproduce

  1. Run indexing with PDF files included (Spanish and English).

Additional context: relevant log output:

2023-08-23 23:50:58 [DEBUG /sist2-admin/data/gls/Titulos/114 SANTA MARTHA AUMENTO B.pdf] Starting parse job {c20e9c898c6fb55a03a132ef684b2a69}
2023-08-23 23:51:00 [DEBUG /sist2-admin/data/gls/Titulos/114 SANTA MARTHA AUMENTO B.pdf] FZ: PDF stream Length incorrect
2023-08-23 23:51:03 [DEBUG /sist2-admin/data/gls/Titulos/114 SANTA MARTHA AUMENTO B.pdf] FZ: ... repeated 2 times...
2023-08-23 23:51:03 [WARNING /sist2-admin/data/gls/Titulos/114 SANTA MARTHA AUMENTO B.pdf] FZ: OCR Disabled in this build
2023-08-23 23:51:03 [WARNING /sist2-admin/data/gls/Titulos/114 SANTA MARTHA AUMENTO B.pdf] FZ: aborting process from uncaught error!
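
To narrow down which files trigger the abort, the log can be grouped by file path. The following is a minimal sketch, not part of sist2: the line layout (`<timestamp> [<LEVEL> <path>] <message>`) is inferred only from the excerpt above, and the function names `group_fz_messages` and `aborted_files` are hypothetical helpers introduced for illustration. It assumes file paths do not contain the sequence `] `.

```python
import re
from collections import defaultdict

# Layout assumed from the log excerpt in this report:
#   "<YYYY-MM-DD HH:MM:SS> [<LEVEL> <path>] <message>"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<level>[A-Z]+) (?P<path>.+?)\] "
    r"(?P<msg>.*)$"
)


def group_fz_messages(lines):
    """Map each file path to its (level, message) pairs whose message starts with 'FZ:'."""
    by_file = defaultdict(list)
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and match.group("msg").startswith("FZ:"):
            by_file[match.group("path")].append(
                (match.group("level"), match.group("msg"))
            )
    return dict(by_file)


def aborted_files(lines):
    """Return the sorted paths whose FZ messages include 'aborting process'."""
    grouped = group_fz_messages(lines)
    return sorted(
        path
        for path, entries in grouped.items()
        if any("aborting process" in msg for _, msg in entries)
    )


# Two lines copied from the excerpt above, used as sample input.
SAMPLE_LOG = [
    "2023-08-23 23:51:00 [DEBUG /sist2-admin/data/gls/Titulos/114 SANTA MARTHA AUMENTO B.pdf] FZ: PDF stream Length incorrect",
    "2023-08-23 23:51:03 [WARNING /sist2-admin/data/gls/Titulos/114 SANTA MARTHA AUMENTO B.pdf] FZ: aborting process from uncaught error!",
]

for path in aborted_files(SAMPLE_LOG):
    print(path)
```

Running `aborted_files` over the full log (for example, the lines of a saved log file) gives a list of PDFs that can then be tested individually or excluded from the index.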

mviejo33 • Aug 29 '23 20:08