Scan fails on some PDFs - task doesn't terminate - kill command doesn't work

Open robertpfau opened this issue 11 months ago • 0 comments

Device Information (please complete the following information):

  • OS: Ubuntu 20.04
  • Deployment: Docker simon987/sist2:x64-linux
  • SIST2 Version: 3.2
  • Elasticsearch Version (if relevant): 7.17.9

Command with arguments: [ADMIN ] Starting sist2 command with args ['/root/sist2', 'scan', '/nfs/', '--threads=6', '--thumbnail-quality=50', '--thumbnail-count=1', '--thumbnail-size=552', '--content-size=32768', '--output=/sist2-admin/scan-DBNAME-2023-08-01 06:49:51.139372.sist2', '--depth=-1', '--archive=recurse', '--mem-buffer=2000', '--incremental', '--name=DBNAME', '--treemap-threshold=0.0005', '--json-logs', '--very-verbose']

Describe the bug: Indexing fails for a number of PDFs with the following output, and the failure hangs the entire scan.

2023-08-01 11:45:57 [DEBUG /nfs/file1.PDF] Starting parse job {4c3bcce60239465bf75f8e6c349ca285}
 [ERROR ] corrupted size vs. prev_size while consolidating

2023-08-01 11:45:57 [DEBUG tpool.c] Child process terminated with status code 0
2023-08-01 11:45:57 [FATAL tpool.c] Child process crashed (Aborted).
                                         The process was working on /nfs/file1.PDF
                                         Please consider creating a bug report at https://github.com/simon987/sist2/issues !
                                         sist2 is an open source project and relies on the collaboration of its users to diagnose and fix bugs.
2023-08-01 11:45:57 [DEBUG database.c] Opening database /sist2-admin/scan-DBNAME-2023-08-01 06:49:51.139372.sist2 (0)
2023-08-01 11:45:57 [DEBUG database.c] Opening database /dev/shm/sist2-ipc-11.sqlite (1)
 [ERROR ] corrupted size vs. prev_size
2023-08-01 11:46:10 [DEBUG tpool.c] Child process terminated with status code 0
2023-08-01 11:46:10 [FATAL tpool.c] Child process crashed (Aborted).
                                         The process was working on /nfs/file2.PDF
                                         Please consider creating a bug report at https://github.com/simon987/sist2/issues !
                                         sist2 is an open source project and relies on the collaboration of its users to diagnose and fix bugs.
2023-08-01 11:46:10 [DEBUG database.c] Opening database /sist2-admin/scan-DBNAME-2023-08-01 06:49:51.139372.sist2 (0)
2023-08-01 11:46:10 [DEBUG database.c] Opening database /dev/shm/sist2-ipc-11.sqlite (1)
2023-08-01 11:46:16 [DEBUG tpool.c] Waiting for worker threads to finish

Expected behavior: The failing files are skipped and the scan proceeds with the rest (40k files).
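As an illustration of the expected behavior only (this is not sist2's implementation, which is written in C), here is a minimal Python sketch of a supervisor that runs each file in its own child process, treats an abort or a hang as a reason to skip that file, and carries on with the rest. The names `run_isolated`, `process_all`, and `make_cmd` are hypothetical, the timeout value is arbitrary, and the signal check assumes a POSIX system, where a child killed by SIGABRT reports a negative return code.

```python
import subprocess


def run_isolated(cmd, timeout=60):
    """Run cmd in a child process and classify how it ended.

    Returns "ok", "failed" (non-zero exit), "crashed" (killed by a
    signal such as SIGABRT), or "timeout" (child was killed after
    exceeding the time limit).
    """
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,  # subprocess.run kills the child on expiry
        )
    except subprocess.TimeoutExpired:
        return "timeout"
    if proc.returncode < 0:
        # On POSIX a negative return code is the number of the
        # terminating signal, e.g. -6 for SIGABRT.
        return "crashed"
    return "ok" if proc.returncode == 0 else "failed"


def process_all(paths, make_cmd, timeout=60):
    """Process every path; collect the ones that had to be skipped.

    make_cmd is a hypothetical callback that builds the command line
    for one file. A crash or hang on one file never stops the loop.
    """
    skipped = []
    for path in paths:
        status = run_isolated(make_cmd(path), timeout)
        if status != "ok":
            skipped.append((path, status))
    return skipped
```

With this structure a file that triggers heap corruption in the parser costs one child process and one log entry, and the timeout bounds how long a stuck child can block the remaining files.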

Actual behavior: The task stops but is still shown as a running task. Kill does not terminate it; only restarting the Docker container helps.

Additional context: Files sent via email.

robertpfau, Aug 02 '23 08:08