
[BUG] Log shows many "File not found" errors, although everything appears to work

Open ziggystar opened this issue 2 years ago • 7 comments

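Describe the bug

I'm running paperless-ng on a Raspberry Pi 3B. When a file is added to the consumption directory, it is imported, but the log shows "File not found" errors.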

To Reproduce

  1. add a file to the consumption directory
  2. the file is imported, but errors appear in the log

Expected behavior

The file is imported and no errors appear.

Webserver logs

[2021-10-30 14:12:00,999] [DEBUG] [paperless.classifier] Vectorizing data...
[2021-10-30 14:12:02,626] [DEBUG] [paperless.classifier] There are no tags. Not training tags classifier.
[2021-10-30 14:12:02,627] [DEBUG] [paperless.classifier] Training correspondent classifier...
[2021-10-30 14:12:32,110] [DEBUG] [paperless.classifier] Training document type classifier...
[2021-10-30 14:13:01,267] [INFO] [paperless.tasks] Saving updated classifier model to /usr/src/paperless/src/../data/classification_model.pickle...
[2021-10-30 14:55:22,864] [INFO] [paperless.management.consumer] Adding /usr/src/paperless/src/../consume/doc00170820211030165624.pdf to the task queue.
[2021-10-30 14:55:23,838] [ERROR] [paperless.consumer] Cannot consume /usr/src/paperless/src/../consume/doc00170820211030165624.pdf: File not found.
[2021-10-30 14:55:25,236] [WARNING] [paperless.management.consumer] Not consuming file /usr/src/paperless/src/../consume/temp_scan_data_3d8045_6c366d51: Unknown file extension.
[2021-10-30 14:55:54,780] [INFO] [paperless.management.consumer] Adding /usr/src/paperless/src/../consume/doc00170920211030165650.pdf to the task queue.
[2021-10-30 14:55:55,214] [ERROR] [paperless.consumer] Cannot consume /usr/src/paperless/src/../consume/doc00170920211030165650.pdf: File not found.
[2021-10-30 14:56:02,727] [WARNING] [paperless.management.consumer] Not consuming file /usr/src/paperless/src/../consume/temp_scan_data_3d8045_530e0f2b: Unknown file extension.
[2021-10-30 14:56:02,737] [INFO] [paperless.management.consumer] Adding /usr/src/paperless/src/../consume/doc00170920211030165650.pdf to the task queue.
[2021-10-30 14:56:03,228] [INFO] [paperless.consumer] Consuming doc00170920211030165650.pdf
[2021-10-30 14:56:03,286] [DEBUG] [paperless.consumer] Detected mime type: application/pdf
[2021-10-30 14:56:03,606] [DEBUG] [paperless.consumer] Parser: RasterisedDocumentParser
[2021-10-30 14:56:03,623] [DEBUG] [paperless.consumer] Parsing doc00170920211030165650.pdf...
[2021-10-30 14:56:04,224] [DEBUG] [paperless.parsing.tesseract] Extracted text from PDF file /usr/src/paperless/src/../consume/doc00170920211030165650.pdf
[2021-10-30 14:56:06,241] [DEBUG] [paperless.parsing.tesseract] Calling OCRmyPDF with args: {'input_file': '/usr/src/paperless/src/../consume/doc00170920211030165650.pdf', 'output_file': '/tmp/paperless/paperless-hs0w7a0_/archive.pdf', 'use_threads': True, 'jobs': '1', 'language': 'deu', 'output_type': 'pdfa', 'progress_bar': False, 'skip_text': True, 'clean': True, 'deskew': True, 'rotate_pages': True, 'rotate_pages_threshold': 12.0, 'sidecar': '/tmp/paperless/paperless-hs0w7a0_/sidecar.txt'}
[2021-10-30 14:59:25,678] [DEBUG] [paperless.parsing.tesseract] Using text from sidecar file
[2021-10-30 14:59:25,685] [DEBUG] [paperless.consumer] Generating thumbnail for doc00170920211030165650.pdf...
[2021-10-30 14:59:25,707] [DEBUG] [paperless.parsing] Execute: convert -density 300 -scale 500x5000> -alpha remove -strip -auto-orient /tmp/paperless/paperless-hs0w7a0_/archive.pdf[0] /tmp/paperless/paperless-hs0w7a0_/convert.png
[2021-10-30 14:59:33,125] [DEBUG] [paperless.parsing.tesseract] Execute: optipng -silent -o5 /tmp/paperless/paperless-hs0w7a0_/convert.png -out /tmp/paperless/paperless-hs0w7a0_/thumb_optipng.png

Relevant information

  • running DietPi (Debian-based) on RPi 3B
  • Version 1.5.0
  • Installation method: docker
  • Any configuration changes you made in docker-compose.yml, docker-compose.env or paperless.conf.

I set the Redis version to 6.2.4 while trying to work around the RPi issues:

docker-compose.yml

version: "3.4"
services:
  broker:
    image: redis:6.2.4
    restart: unless-stopped

  webserver:
    image: jonaswinkler/paperless-ng:latest
    restart: unless-stopped
    depends_on:
      - broker
    ports:
      - 8000:8000
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000"]
      interval: 30s
      timeout: 10s
      retries: 5
    volumes:
      - data:/usr/src/paperless/data
      - media:/usr/src/paperless/media
      - ./export:/usr/src/paperless/export
      - ./consume:/usr/src/paperless/consume
    env_file: docker-compose.env
    environment:
      PAPERLESS_REDIS: redis://broker:6379


volumes:
  data:
  media:

docker-compose.env

# The UID and GID of the user used to run paperless in the container. Set this
# to your UID and GID on the host so that you have write access to the
# consumption directory.
USERMAP_UID=1000
USERMAP_GID=1000

# Additional languages to install for text recognition, separated by a
# whitespace. Note that this is
# different from PAPERLESS_OCR_LANGUAGE (default=eng), which defines the
# language used for OCR.
# The container installs English, German, Italian, Spanish and French by
# default.
# See https://packages.debian.org/search?keywords=tesseract-ocr-&searchon=names&suite=buster
# for available languages.
PAPERLESS_OCR_LANGUAGES=deu

###############################################################################
# Paperless-specific settings                                                 #
###############################################################################

# All settings defined in the paperless.conf.example can be used here. The
# Docker setup does not use the configuration file.
# A few commonly adjusted settings are provided below.

# Adjust this key if you plan to make paperless available publicly. It should
# be a very long sequence of random characters. You don't need to remember it.
#PAPERLESS_SECRET_KEY=change-me

# Use this variable to set a timezone for the Paperless Docker containers. If not specified, defaults to UTC.
#PAPERLESS_TIME_ZONE=America/Los_Angeles

# The default language to use for OCR. Set this to the language most of your
# documents are written in.
PAPERLESS_OCR_LANGUAGE=deu

PAPERLESS_ADMIN_USER=<snip>
PAPERLESS_ADMIN_PASSWORD=<snip>

PAPERLESS_TASK_WORKERS=1

PAPERLESS_THREADS_PER_WORKER=1

PAPERLESS_WEBSERVER_WORKERS=1

PAPERLESS_IGNORE_DATES="<snip>"

ziggystar, Oct 31 '21 08:10

I have the same issue:

[2021-11-04 14:46:21,794] [INFO] [paperless.management.consumer] Adding /usr/src/paperless/src/../consume/SCN_0001.pdf to the task queue.
[2021-11-04 14:46:22,008] [INFO] [paperless.consumer] Consuming SCN_0001.pdf
[2021-11-04 14:46:22,011] [DEBUG] [paperless.consumer] Detected mime type: application/pdf
[2021-11-04 14:46:22,040] [DEBUG] [paperless.consumer] Parser: RasterisedDocumentParser
[2021-11-04 14:46:22,058] [DEBUG] [paperless.consumer] Parsing SCN_0001.pdf...
[2021-11-04 14:46:22,202] [DEBUG] [paperless.parsing.tesseract] Extracted text from PDF file /usr/src/paperless/src/../consume/SCN_0001.pdf
[2021-11-04 14:46:22,461] [DEBUG] [paperless.parsing.tesseract] Calling OCRmyPDF with args: {'input_file': '/usr/src/paperless/src/../consume/SCN_0001.pdf', 'output_file': '/tmp/paperless/paperless-b92gcc6h/archive.pdf', 'use_threads': True, 'jobs': 1, 'language': 'deu', 'output_type': 'pdfa', 'progress_bar': False, 'skip_text': True, 'clean': True, 'deskew': True, 'rotate_pages': True, 'rotate_pages_threshold': 12.0, 'sidecar': '/tmp/paperless/paperless-b92gcc6h/sidecar.txt'}
[2021-11-04 14:46:24,863] [INFO] [paperless.management.consumer] Adding /usr/src/paperless/src/../consume/SCN_0001.pdf to the task queue.
[2021-11-04 14:46:25,165] [INFO] [paperless.consumer] Consuming SCN_0001.pdf
[2021-11-04 14:46:25,168] [DEBUG] [paperless.consumer] Detected mime type: application/pdf
[2021-11-04 14:46:25,186] [DEBUG] [paperless.consumer] Parser: RasterisedDocumentParser
[2021-11-04 14:46:25,198] [DEBUG] [paperless.consumer] Parsing SCN_0001.pdf...
[2021-11-04 14:46:25,393] [DEBUG] [paperless.parsing.tesseract] Extracted text from PDF file /usr/src/paperless/src/../consume/SCN_0001.pdf
[2021-11-04 14:46:25,658] [DEBUG] [paperless.parsing.tesseract] Calling OCRmyPDF with args: {'input_file': '/usr/src/paperless/src/../consume/SCN_0001.pdf', 'output_file': '/tmp/paperless/paperless-orcpu49h/archive.pdf', 'use_threads': True, 'jobs': 1, 'language': 'deu', 'output_type': 'pdfa', 'progress_bar': False, 'skip_text': True, 'clean': True, 'deskew': True, 'rotate_pages': True, 'rotate_pages_threshold': 12.0, 'sidecar': '/tmp/paperless/paperless-orcpu49h/sidecar.txt'}
[2021-11-04 14:46:28,186] [INFO] [paperless.management.consumer] Adding /usr/src/paperless/src/../consume/SCN_0001.pdf to the task queue.
[2021-11-04 14:46:34,216] [INFO] [paperless.management.consumer] Adding /usr/src/paperless/src/../consume/SCN_0001.pdf to the task queue.
[2021-11-04 14:46:44,977] [DEBUG] [paperless.parsing.tesseract] Using text from sidecar file
[2021-11-04 14:46:44,979] [DEBUG] [paperless.consumer] Generating thumbnail for SCN_0001.pdf...
[2021-11-04 14:46:44,994] [DEBUG] [paperless.parsing] Execute: convert -density 300 -scale 500x5000> -alpha remove -strip -auto-orient /tmp/paperless/paperless-b92gcc6h/archive.pdf[0] /tmp/paperless/paperless-b92gcc6h/convert.png
[2021-11-04 14:46:47,469] [DEBUG] [paperless.parsing.tesseract] Execute: optipng -silent -o5 /tmp/paperless/paperless-b92gcc6h/convert.png -out /tmp/paperless/paperless-b92gcc6h/thumb_optipng.png
[2021-11-04 14:46:53,265] [DEBUG] [paperless.consumer] Saving record to database
[2021-11-04 14:46:53,539] [DEBUG] [paperless.consumer] Deleting file /usr/src/paperless/src/../consume/SCN_0001.pdf
[2021-11-04 14:46:53,615] [DEBUG] [paperless.parsing.tesseract] Deleting directory /tmp/paperless/paperless-b92gcc6h
[2021-11-04 14:46:54,271] [ERROR] [paperless.consumer] Cannot consume /usr/src/paperless/src/../consume/SCN_0001.pdf: File not found.
[2021-11-04 14:46:54,789] [ERROR] [paperless.consumer] Cannot consume /usr/src/paperless/src/../consume/SCN_0001.pdf: File not found.
[2021-11-04 14:46:58,490] [DEBUG] [paperless.parsing.tesseract] Deleting directory /tmp/paperless/paperless-orcpu49h
[2021-11-04 14:46:58,503] [ERROR] [paperless.consumer] Error while consuming document SCN_0001.pdf: FileNotFoundError: [Errno 2] No such file or directory: '/tmp/ocrmypdf.io.nn4bob6z/origin.pdf'
Traceback (most recent call last):
  File "/usr/src/paperless/src/paperless_tesseract/parsers.py", line 241, in parse
    ocrmypdf.ocr(**args)
  File "/usr/local/lib/python3.9/site-packages/ocrmypdf/api.py", line 340, in ocr
    return run_pipeline(options=options, plugin_manager=plugin_manager, api=True)
  File "/usr/local/lib/python3.9/site-packages/ocrmypdf/_sync.py", line 374, in run_pipeline
    exec_concurrent(context, executor)
  File "/usr/local/lib/python3.9/site-packages/ocrmypdf/_sync.py", line 298, in exec_concurrent
    pdf = post_process(pdf, context, executor)
  File "/usr/local/lib/python3.9/site-packages/ocrmypdf/_sync.py", line 232, in post_process
    pdf_out = metadata_fixup(pdf_out, context)
  File "/usr/local/lib/python3.9/site-packages/ocrmypdf/_pipeline.py", line 792, in metadata_fixup
    with pikepdf.open(context.origin) as original, pikepdf.open(working_file) as pdf:
  File "/usr/local/lib/python3.9/site-packages/pikepdf/_methods.py", line 948, in open
    pdf = Pdf._open(
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/ocrmypdf.io.nn4bob6z/origin.pdf'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/src/paperless/src/documents/consumer.py", line 248, in try_consume_file
    document_parser.parse(self.path, mime_type, self.filename)
  File "/usr/src/paperless/src/paperless_tesseract/parsers.py", line 290, in parse
    raise ParseError(f"{e.__class__.__name__}: {str(e)}")
documents.parsers.ParseError: FileNotFoundError: [Errno 2] No such file or directory: '/tmp/ocrmypdf.io.nn4bob6z/origin.pdf'

I don't know if this is related, but for these documents the version in the "archive" folder has the wrong number of pages: it contains only a single page, while the original document, available via download, contains all 4 pages.

AceTheFace, Nov 04 '21 14:11

I'm seeing similar issues, deploying with PostgreSQL and Tika via the Docker Compose files at https://github.com/jonaswinkler/paperless-ng/tree/master/docker/compose

When I add PDFs to the consume directory, the consumer appears to create anywhere from 3 to 5 tasks for the same filename.

One task runs, completes successfully, and then deletes the file. The subsequent tasks fail because they try to process the same (now missing) file. They usually fail with a "File not found" error, but very occasionally I get a duplicate key conflict at the database level instead.
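The failure mode described here (several queue entries for one path, of which only the first still finds the file) can be illustrated with a minimal sketch of path-level deduplication. This is not paperless-ng's consumer code; the class `ConsumeQueue` and its methods are hypothetical, and the sketch assumes every filesystem event for a file is routed through a single `on_file_event` call. It only shows how a set of in-flight paths keeps repeated events from producing extra tasks; it does nothing about files that are still being written when the first event fires.

```python
import threading


class ConsumeQueue:
    """Hypothetical sketch: queue each consumption-directory path at most
    once while a task for it is pending or running."""

    def __init__(self):
        self._lock = threading.Lock()
        self._in_flight = set()
        self.tasks = []  # stand-in for the real task queue

    def on_file_event(self, path):
        # Several filesystem events may arrive for the same path, e.g. when
        # a file is closed repeatedly while it is being written.
        with self._lock:
            if path in self._in_flight:
                return False  # a task for this path already exists
            self._in_flight.add(path)
            self.tasks.append(path)
            return True

    def on_task_finished(self, path):
        # Call this when the task ends, whether it succeeded or failed, so
        # that a later file with the same name can be queued again.
        with self._lock:
            self._in_flight.discard(path)
```

With this guard, five events for `/data/consume/20210118 - Github.pdf` would yield a single queued task instead of one task plus four "File not found" failures. Clearing the entry on failure as well as success matters; otherwise a file that failed once could never be retried.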

[2021-11-23 17:11:38,837] [INFO] [paperless.management.consumer] Adding /data/consume/20210118 - Github.pdf to the task queue.
[2021-11-23 17:11:38,841] [INFO] [paperless.management.consumer] Adding /data/consume/20210118 - Github.pdf to the task queue.
[2021-11-23 17:11:45,217] [INFO] [paperless.consumer] Consuming 20210118 - Github.pdf
[2021-11-23 17:11:45,290] [INFO] [paperless.management.consumer] Adding /data/consume/20210118 - Github.pdf to the task queue.
[2021-11-23 17:11:45,292] [INFO] [paperless.management.consumer] Adding /data/consume/20210118 - Github.pdf to the task queue.
[2021-11-23 17:11:45,296] [INFO] [paperless.management.consumer] Adding /data/consume/20210118 - Github.pdf to the task queue.

[2021-11-23 17:11:45,728] [INFO] [paperless.consumer] Consuming 20210118 - Github.pdf
[2021-11-23 17:11:45,729] [DEBUG] [paperless.consumer] Detected mime type: application/pdf
[2021-11-23 17:11:45,849] [DEBUG] [paperless.consumer] Parser: RasterisedDocumentParser
[2021-11-23 17:11:45,853] [DEBUG] [paperless.consumer] Parsing 20210118 - Github.pdf...
[2021-11-23 17:11:46,095] [DEBUG] [paperless.parsing.tesseract] Extracted text from PDF file /data/consume/20210118 - Github.pdf
[2021-11-23 17:11:46,746] [DEBUG] [paperless.parsing.tesseract] Calling OCRmyPDF with args: {'input_file': '/data/consume/20210118 - Github.pdf', 'output_file': '/tmp/paperless/paperless-kt7yrsrx/archive.pdf', 'use_threads': True, 'jobs': 4, 'language': 'eng', 'output_type': 'pdfa', 'progress_bar': False, 'skip_text': True, 'clean': True, 'deskew': True, 'rotate_pages': True, 'rotate_pages_threshold': 12.0, 'sidecar': '/tmp/paperless/paperless-kt7yrsrx/sidecar.txt'}
[2021-11-23 17:11:48,501] [DEBUG] [paperless.consumer] Generating thumbnail for 20210118 - Github.pdf...
[2021-11-23 17:11:48,507] [DEBUG] [paperless.parsing] Execute: convert -density 300 -scale 500x5000> -alpha remove -strip -auto-orient /tmp/paperless/paperless-kt7yrsrx/archive.pdf[0] /tmp/paperless/paperless-kt7yrsrx/convert.png
[2021-11-23 17:11:57,799] [DEBUG] [paperless.classifier] Document classification model does not exist (yet), not performing automatic matching.
[2021-11-23 17:11:57,804] [DEBUG] [paperless.consumer] Saving record to database
[2021-11-23 17:11:57,847] [DEBUG] [paperless.consumer] Deleting file /data/consume/20210118 - Github.pdf
[2021-11-23 17:11:57,954] [DEBUG] [paperless.parsing.tesseract] Deleting directory /tmp/paperless/paperless-kt7yrsrx
[2021-11-23 17:11:57,955] [INFO] [paperless.consumer] Document 2021-01-18 20210118 - Github consumption finished

[2021-11-23 17:26:42,247] [ERROR] [paperless.consumer] Cannot consume /data/consume/20210118 - Github.pdf: File not found.
[2021-11-23 17:26:42,753] [ERROR] [paperless.consumer] Cannot consume /data/consume/20210118 - Github.pdf: File not found.
[2021-11-23 17:26:43,267] [ERROR] [paperless.consumer] Cannot consume /data/consume/20210118 - Github.pdf: File not found.
[2021-11-23 17:28:43,529] [ERROR] [paperless.consumer] Cannot consume /data/consume/20210718 - Github.pdf: File not found.
[2021-11-23 17:28:44,034] [ERROR] [paperless.consumer] Cannot consume /data/consume/20210718 - Github.pdf: File not found.

carpii, Nov 23 '21 19:11

I'm seeing this issue as well, although I thought it was because my consumer keeps dying (I found this thread while searching for reports about that), which forces a container restart. I assumed that the directory polling on restart created a duplicate task. I've been going into the failed tasks section of the admin panel to delete those tasks.

skorvek, Nov 23 '21 20:11


I'm not seeing anything to suggest that my consumer keeps dying. I think yours might be a separate issue with similar symptoms. I'm a brand-new user, and literally the first PDF I copied into the consume directory created a bunch of identical tasks.

carpii, Nov 23 '21 21:11

Got my issues fixed. The problem was that my scanner did not scan all pages into internal storage and then move the complete file to the consumption directory. Instead, it created the document in the consumption directory after scanning the first page and then modified this file for every additional page. If paperless started consuming the file in between, the error above occurred and I got unexpected results.

I "fixed" this by scanning into an intermediate folder and running a cron job every minute that looks for new files that have not been modified for the last 30 seconds. Once that is the case (no more pages will be added), the file is moved to the actual consumption directory.

Here's the script; maybe it helps someone:

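#!/bin/bash
# Move scans from an intermediate inbox into the paperless consumption
# directory once they have not been modified for INTERVAL_IN_SECONDS.
# Intended to run from cron every minute.
SOURCE_DIR=/volume1/share/paperless/inbox
TARGET_DIR=/volume1/docker/paperless-consume
INTERVAL_IN_SECONDS=30

now=$(date +"%s")

for file in "$SOURCE_DIR"/*; do
	# Skips directories, and the unexpanded pattern when the inbox is empty.
	if [ -f "$file" ]; then
		# Last modification time in seconds since the epoch.
		filetime=$(date -r "$file" +"%s")
		timediff=$(( now - filetime ))
		# Only move files left untouched long enough that no more pages
		# are being appended.
		if [ "$timediff" -ge "$INTERVAL_IN_SECONDS" ]; then
			filename=${file##*/}
			# Prefix the file name with its modification timestamp.
			newfilename="${filetime}_${filename}"
			mv "$file" "$TARGET_DIR/$newfilename"
		fi
	fi
done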
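The script's core rule, moving a file only once its modification time is at least `INTERVAL_IN_SECONDS` in the past, can also be written as a small Python function, which makes the rule easy to test in isolation. This is a sketch, not part of paperless; `is_settled` and `settled_files` are hypothetical names, and the optional `now` parameter exists only so the check can be tested deterministically.

```python
import os
import time


def is_settled(path, interval_seconds=30, now=None):
    """Return True if `path` has not been modified for at least
    `interval_seconds`, mirroring the script's mtime comparison."""
    if now is None:
        now = time.time()
    return now - os.path.getmtime(path) >= interval_seconds


def settled_files(source_dir, interval_seconds=30, now=None):
    """List regular files in `source_dir` that are ready to be moved."""
    if now is None:
        now = time.time()
    ready = []
    for name in sorted(os.listdir(source_dir)):
        if name.startswith("."):
            continue  # like the shell glob `*`, skip hidden files
        path = os.path.join(source_dir, name)
        # Regular files only, like the script's `[ -f "$file" ]` test.
        if os.path.isfile(path) and is_settled(path, interval_seconds, now):
            ready.append(path)
    return ready
```

The choice of 30 seconds is a trade-off: it has to exceed the longest pause the scanner makes between pages, while a longer interval only delays consumption.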

AceTheFace, Jan 13 '22 13:01

I have the same problem. Here is my log:

[2022-02-28 08:39:11,569] [INFO] [paperless.management.consumer] Adding /usr/src/paperless/src/../consume/Cedolino stipendio.pdf to the task queue.
[2022-02-28 08:39:11,664] [INFO] [paperless.management.consumer] Adding /usr/src/paperless/src/../consume/Cedolino stipendio.pdf to the task queue.
[2022-02-28 08:39:11,863] [INFO] [paperless.management.consumer] Adding /usr/src/paperless/src/../consume/Cedolino stipendio.pdf to the task queue.
[2022-02-28 08:39:11,907] [INFO] [paperless.consumer] Consuming Cedolino stipendio.pdf
[2022-02-28 08:39:11,907] [INFO] [paperless.consumer] Consuming Cedolino stipendio.pdf
[2022-02-28 08:39:11,907] [INFO] [paperless.management.consumer] Adding /usr/src/paperless/src/../consume/Cedolino stipendio.pdf to the task queue.
[2022-02-28 08:39:11,922] [DEBUG] [paperless.consumer] Detected mime type: application/pdf
[2022-02-28 08:39:11,922] [DEBUG] [paperless.consumer] Detected mime type: application/pdf
[2022-02-28 08:39:11,927] [INFO] [paperless.management.consumer] Adding /usr/src/paperless/src/../consume/Cedolino stipendio.pdf to the task queue.
[2022-02-28 08:39:11,957] [DEBUG] [paperless.consumer] Parser: RasterisedDocumentParser
[2022-02-28 08:39:11,957] [DEBUG] [paperless.consumer] Parser: RasterisedDocumentParser
[2022-02-28 08:39:11,963] [DEBUG] [paperless.consumer] Parsing Cedolino stipendio.pdf...
[2022-02-28 08:39:11,963] [DEBUG] [paperless.consumer] Parsing Cedolino stipendio.pdf...
[2022-02-28 08:39:12,561] [DEBUG] [paperless.parsing.tesseract] Extracted text from PDF file /usr/src/paperless/src/../consume/Cedolino stipendio.pdf
[2022-02-28 08:39:12,561] [DEBUG] [paperless.parsing.tesseract] Extracted text from PDF file /usr/src/paperless/src/../consume/Cedolino stipendio.pdf
[2022-02-28 08:39:12,889] [DEBUG] [paperless.parsing.tesseract] Calling OCRmyPDF with args: {'input_file': '/usr/src/paperless/src/../consume/Cedolino stipendio.pdf', 'output_file': '/tmp/paperless/paperless-h3j8i25o/archive.pdf', 'use_threads': True, 'jobs': 2, 'language': 'eng', 'output_type': 'pdfa', 'progress_bar': False, 'skip_text': True, 'clean': True, 'deskew': True, 'rotate_pages': True, 'rotate_pages_threshold': 12.0, 'sidecar': '/tmp/paperless/paperless-h3j8i25o/sidecar.txt'}
[2022-02-28 08:39:12,889] [DEBUG] [paperless.parsing.tesseract] Calling OCRmyPDF with args: {'input_file': '/usr/src/paperless/src/../consume/Cedolino stipendio.pdf', 'output_file': '/tmp/paperless/paperless-jifc8ob6/archive.pdf', 'use_threads': True, 'jobs': 2, 'language': 'eng', 'output_type': 'pdfa', 'progress_bar': False, 'skip_text': True, 'clean': True, 'deskew': True, 'rotate_pages': True, 'rotate_pages_threshold': 12.0, 'sidecar': '/tmp/paperless/paperless-jifc8ob6/sidecar.txt'}
[2022-02-28 08:39:13,668] [WARNING] [paperless.parsing.tesseract] This file is encrypted, OCR is impossible. Using any text present in the original file.
[2022-02-28 08:39:13,669] [DEBUG] [paperless.consumer] Generating thumbnail for Cedolino stipendio.pdf...
[2022-02-28 08:39:13,669] [WARNING] [paperless.parsing.tesseract] This file is encrypted, OCR is impossible. Using any text present in the original file.
[2022-02-28 08:39:13,670] [DEBUG] [paperless.consumer] Generating thumbnail for Cedolino stipendio.pdf...
[2022-02-28 08:39:13,677] [DEBUG] [paperless.parsing] Execute: convert -density 300 -scale 500x5000> -alpha remove -strip -auto-orient /usr/src/paperless/src/../consume/Cedolino stipendio.pdf[0] /tmp/paperless/paperless-jifc8ob6/convert.png
[2022-02-28 08:39:13,678] [DEBUG] [paperless.parsing] Execute: convert -density 300 -scale 500x5000> -alpha remove -strip -auto-orient /usr/src/paperless/src/../consume/Cedolino stipendio.pdf[0] /tmp/paperless/paperless-h3j8i25o/convert.png
[2022-02-28 08:39:14,641] [DEBUG] [paperless.parsing.tesseract] Execute: optipng -silent -o5 /tmp/paperless/paperless-jifc8ob6/convert.png -out /tmp/paperless/paperless-jifc8ob6/thumb_optipng.png
[2022-02-28 08:39:14,655] [DEBUG] [paperless.parsing.tesseract] Execute: optipng -silent -o5 /tmp/paperless/paperless-h3j8i25o/convert.png -out /tmp/paperless/paperless-h3j8i25o/thumb_optipng.png
[2022-02-28 08:39:20,841] [DEBUG] [paperless.classifier] Document classification model does not exist (yet), not performing automatic matching.
[2022-02-28 08:39:20,850] [DEBUG] [paperless.consumer] Saving record to database
[2022-02-28 08:39:20,918] [DEBUG] [paperless.matching] DocumentType Cedolini matched on document 1991-05-22 Cedolino stipendio because it contains this word: retribuzione
[2022-02-28 08:39:20,919] [INFO] [paperless.handlers] Assigning document type Cedolini to 1991-05-22 Cedolino stipendio
[2022-02-28 08:39:20,925] [DEBUG] [paperless.matching] Tag stipendio matched on document 1991-05-22 Cedolino stipendio because it contains this word: retribuzione
[2022-02-28 08:39:20,926] [INFO] [paperless.handlers] Tagging "1991-05-22 Cedolino stipendio" with "stipendio"
[2022-02-28 08:39:21,005] [DEBUG] [paperless.consumer] Deleting file /usr/src/paperless/src/../consume/Cedolino stipendio.pdf
[2022-02-28 08:39:21,028] [DEBUG] [paperless.parsing.tesseract] Deleting directory /tmp/paperless/paperless-h3j8i25o
[2022-02-28 08:39:21,029] [INFO] [paperless.consumer] Document 1991-05-22 Cedolino stipendio consumption finished
[2022-02-28 08:39:21,253] [DEBUG] [paperless.classifier] Document classification model does not exist (yet), not performing automatic matching.
[2022-02-28 08:39:21,267] [ERROR] [paperless.consumer] The following error occured while consuming Cedolino stipendio.pdf: [Errno 2] No such file or directory: '/usr/src/paperless/src/../consume/Cedolino stipendio.pdf'
Traceback (most recent call last):
  File "/usr/src/paperless/src/documents/consumer.py", line 287, in try_consume_file
    document = self._store(
  File "/usr/src/paperless/src/documents/consumer.py", line 372, in _store
    stats = os.stat(self.path)
FileNotFoundError: [Errno 2] No such file or directory: '/usr/src/paperless/src/../consume/Cedolino stipendio.pdf'
[2022-02-28 08:39:21,272] [DEBUG] [paperless.parsing.tesseract] Deleting directory /tmp/paperless/paperless-jifc8ob6
[2022-02-28 08:39:21,479] [ERROR] [paperless.consumer] Cannot consume /usr/src/paperless/src/../consume/Cedolino stipendio.pdf: File not found.
[2022-02-28 08:39:21,983] [ERROR] [paperless.consumer] Cannot consume /usr/src/paperless/src/../consume/Cedolino stipendio.pdf: File not found.
[2022-02-28 08:39:22,477] [ERROR] [paperless.consumer] Cannot consume /usr/src/paperless/src/../consume/Cedolino stipendio.pdf: File not found.

mada199122, Feb 28 '22 07:02

I just saw the release notes for 1.10 and was wondering whether https://github.com/paperless-ngx/paperless-ngx/pull/1905 fixes the issue here. Does anybody know whether this is the "same" issue? I have run into lost PDFs myself and am very interested in a fixed release.

DerHexer, Nov 28 '22 16:11