organize

'charmap' codec can't decode

Open. Cinux90 opened this issue 3 years ago • 1 comment

Hi All,

While getting started with organize, I want to set up my rules before running it on production files. So I started with a simple config file:

rules:
  - folders: ~/tmp_doc-test
    subfolders: false
    filters:
      - extension: pdf 
      - filecontent: "Entgeltbescheinigung"
    actions:
      - echo: "Found PDF!"
      - copy: "~/tmp_doc-test/sortiert/Lohnzettel/"

Then I run it as usual:

organize run

For some files I get the following error:

  File BWG.pdf:
    - (FileContent) ERROR! 'charmap' codec can't decode byte 0x9d in position 9796: character maps to <undefined>

I tried to resolve this issue, but I have no idea what causes it. At first I thought it was because the file's charset is reported as binary:

file -i BWG.pdf
BWG.pdf: application/pdf; charset=binary

But I also have other PDF files with charset=binary.

So I'm completely out of ideas. Does anyone have an idea what is going on?

Cinux90, Dec 08 '21 16:12

organize uses textract under the hood, so you might check the output of:

textract file.pdf

You can also try installing another parser which is supported by textract:

pip install pdftotext
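
For background on the message itself, here is a minimal Python sketch of how this kind of error can arise. It rests on an assumption that is not confirmed in this thread: that the extracted text is UTF-8 but gets decoded with a single-byte code page such as cp1252, which Python reports as the 'charmap' codec and in which byte 0x9d is undefined. Where exactly the decoding happens inside organize or textract is not established here, and `try_decode` and `decode_with_fallback` are hypothetical helpers, not part of either tool.

```python
# Sample bytes standing in for extracted PDF text (an assumption for this demo):
# UTF-8 text containing a right double quotation mark (U+201D), which UTF-8
# encodes as the three bytes E2 80 9D.
raw = "Entgeltbescheinigung \u201dLohn\u201d".encode("utf-8")


def try_decode(data: bytes, encoding: str) -> str:
    """Decode data, returning the error message instead of raising."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"ERROR! {exc}"


def decode_with_fallback(data: bytes, encodings=("utf-8", "cp1252")) -> str:
    """Hypothetical helper: try each encoding in turn, then decode lossily."""
    for encoding in encodings:
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Nothing matched: keep what can be decoded and mark the rest with U+FFFD.
    return data.decode(encodings[0], errors="replace")


# cp1252 has no character for byte 0x9d, so this reports a 'charmap' error.
print(try_decode(raw, "cp1252"))
# The same bytes decode cleanly as UTF-8.
print(try_decode(raw, "utf-8"))
print(decode_with_fallback(raw))
```

If the sketch matches what is happening, only PDFs whose extracted text contains such bytes would fail, which would fit the observation that other PDFs with charset=binary go through.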

tfeldmann, Jan 28 '22 09:01