organize
'charmap' codec can't decode
Hi All,
I'm just starting to use organize and want to set up my rules before running it on production files. So I started with a simple config file:
rules:
  - folders: ~/tmp_doc-test
    subfolders: false
    filters:
      - extension: pdf
      - filecontent: "Entgeltbescheinigung"
    actions:
      - echo: "Found PDF!"
      - copy: "~/tmp_doc-test/sortiert/Lohnzettel/"
Then I ran it as usual:
organize run
For some files I get the following error:
File BWG.pdf:
- (FileContent) ERROR! 'charmap' codec can't decode byte 0x9d in position 9796: character maps to <undefined>
I tried to resolve this but have no idea what causes it. At first I thought it was because the file's charset is reported as binary:
file -i BWG.pdf
BWG.pdf: application/pdf; charset=binary
But I also have other PDF files with charset=binary that are processed without this error.
So I'm completely out of ideas. Does anyone have a suggestion?
organize uses textract under the hood, so you might check the output of:
textract file.pdf
You can also try installing another parser which is supported by textract:
pip install pdftotext
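To see where a message like the one above can come from, here is a minimal Python sketch using only the standard library. It assumes the extracted text is being decoded with a single-byte code page such as cp1252 (the usual default on Windows); that is a guess, not something confirmed for organize or textract. The helper names `decode_strict` and `decode_tolerant` and the sample bytes are made up for illustration.

```python
# Sketch only: reproduces a 'charmap' decode error with the standard library.
# Assumption: the text is decoded as cp1252, where byte 0x9d has no mapping.


def decode_strict(raw: bytes, encoding: str = "cp1252") -> str:
    """Decode and raise UnicodeDecodeError on any unmapped byte."""
    return raw.decode(encoding)


def decode_tolerant(raw: bytes, encoding: str = "cp1252") -> str:
    """Decode and substitute U+FFFD for bytes that cannot be mapped."""
    return raw.decode(encoding, errors="replace")


# Hypothetical stand-in for text extracted from a PDF.
sample = b"Entgeltbescheinigung \x9d 2020"

message = ""
try:
    decode_strict(sample)
except UnicodeDecodeError as exc:
    message = str(exc)
    print("strict decode failed:", message)

text = decode_tolerant(sample)
print("keyword found:", "Entgeltbescheinigung" in text)
```

If this matches what happens on your machine, it would explain why only some PDFs fail: the error depends on whether the extracted bytes happen to contain a value the code page does not define, not on the `charset=binary` reported by `file -i`.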