
Windows only: pdfquery is locking the opened PDF file

iconberg opened this issue 5 years ago • 1 comment

I am trying to open PDF files, query data from them, and then use that data to rename each file. On Windows the rename fails because the file is still locked. On Linux the same code works.

I cannot tell whether the lock is held by pdfquery itself or by another module that pdfquery uses.

import os
import pdfquery


def is_pdf(file):
    if os.path.splitext(file.lower())[1] == '.pdf':
        return True


pdf_files = os.listdir('./pages')
for pdf_file in filter(is_pdf, pdf_files):
    print(pdf_file)
    pdf = pdfquery.PDFQuery(os.path.join('pages', pdf_file))
    pdf.load()
    for e in pdf.tree.iter():
        text = e.text
        if text:
            text = text.replace(' ', '')
            if text[0:7] == '4002629':
                #del pdf
                os.rename(os.path.join('pages', pdf_file),
                          '{}.pdf'.format(text))
                break

Error on Windows:

Traceback (most recent call last):
  File "C:\Users\Administrator\Desktop\PDFs_aufbereiten\pdf_pages_rename.py", line 22, in <module>
    os.rename(os.path.join('pages', pdf_file), '{}.pdf'.format(text))
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'pages\\xxxxxxxxxxxxxxxxxxxx.pdf' -> 'xxxxxxxxxxxxx.pdf'

The same code works on Linux.
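For context, the platform difference can be reproduced without pdfquery at all. The snippet below is only a minimal sketch: `rename_while_open` and the file names are made up for illustration. It tries to rename a file while a handle from Python's built-in `open()` is still held. On Windows this is expected to fail with the same `PermissionError`, while on Linux the rename goes through. It does not show which component keeps the handle open in the pdfquery case.

```python
import os
import tempfile


def rename_while_open(folder):
    """Try to rename a file while a handle to it is still open.

    Returns True if the rename succeeded, False on PermissionError.
    (Hypothetical helper, used only to illustrate the behaviour.)
    """
    src = os.path.join(folder, 'held.pdf')
    dst = os.path.join(folder, 'renamed.pdf')
    with open(src, 'wb') as fh:
        fh.write(b'%PDF-')
        try:
            # The handle fh is still open at this point.
            os.rename(src, dst)
            return True
        except PermissionError:
            # Expected on Windows ([WinError 32]).
            return False


if __name__ == '__main__':
    print(rename_while_open(tempfile.mkdtemp()))
```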

iconberg, May 10 '19 17:05

Workaround: open and close the file in your own code and pass the open file object to pdfquery.PDFQuery instead of the path (thanks to nedbat):

import os
import pdfquery

def is_pdf(file):
    if os.path.splitext(file.lower())[1] == '.pdf':
        return True


rename_files = []
pdf_files = os.listdir('./pages')
for pdf_file in filter(is_pdf, pdf_files):
    print(pdf_file)
    with open(os.path.join('pages', pdf_file), 'rb') as myfile:
        pdf = pdfquery.PDFQuery(myfile)
        pdf.load()
        for e in pdf.tree.iter():
            text = e.text
            if text:
                text = text.replace(' ', '')
                if text[0:7] == '4002629':
                    rename_files.append(
                        (pdf_file, '{}.pdf'.format(text))
                    )
                    break

for oldname, newname in rename_files:
    os.rename(os.path.join('pages', oldname),
              os.path.join('pages', newname)
              )
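The same idea can be packed into small helpers so the rename happens right after the `with` block has closed the handle, instead of collecting the names first. This is only a sketch under assumptions: `find_code`, `pdf_texts`, `rename_by_code` and the `read_texts` parameter are names made up here, and it relies on the behaviour seen in the workaround above, namely that pdfquery does not keep the file locked when it is given an already open file object.

```python
import os

PREFIX = '4002629'  # prefix taken from the example above


def find_code(texts, prefix=PREFIX):
    """Return the first text that starts with prefix once spaces are removed."""
    for text in texts:
        if text:
            text = text.replace(' ', '')
            if text.startswith(prefix):
                return text
    return None


def pdf_texts(path):
    """Collect all element texts; the handle is closed when the with block ends."""
    import pdfquery  # imported lazily so the other helpers work without it
    with open(path, 'rb') as fh:
        pdf = pdfquery.PDFQuery(fh)
        pdf.load()
        return [e.text for e in pdf.tree.iter()]


def rename_by_code(folder, read_texts=pdf_texts):
    """Rename each PDF in folder to '<code>.pdf' and return (old, new) pairs."""
    renamed = []
    for name in sorted(os.listdir(folder)):
        if os.path.splitext(name.lower())[1] != '.pdf':
            continue
        src = os.path.join(folder, name)
        code = find_code(read_texts(src))  # our own handle is closed by now
        if code:
            new_name = '{}.pdf'.format(code)
            os.rename(src, os.path.join(folder, new_name))
            renamed.append((name, new_name))
    return renamed


if __name__ == '__main__':
    print(rename_by_code('pages'))
```

Passing `read_texts` as a parameter is just a convenience so the renaming logic can be tried out with a stub instead of real PDFs.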

iconberg, May 10 '19 20:05