rmrl
merge_pages() recursively searches for page size (fixes #11)
I spent a while looking into this on my own before realizing someone had already figured it out in #11.
I ran the following recursive dict search (mostly taken from here) on a PDF of a random textbook I have:
def _finditem(obj, key, path=None):
    # Depth-first search through nested dicts; returns the first value found for key.
    if path is None:
        path = []
    if key in obj:
        print(f"key {key} found at path {path}")
        return obj[key]
    for k, v in obj.items():
        if isinstance(v, dict):
            item = _finditem(v, key, path + [k])
            if item is not None:
                return item
    return None

print(_finditem(basepage, '/MediaBox'))
The output looks like:
key /MediaBox found at path ['/Parent', '/Parent', '/Parent', '/Parent', '/Parent']
['0', '0', '612', '792']
The output matches what pdfinfo finds:
$ pdfinfo -box /mnt/d/remarkable_sync/5cf892dc-6471-430c-9c75-7e83867f5eab.pdf
Title: Mastering STM32
Author: Carmine Noviello
Creator: LaTeX with hyperref package
Producer: XeTeX 0.99999
CreationDate: Fri Aug 17 06:35:42 2018 PDT
ModDate: Tue Jun 11 03:03:22 2019 PDT
Tagged: no
UserProperties: no
Suspects: no
Form: AcroForm
JavaScript: no
Pages: 852
Encrypted: no
Page size: 612 x 792 pts (letter)
Page rot: 0
MediaBox: 0.00 0.00 612.00 792.00
CropBox: 0.00 0.00 612.00 792.00
BleedBox: 0.00 0.00 612.00 792.00
TrimBox: 0.00 0.00 612.00 792.00
ArtBox: 0.00 0.00 612.00 792.00
File size: 38994922 bytes
Optimized: no
PDF version: 1.5
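The path printed by the search is nothing but /Parent hops, which fits how PDF works: /MediaBox is an inheritable page attribute, so a page may leave it out and pick it up from an ancestor in the page tree. For reference, here is a minimal sketch of a lookup that follows only /Parent instead of searching every nested dict. It uses plain dicts as a stand-in for the real page objects, and `find_inherited`, `box_size`, and `max_depth` are hypothetical names for illustration, not functions or parameters in rmrl.

```python
def find_inherited(page, key, max_depth=64):
    """Return the first value for key found on page or along its /Parent chain."""
    node = page
    for _ in range(max_depth):  # assumed guard against malformed, cyclic page trees
        if node is None:
            return None
        if key in node:
            return node[key]
        node = node.get('/Parent')
    return None


def box_size(box):
    """Convert a [x0, y0, x1, y1] box of numeric strings to (width, height) in points."""
    x0, y0, x1, y1 = (float(v) for v in box)
    return (x1 - x0, y1 - y0)


# Toy page tree: only the root node carries /MediaBox.
root = {'/Type': '/Pages', '/MediaBox': ['0', '0', '612', '792']}
mid = {'/Type': '/Pages', '/Parent': root}
page = {'/Type': '/Page', '/Parent': mid}

media_box = find_inherited(page, '/MediaBox')
print(media_box)            # ['0', '0', '612', '792']
print(box_size(media_box))  # (612.0, 792.0)
```

Following only /Parent keeps the search from wandering into unrelated nested dicts that happen to contain the same key, at the cost of assuming the value really is inherited through the page tree.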