pdftitle
a utility to extract the title from a PDF file
The algorithm only considers the case where the title text is at the top level, while in many PDF files the title is actually inside an XForm or a multi-level...
Currently, the program may output a digraph for certain PDFs, for example https://arxiv.org/pdf/1506.02640.pdf:

```bash
$ pdftitle -p 1506.02640.pdf
You Only Look Once: Unified, Real-Time Object Detection
```

Note the `fi`...
In the implementation of the "eliot" algorithm, the y coordinates are sorted low-to-high: https://github.com/metebalci/pdftitle/blob/5ebc1a0ec3f347e5a257485bc6ce43a9f12798ba/pdftitle.py#L543-L548 Since the origin of a pdf is the bottom-left corner, the y coordinates should be sorted...
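The ordering concern can be illustrated with a minimal sketch. It does not reproduce the pdftitle code; `lines_in_reading_order` and the `(y, text)` tuples are hypothetical, and it assumes the intended result is top-of-page first, which with a bottom-left origin means sorting y in descending order.

```python
from typing import List, Tuple

# Hypothetical representation: (y, text), with y measured from the
# bottom-left origin of the page, as in PDF user space.
Line = Tuple[float, str]


def lines_in_reading_order(lines: List[Line]) -> List[str]:
    """Return line texts from the top of the page to the bottom."""
    # A larger y is closer to the top of the page, so sort descending.
    ordered = sorted(lines, key=lambda item: item[0], reverse=True)
    return [text for _, text in ordered]


# A two-line title whose lines were collected out of order.
sample = [
    (650.0, "Unified, Real-Time Object Detection"),
    (680.0, "You Only Look Once:"),
]
title = " ".join(lines_in_reading_order(sample))
```

Sorting low-to-high on the same input would place the second title line before the first.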
Is it possible to show an informative error message instead of crashing the application?

```
Traceback (most recent call last):
  File "/home/zk/.local/lib/python3.9/site-packages/pdftitle.py", line 701, in run
    title = get_title_from_file(args.pdf)
  File "/home/zk/.local/lib/python3.9/site-packages/pdftitle.py",...
```
I think this would help the user discover the option. Currently the error message just says "PDF contains a unicode char that does not exist in the font", maybe...
I might have overlooked something, but it seems there is no way to adjust the parameters from API calls, e.g. you can't call `get_title_from_file(path, algo='max2')`.
Fixes #33 This should be the most minimal-invasive way of passing arguments to pdftitle when calling it from another python module. It allows the module to be used in conjunction...
- Improve fixing spaces when seeing similar consecutive characters
- Add argument to force fixing spaces
- Strip possible newlines from end result
Text in the PDF file might not contain a space character; instead, the space might be indicated by an actual (additional) horizontal position difference between the glyphs before and after the...
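A minimal sketch of this idea, not the pdftitle implementation: the `Glyph` class, the `join_with_inferred_spaces` helper, and the `gap_ratio` threshold are all hypothetical names and values chosen for illustration. A space is inferred whenever the horizontal gap between two consecutive glyphs exceeds a fraction of the previous glyph's width.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Glyph:
    char: str
    x: float      # left edge of the glyph on the line
    width: float  # horizontal advance of the glyph


def join_with_inferred_spaces(glyphs: List[Glyph], gap_ratio: float = 0.25) -> str:
    """Join glyphs, inserting a space where the horizontal gap is large."""
    if not glyphs:
        return ""
    out = [glyphs[0].char]
    for prev, cur in zip(glyphs, glyphs[1:]):
        # Distance between the end of the previous glyph and the start of this one.
        gap = cur.x - (prev.x + prev.width)
        # The threshold is an assumption; a real heuristic may need tuning per font.
        if gap > gap_ratio * prev.width:
            out.append(" ")
        out.append(cur.char)
    return "".join(out)


glyphs = [
    Glyph("H", 0.0, 10.0),
    Glyph("i", 10.0, 4.0),
    Glyph("y", 20.0, 8.0),  # starts 6 units after "i" ends
    Glyph("o", 28.0, 8.0),
]
text = join_with_inferred_spaces(glyphs)
```

Here `text` is `"Hi yo"`, since only the gap between `i` and `y` exceeds the threshold. A fixed ratio of the previous glyph's width is a simple choice; narrow glyphs make the threshold small, so other baselines (for example the font size) may work better.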
To make this repository more contribution-friendly, it should imho be structured in a "standard way", i.e. the top-level directory only containing `setup.py` but not the actual source code....