pdfx issues

added sorting option by using list

Whenever I used `references = pdf.get_references_as_dict(sort=True)`, it would fail, saying:

```
File "C:\Users\user\Scripts\PDFx\test.py", line 9, in
    references = pdf.get_references_as_dict(sort=True)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "c:\users\user\scripts\pdfx\pdfx\pdfx\__init__.py", line 168, in get_references_as_dict
    return self.reader.get_references_as_dict(reftype=reftype, sort=sort)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^...
```

Masterjx9

timeout option

2

Hi, pdfx is very helpful for us in analyzing a few things. Thanks for creating pdfx. We have one small problem, though: when a PDF file contains a lot of text, pdfx...

DanielRuf

Fixed handling of multi-line link extraction

Improved the `extract_links` function to include hyperlinks that span two or more lines by replacing line breaks in the text (issue #40).

maximiliancw
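
The line-break handling described in this pull request could be sketched roughly as follows. This is not the actual pdfx code: `join_broken_urls`, `extract_links_sketch`, and both regular expressions are hypothetical names chosen for illustration. The sketch also assumes that a URL continues onto the next line only when the line ends in a separator such as `/`, `-`, `_`, `=`, `&`, or `?`.

```python
import re

# Hypothetical patterns for illustration; pdfx's own regexes may differ.
URL_RE = re.compile(r"https?://[^\s<>\"']+")
# A URL whose line ends in a separator, followed directly by more text.
BROKEN_URL_RE = re.compile(r"(https?://\S*[/\-_=&?])\n(\S)")


def join_broken_urls(text: str) -> str:
    """Remove line breaks that appear to split a URL across lines."""
    previous = None
    # Repeat until stable so URLs spanning three or more lines are joined too.
    while previous != text:
        previous = text
        text = BROKEN_URL_RE.sub(r"\1\2", text)
    return text


def extract_links_sketch(text: str) -> list[str]:
    """Return URLs found in text, including ones split by a line break."""
    joined = join_broken_urls(text)
    # Strip punctuation that commonly trails a URL at the end of a sentence.
    return [url.rstrip(".,;:)") for url in URL_RE.findall(joined)]


sample = (
    "See https://example.com/very/long/\n"
    "path/file.pdf for details.\n"
    "Also https://example.org\n"
    "Next line."
)
links = extract_links_sketch(sample)
```

Joining only after a trailing separator is a deliberate trade-off: removing every line break would also glue a complete URL at the end of a line to the first word of the next line.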

Cuts off links that span two lines

3

Links that spill over onto a second line are cut off when they are recognized and are therefore reported as dead.

marshalmiller

Create SECURITY.xml

Global Infrastructure Hosting Platform

1989shack

Point pdf links to local files downloaded - feature request

1

Is there any possibility that the original PDF file could be modified so that its links point to the locally downloaded files? A second, more interesting option would be...

maguilella

fix: support charset ISO-8859-1, closes #48

2

closes #48

Helias


Recursive URL extraction from PDFs - feature request

Detect metadata from Arxiv Documents

Adding Timeout CLI parameter
