reach Extracting wrong title in PDF metadata

Open • TeriForey opened this issue 5 years ago • 1 comment

Noticed that some PDFs have a second title within the metadata, for example:

Title:          SEAJPH-April 13.indb
Creator:        Adobe InDesign CS3 (5.0)
Producer:       Adobe PDF Library 8.0
CreationDate:   Mon May 20 04:11:26 2013 IST
ModDate:        Mon May 20 04:11:27 2013 IST
Tagged:         no
UserProperties: no
Suspects:       no
Form:           none
JavaScript:     no
Pages:          11
Encrypted:      no
Page size:      595.276 x 841.89 pts (A4)
Page rot:       0
File size:      616434 bytes
Optimized:      no
PDF version:    1.3
PDF subtype:    PDF/X-3:2002
    Title:         ISO 15930 - Electronic document file format for prepress digital data exchange (PDF/X)
    Abbreviation:  PDF/X-3:2002
    Subtitle:      Part 3: Complete exchange suitable for colour-managed workflows (PDF/X-3)
    Standard:      ISO 15930-3

The second title, which only describes the PDF/X standard the file conforms to, is being returned instead of the first, which is the document's actual title.
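
One possible way to avoid picking up the nested title, sketched here under the assumption that the metadata arrives as plain text in the layout shown above (the function name `first_top_level_title` is hypothetical and not part of the existing code): skip indented lines, since those belong to the PDF subtype block, and return the first unindented `Title:` value.

```python
def first_top_level_title(metadata_text):
    """Return the first unindented 'Title:' value, or None if there is none.

    Hypothetical helper: assumes nested fields (such as the PDF subtype
    description) are indented with leading whitespace, as in the sample.
    """
    for line in metadata_text.splitlines():
        # Indented lines belong to a nested block, so ignore them.
        if line[:1].isspace():
            continue
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Title":
            return value.strip() or None
    return None


sample = """\
Title:          SEAJPH-April 13.indb
Creator:        Adobe InDesign CS3 (5.0)
PDF subtype:    PDF/X-3:2002
    Title:         ISO 15930 - Electronic document file format for prepress digital data exchange (PDF/X)
    Abbreviation:  PDF/X-3:2002
"""

print(first_top_level_title(sample))  # prints: SEAJPH-April 13.indb
```

Whether the current extraction actually goes wrong because a later `Title` key overwrites the earlier one is a guess; the sketch only shows that the two titles can be told apart by indentation.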

Jan 22 '20 09:01 TeriForey

I don't think we'll deal with this at the moment. We seem to be getting the majority of titles pretty consistently from the source page on the target site. Titles were the predominant reason for pulling this information from the PDF metadata, so if that holds, this code may actually get stripped out. Marking as wontfix while we evaluate whether any of the rest of this information would be useful.

Feb 03 '20 11:02 jdu
