pdf2archive
Ghostscript crashes when writing XMP metadata with Asian characters
GS appears to crash with both versions, but 9.14 produces a valid PDF/A-1B result while 9.22 does not. In particular, version 9.14 copies the correct character into the XMP metadata while 9.22 does not, which causes the validation failure.
TODO:
- [ ] Report to the GS guys.
- [ ] Check for possible solutions from my side.
Original file: 1309.4626.pdf
Converted file with metadata preservation (gs 9.22):
1309.4626-PDFA.pdf
GS crashes; the file is a valid PDF but not PDF/A-1B compliant. The XMP metadata is incorrect.
Converted file with metadata preservation (gs 9.14):
1309.4626-PDFA_914.pdf
GS crashes but the file is a valid PDF/A-1B. The XMP metadata is correct.
Converted file with metadata reset (`--cleanmetadata`, gs 9.22):
1309.4626-PDFA_clean.pdf
GS does not crash and the file is a valid PDF/A-1B.
Conversion output:
$ ./pdf2archive --debug --validate 1309.4626.pdf
=== Welcome to PDF2ARCHIVE ===
DEBUG: running PDF2ARCHIVE, version 0.3
DEBUG: using Ghostscript binary at /usr/local/bin/gs, version 9.22
DEBUG: the input file is '1309.4626.pdf'
DEBUG: the output file is '1309.4626-PDFA.pdf'
DEBUG: the intermediate processing file is /var/folders/r2/21fdm8ds1rlc552vl1mcnqx40000gn/T/tmp.8nfVxnqp
DEBUG: the temporary directory is /var/folders/r2/21fdm8ds1rlc552vl1mcnqx40000gn/T/tmp.hmUdqwki
DEBUG: the current quality options are ''
DEBUG: PDF title ''
DEBUG: PDF author 'Md. Mohi Uddin'
DEBUG: PDF subject ''
DEBUG: PDF keywords ''
DEBUG: PDF creator 'Word u( Acrobat PDFMaker 8.1'
DEBUG: PDF producer 'Acrobat Distiller 8.1.0 (Windows)'
DEBUG: PDF creation date 'D:20130917195031+09'00''
DEBUG: PDF modification date 'D:20130917195112+09'00''
DEBUG: PDF trapping ''
Creating the definition file...
Compressing PDF & embedding fonts...
GPL Ghostscript 9.22 (2017-10-04)
Copyright (C) 2017 Artifex Software, Inc. All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
GPL Ghostscript 9.22:
Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.
Processing pages 1 through 12.
Page 1
Querying operating system for font files...
Substituting font Times-Bold for TimesNewRomanPS-BoldMT.
Loading NimbusRoman-Bold font from /usr/local/Cellar/ghostscript/9.22/share/ghostscript/9.22/Resource/Font/NimbusRoman-Bold... 5088616 3528885 2680024 1311425 2 done.
GPL Ghostscript 9.22:
Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.
Page 2
GPL Ghostscript 9.22:
Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.
Page 3
GPL Ghostscript 9.22:
Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.
Page 4
GPL Ghostscript 9.22:
Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.
Page 5
GPL Ghostscript 9.22:
Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.
Page 6
Substituting font Times-Bold for TimesNewRomanPS-BoldMT.
GPL Ghostscript 9.22:
Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.
Page 7
Substituting font Times-Bold for TimesNewRomanPS-BoldMT.
GPL Ghostscript 9.22:
Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.
Page 8
Substituting font Times-Bold for TimesNewRomanPS-BoldMT.
GPL Ghostscript 9.22:
Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.
Page 9
Substituting font Times-Bold for TimesNewRomanPS-BoldMT.
GPL Ghostscript 9.22:
Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.
Page 10
GPL Ghostscript 9.22:
Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.
Page 11
GPL Ghostscript 9.22:
Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.
Page 12
GPL Ghostscript 9.22:
Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.
Converting to PDF/A-1B...
GPL Ghostscript 9.22 (2017-10-04)
Copyright (C) 2017 Artifex Software, Inc. All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
GPL Ghostscript 9.22:
Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.
Processing pages 1 through 12.
Page 1
GPL Ghostscript 9.22:
Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.
Page 2
GPL Ghostscript 9.22:
Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.
Page 3
GPL Ghostscript 9.22:
Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.
Page 4
GPL Ghostscript 9.22:
Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.
Page 5
GPL Ghostscript 9.22:
Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.
Page 6
GPL Ghostscript 9.22:
Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.
Page 7
GPL Ghostscript 9.22:
Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.
Page 8
GPL Ghostscript 9.22:
Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.
Page 9
GPL Ghostscript 9.22:
Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.
Page 10
GPL Ghostscript 9.22:
Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.
Page 11
GPL Ghostscript 9.22:
Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.
Page 12
GPL Ghostscript 9.22:
Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.
Error: /syntaxerror in -file-
Operand stack:
--nostringval-- Title () Author (Md. Mohi Uddin) Subject () Keywords () Creator
Execution stack:
%interp_exit .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push --nostringval-- --nostringval-- --nostringval-- false 1 %stopped_push 1999 1 3 %oparray_pop 1998 1 3 %oparray_pop 1982 1 3 %oparray_pop 1868 1 3 %oparray_pop --nostringval-- %errorexec_pop .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push
Dictionary stack:
--dict:987/1684(ro)(G)-- --dict:0/20(G)-- --dict:79/200(L)--
Current allocation mode is local
Current file position is 869
GPL Ghostscript 9.22: Unrecoverable error, exit code 1
Removing temporary files...
Done, now ESSE3 is happy! ;)
Validating resulting file...
FAIL /Users/matteo/Dropbox/UNI/MScThesis/Elaborato/LaTeX/pdf2archive/1309.4626-PDFA.pdf
FAIL 6.2.3-2
FAIL 6.2.3-4
FAIL 6.7.3-1
Original file metadata:
$ exiftool -a -G1 1309.4626.pdf
[ExifTool] ExifTool Version Number : 10.80
[System] File Name : 1309.4626.pdf
[System] Directory : .
[System] File Size : 519 kB
[System] File Modification Date/Time : 2017:11:02 17:02:04+01:00
[System] File Access Date/Time : 2018:03:12 15:23:07+01:00
[System] File Inode Change Date/Time : 2018:03:12 15:23:01+01:00
[System] File Permissions : rw-r--r--
[File] File Type : PDF
[File] File Type Extension : pdf
[File] MIME Type : application/pdf
[PDF] PDF Version : 1.4
[PDF] Linearized : Yes
[PDF] Tagged PDF : Yes
[PDF] Page Count : 12
[PDF] Page Layout : OneColumn
[PDF] Create Date : 2013:09:17 19:50:31+09:00
[PDF] Author : Md. Mohi Uddin
[PDF] Creator : Word 用 Acrobat PDFMaker 8.1
[PDF] Producer : Acrobat Distiller 8.1.0 (Windows)
[PDF] Modify Date : 2013:09:17 19:51:12+09:00
[PDF] Source Modified : D:20130917104643
[PDF] Title :
[XMP-x] XMP Toolkit : Adobe XMP Core 4.0-c316 44.253921, Sun Oct 01 2006 17:14:39
[XMP-pdf] Producer : Acrobat Distiller 8.1.0 (Windows)
[XMP-pdfx] Source Modified : D:20130917104643
[XMP-xmp] Create Date : 2013:09:17 19:50:31+09:00
[XMP-xmp] Creator Tool : Word 用 Acrobat PDFMaker 8.1
[XMP-xmp] Modify Date : 2013:09:17 19:51:12+09:00
[XMP-xmp] Metadata Date : 2013:09:17 19:51:12+09:00
[XMP-xmpMM] Document ID : uuid:c1155296-9b86-4283-b549-b3f53693a7dc
[XMP-xmpMM] Instance ID : uuid:acf3e0f9-3b0e-40d0-8ae6-c0c5d27a48c6
[XMP-xmpMM] Subject : 21
[XMP-dc] Format : application/pdf
[XMP-dc] Creator : Md. Mohi Uddin
[XMP-dc] Title :
Converted file metadata (note the replacement character in the XMP Creator Tool field; the Creator in the Info dictionary, by contrast, is preserved correctly):
$ exiftool -a -G1 1309.4626-PDFA.pdf
[ExifTool] ExifTool Version Number : 10.80
[System] File Name : 1309.4626-PDFA.pdf
[System] Directory : .
[System] File Size : 441 kB
[System] File Modification Date/Time : 2018:03:12 15:23:45+01:00
[System] File Access Date/Time : 2018:03:12 15:23:48+01:00
[System] File Inode Change Date/Time : 2018:03:12 15:23:45+01:00
[System] File Permissions : rw-r--r--
[File] File Type : PDF
[File] File Type Extension : pdf
[File] MIME Type : application/pdf
[PDF] PDF Version : 1.4
[PDF] Linearized : No
[PDF] Page Count : 12
[PDF] Producer : GPL Ghostscript 9.22
[PDF] Create Date : 2018:03:12 15:23:43+01:00
[PDF] Modify Date : 2018:03:12 15:23:43+01:00
[PDF] Author : Md. Mohi Uddin
[PDF] Creator : Word 用 Acrobat PDFMaker 8.1
[PDF] Title :
[XMP-x] XMP Toolkit : XMP toolkit 2.9.1-13, framework 1.6
[XMP-pdf] Producer : GPL Ghostscript 9.22
[XMP-xmp] Modify Date : 2018:03:12 15:23:43+01:00
[XMP-xmp] Create Date : 2018:03:12 15:23:43+01:00
[XMP-xmp] Creator Tool : Word � Acrobat PDFMaker 8.1
[XMP-xmpMM] Document ID : uuid:844b5246-5e1d-11f3-0000-eb0c71bba29b
[XMP-dc] Format : application/pdf
[XMP-dc] Title :
[XMP-dc] Creator : Md. Mohi Uddin
[XMP-pdfaid] Part : 1
[XMP-pdfaid] Conformance : B
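The mismatch visible in the listing above can be checked mechanically. The following is a minimal sketch, assuming the `exiftool -a -G1` output has been saved to a file in the `[Group] Tag : value` format shown in this issue. The function names `field`, `check_creator` and `has_fffd` are made up for illustration and are not part of pdf2archive. PDF/A-1 requires the Info dictionary and the XMP packet to agree, so a mismatch here is a plausible cause of the `6.7.3-1` failure.

```shell
#!/bin/sh
# Sketch: compare the Info-dictionary Creator with the XMP Creator Tool
# in saved `exiftool -a -G1` output. All function names are illustrative.

# field GROUP TAG FILE
# Print the value of the first "[GROUP] TAG : value" line in FILE.
field() {
  awk -v grp="[$1]" -v tag="$2" '
    index($0, grp) == 1 {
      colon = index($0, ":")            # first colon ends the tag name
      if (colon == 0) next
      name = substr($0, length(grp) + 1, colon - length(grp) - 1)
      gsub(/^[ \t]+|[ \t]+$/, "", name) # trim padding around the tag name
      if (name != tag) next
      value = substr($0, colon + 1)
      sub(/^[ \t]+/, "", value)
      print value
      exit
    }' "$3"
}

# check_creator FILE
# Print "match" if Info Creator equals XMP Creator Tool, else "mismatch".
check_creator() {
  info_creator=$(field PDF Creator "$1")
  xmp_creator=$(field XMP-xmp "Creator Tool" "$1")
  if [ "$info_creator" = "$xmp_creator" ]; then
    echo "match"
  else
    echo "mismatch"
  fi
}

# has_fffd FILE
# Succeed if FILE contains U+FFFD (UTF-8 bytes EF BF BD), the replacement
# character that shows up when a code point could not be converted.
has_fffd() {
  fffd=$(printf '\357\277\275')
  LC_ALL=C grep -q "$fffd" "$1"
}
```

Possible usage: `exiftool -a -G1 1309.4626-PDFA.pdf > meta.txt`, then `check_creator meta.txt` and `has_fffd meta.txt && echo "XMP contains U+FFFD"`.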
With
./pdf2archive --cleanmetadata --title="用" --debug --validate 1309.4626.pdf
GS does not crash, but the resulting file is still not valid PDF/A-1B. I suspect this is because I cannot really type Asian characters in the terminal (I have the wrong locale), or maybe they are just not parsed correctly. In fact, the result is:
[ExifTool] ExifTool Version Number : 10.80
[System] File Name : 1309.4626-PDFA.pdf
[System] Directory : .
[System] File Size : 442 kB
[System] File Modification Date/Time : 2018:03:12 15:35:49+01:00
[System] File Access Date/Time : 2018:03:12 15:35:52+01:00
[System] File Inode Change Date/Time : 2018:03:12 15:35:49+01:00
[System] File Permissions : rw-r--r--
[File] File Type : PDF
[File] File Type Extension : pdf
[File] MIME Type : application/pdf
[PDF] PDF Version : 1.4
[PDF] Linearized : No
[PDF] Page Count : 12
[PDF] Producer : GPL Ghostscript 9.22
[PDF] Create Date : 2018:03:12 15:35:47+01:00
[PDF] Modify Date : 2018:03:12 15:35:47+01:00
[PDF] Author :
[PDF] Creator :
[PDF] Title : çfl¨
[PDF] Subject :
[PDF] Trapped :
[XMP-x] XMP Toolkit : XMP toolkit 2.9.1-13, framework 1.6
[XMP-pdf] Producer : GPL Ghostscript 9.22
[XMP-pdf] Keywords :
[XMP-xmp] Modify Date : 2018:03:12 15:35:47+01:00
[XMP-xmp] Create Date : 2018:03:12 15:35:47+01:00
[XMP-xmp] Creator Tool :
[XMP-xmpMM] Document ID : uuid:33d4f44d-5e1f-11f3-0000-eb0c71bba29b
[XMP-dc] Format : application/pdf
[XMP-dc] Title : �
[XMP-dc] Creator :
[XMP-dc] Description :
[XMP-pdfaid] Part : 1
[XMP-pdfaid] Conformance : B
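The `çfl¨` in the Info Title above gives a hint about where the character gets lost. `用` is U+7528, whose UTF-8 encoding is the three bytes E7 94 A8. Read as PDFDocEncoding, those bytes are `ç`, the `fl` ligature and `¨`. This suggests that the terminal did pass correct UTF-8 and that the bytes ended up in the PDF string unconverted, although that is an inference from the output and has not been confirmed. PDF text strings outside PDFDocEncoding are expected to be UTF-16BE with a leading FEFF byte-order mark. The sketch below shows both views of the string, assuming `iconv` and `od` are available. The function names `utf8_bytes` and `pdf_hex_string` are hypothetical, and whether pdf2archive's definition file can take such a hex string has not been checked.

```shell
#!/bin/sh
# Sketch: inspect the bytes of a UTF-8 string and re-encode it as a
# PDF hex string (UTF-16BE with BOM). Function names are illustrative.

# utf8_bytes TEXT
# Print the raw bytes of TEXT as space-separated lowercase hex.
utf8_bytes() {
  printf '%s' "$1" | od -An -v -tx1 | tr -s ' \n' ' ' | sed 's/^ //; s/ $//'
}

# pdf_hex_string TEXT
# Print TEXT (assumed UTF-8) as <FEFF....>, i.e. UTF-16BE hex digits
# prefixed with the byte-order mark.
pdf_hex_string() {
  hex=$(printf '%s' "$1" \
        | iconv -f UTF-8 -t UTF-16BE \
        | od -An -v -tx1 \
        | tr -d ' \n' \
        | tr 'a-f' 'A-F')
  printf '<FEFF%s>\n' "$hex"
}

# Example:
#   utf8_bytes '用'       prints: e7 94 a8
#   pdf_hex_string '用'   prints: <FEFF7528>
```

If `utf8_bytes` prints `e7 94 a8` for the title typed in the terminal, the locale is probably not the problem and the conversion step is the more likely place to look.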