
Ghostscript crashes when writing XMP metadata with Asian characters

Open matteosecli opened this issue 6 years ago • 0 comments

Ghostscript (GS) crashes with both versions I tested, but 9.14 still produces a valid PDF/A-1B file while 9.22 does not. In particular, 9.14 copies the correct character into the XMP metadata, whereas 9.22 writes an invalid character, which causes the validation failure.

TODO:

  • [ ] Report to the GS guys.
  • [ ] Check for possible solutions from my side.

Original file: 1309.4626.pdf

Converted file with metadata preservation (gs 9.22): 1309.4626-PDFA.pdf. GS crashes and the file is not PDF/A-1B compliant, although it is still a valid PDF. The XMP metadata is incorrect.

Converted file with metadata preservation (gs 9.14): 1309.4626-PDFA_914.pdf. GS crashes, but the file is a valid PDF/A-1B. The XMP metadata is correct.

Converted file with metadata reset (--cleanmetadata, gs 9.22): 1309.4626-PDFA_clean.pdf. GS does not crash and the file is a valid PDF/A-1B.

Conversion output:

$ ./pdf2archive --debug --validate 1309.4626.pdf 
=== Welcome to PDF2ARCHIVE ===
  DEBUG: running PDF2ARCHIVE, version 0.3
  DEBUG: using Ghostscript binary at /usr/local/bin/gs, version 9.22
  DEBUG: the input file is '1309.4626.pdf'
  DEBUG: the output file is '1309.4626-PDFA.pdf'
  DEBUG: the intermediate processing file is /var/folders/r2/21fdm8ds1rlc552vl1mcnqx40000gn/T/tmp.8nfVxnqp
  DEBUG: the temporary directory is /var/folders/r2/21fdm8ds1rlc552vl1mcnqx40000gn/T/tmp.hmUdqwki
  DEBUG: the current quality options are ''
  DEBUG: PDF title ''
  DEBUG: PDF author 'Md. Mohi Uddin'
  DEBUG: PDF subject ''
  DEBUG: PDF keywords ''
  DEBUG: PDF creator 'Word u( Acrobat PDFMaker 8.1'
  DEBUG: PDF producer 'Acrobat Distiller 8.1.0 (Windows)'
  DEBUG: PDF creation date 'D:20130917195031+09'00''
  DEBUG: PDF modification date 'D:20130917195112+09'00''
  DEBUG: PDF trapping ''
  Creating the definition file...
  Compressing PDF & embedding fonts...
GPL Ghostscript 9.22 (2017-10-04)
Copyright (C) 2017 Artifex Software, Inc.  All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
GPL Ghostscript 9.22: 

Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.

Processing pages 1 through 12.
Page 1
Querying operating system for font files...
Substituting font Times-Bold for TimesNewRomanPS-BoldMT.
Loading NimbusRoman-Bold font from /usr/local/Cellar/ghostscript/9.22/share/ghostscript/9.22/Resource/Font/NimbusRoman-Bold... 5088616 3528885 2680024 1311425 2 done.
GPL Ghostscript 9.22: 

Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.

Page 2
GPL Ghostscript 9.22: 

Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.

Page 3
GPL Ghostscript 9.22: 

Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.

Page 4
GPL Ghostscript 9.22: 

Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.

Page 5
GPL Ghostscript 9.22: 

Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.

Page 6
Substituting font Times-Bold for TimesNewRomanPS-BoldMT.
GPL Ghostscript 9.22: 

Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.

Page 7
Substituting font Times-Bold for TimesNewRomanPS-BoldMT.
GPL Ghostscript 9.22: 

Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.

Page 8
Substituting font Times-Bold for TimesNewRomanPS-BoldMT.
GPL Ghostscript 9.22: 

Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.

Page 9
Substituting font Times-Bold for TimesNewRomanPS-BoldMT.
GPL Ghostscript 9.22: 

Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.

Page 10
GPL Ghostscript 9.22: 

Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.

Page 11
GPL Ghostscript 9.22: 

Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.

Page 12
GPL Ghostscript 9.22: 

Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.

  Converting to PDF/A-1B...
GPL Ghostscript 9.22 (2017-10-04)
Copyright (C) 2017 Artifex Software, Inc.  All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
GPL Ghostscript 9.22: 

Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.

Processing pages 1 through 12.
Page 1
GPL Ghostscript 9.22: 

Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.

Page 2
GPL Ghostscript 9.22: 

Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.

Page 3
GPL Ghostscript 9.22: 

Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.

Page 4
GPL Ghostscript 9.22: 

Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.

Page 5
GPL Ghostscript 9.22: 

Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.

Page 6
GPL Ghostscript 9.22: 

Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.

Page 7
GPL Ghostscript 9.22: 

Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.

Page 8
GPL Ghostscript 9.22: 

Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.

Page 9
GPL Ghostscript 9.22: 

Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.

Page 10
GPL Ghostscript 9.22: 

Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.

Page 11
GPL Ghostscript 9.22: 

Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.

Page 12
GPL Ghostscript 9.22: 

Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.

Error: /syntaxerror in -file-
Operand stack:
   --nostringval--   Title   ()   Author   (Md. Mohi Uddin)   Subject   ()   Keywords   ()   Creator
Execution stack:
   %interp_exit   .runexec2   --nostringval--   --nostringval--   --nostringval--   2   %stopped_push   --nostringval--   --nostringval--   --nostringval--   false   1   %stopped_push   1999   1   3   %oparray_pop   1998   1   3   %oparray_pop   1982   1   3   %oparray_pop   1868   1   3   %oparray_pop   --nostringval--   %errorexec_pop   .runexec2   --nostringval--   --nostringval--   --nostringval--   2   %stopped_push
Dictionary stack:
   --dict:987/1684(ro)(G)--   --dict:0/20(G)--   --dict:79/200(L)--
Current allocation mode is local
Current file position is 869
GPL Ghostscript 9.22: Unrecoverable error, exit code 1
  Removing temporary files...
  Done, now ESSE3 is happy! ;)
  Validating resulting file...
  FAIL /Users/matteo/Dropbox/UNI/MScThesis/Elaborato/LaTeX/pdf2archive/1309.4626-PDFA.pdf
  FAIL 6.2.3-2
  FAIL 6.2.3-4
  FAIL 6.7.3-1

Original file metadata:

$ exiftool -a -G1 1309.4626.pdf 
[ExifTool]      ExifTool Version Number         : 10.80
[System]        File Name                       : 1309.4626.pdf
[System]        Directory                       : .
[System]        File Size                       : 519 kB
[System]        File Modification Date/Time     : 2017:11:02 17:02:04+01:00
[System]        File Access Date/Time           : 2018:03:12 15:23:07+01:00
[System]        File Inode Change Date/Time     : 2018:03:12 15:23:01+01:00
[System]        File Permissions                : rw-r--r--
[File]          File Type                       : PDF
[File]          File Type Extension             : pdf
[File]          MIME Type                       : application/pdf
[PDF]           PDF Version                     : 1.4
[PDF]           Linearized                      : Yes
[PDF]           Tagged PDF                      : Yes
[PDF]           Page Count                      : 12
[PDF]           Page Layout                     : OneColumn
[PDF]           Create Date                     : 2013:09:17 19:50:31+09:00
[PDF]           Author                          : Md. Mohi Uddin
[PDF]           Creator                         : Word 用 Acrobat PDFMaker 8.1
[PDF]           Producer                        : Acrobat Distiller 8.1.0 (Windows)
[PDF]           Modify Date                     : 2013:09:17 19:51:12+09:00
[PDF]           Source Modified                 : D:20130917104643
[PDF]           Title                           : 
[XMP-x]         XMP Toolkit                     : Adobe XMP Core 4.0-c316 44.253921, Sun Oct 01 2006 17:14:39
[XMP-pdf]       Producer                        : Acrobat Distiller 8.1.0 (Windows)
[XMP-pdfx]      Source Modified                 : D:20130917104643
[XMP-xmp]       Create Date                     : 2013:09:17 19:50:31+09:00
[XMP-xmp]       Creator Tool                    : Word 用 Acrobat PDFMaker 8.1
[XMP-xmp]       Modify Date                     : 2013:09:17 19:51:12+09:00
[XMP-xmp]       Metadata Date                   : 2013:09:17 19:51:12+09:00
[XMP-xmpMM]     Document ID                     : uuid:c1155296-9b86-4283-b549-b3f53693a7dc
[XMP-xmpMM]     Instance ID                     : uuid:acf3e0f9-3b0e-40d0-8ae6-c0c5d27a48c6
[XMP-xmpMM]     Subject                         : 21
[XMP-dc]        Format                          : application/pdf
[XMP-dc]        Creator                         : Md. Mohi Uddin
[XMP-dc]        Title                           : 

Converted file metadata (note the unknown character in the XMP Creator Tool field; the Info dictionary, by contrast, is preserved correctly):

$ exiftool -a -G1 1309.4626-PDFA.pdf 
[ExifTool]      ExifTool Version Number         : 10.80
[System]        File Name                       : 1309.4626-PDFA.pdf
[System]        Directory                       : .
[System]        File Size                       : 441 kB
[System]        File Modification Date/Time     : 2018:03:12 15:23:45+01:00
[System]        File Access Date/Time           : 2018:03:12 15:23:48+01:00
[System]        File Inode Change Date/Time     : 2018:03:12 15:23:45+01:00
[System]        File Permissions                : rw-r--r--
[File]          File Type                       : PDF
[File]          File Type Extension             : pdf
[File]          MIME Type                       : application/pdf
[PDF]           PDF Version                     : 1.4
[PDF]           Linearized                      : No
[PDF]           Page Count                      : 12
[PDF]           Producer                        : GPL Ghostscript 9.22
[PDF]           Create Date                     : 2018:03:12 15:23:43+01:00
[PDF]           Modify Date                     : 2018:03:12 15:23:43+01:00
[PDF]           Author                          : Md. Mohi Uddin
[PDF]           Creator                         : Word 用 Acrobat PDFMaker 8.1
[PDF]           Title                           : 
[XMP-x]         XMP Toolkit                     : XMP toolkit 2.9.1-13, framework 1.6
[XMP-pdf]       Producer                        : GPL Ghostscript 9.22
[XMP-xmp]       Modify Date                     : 2018:03:12 15:23:43+01:00
[XMP-xmp]       Create Date                     : 2018:03:12 15:23:43+01:00
[XMP-xmp]       Creator Tool                    : Word � Acrobat PDFMaker 8.1
[XMP-xmpMM]     Document ID                     : uuid:844b5246-5e1d-11f3-0000-eb0c71bba29b
[XMP-dc]        Format                          : application/pdf
[XMP-dc]        Title                           : 
[XMP-dc]        Creator                         : Md. Mohi Uddin
[XMP-pdfaid]    Part                            : 1
[XMP-pdfaid]    Conformance                     : B
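
The mismatch between the Info dictionary and the XMP packet can also be checked automatically. The following is only a rough sketch, not part of pdf2archive: it assumes `exiftool` is on the PATH and that its JSON output (`-j -G1`) uses `Group:Tag` keys such as `PDF:Creator` and `XMP-xmp:CreatorTool`. The pair table and the helper names `read_tags` and `find_mismatches` are hypothetical and chosen for illustration.

```python
import json
import subprocess

# Info-dictionary tag -> XMP tag expected to carry the same value.
# Assumption: these key names match what `exiftool -j -G1` emits.
PAIRS = {
    "PDF:Creator": "XMP-xmp:CreatorTool",
    "PDF:Author": "XMP-dc:Creator",
    "PDF:Title": "XMP-dc:Title",
}


def read_tags(path):
    """Run exiftool on one file and return its tags as a dict."""
    result = subprocess.run(
        ["exiftool", "-j", "-G1", path],
        capture_output=True,
        check=True,
    )
    # Invalid bytes are replaced so a broken XMP packet cannot abort the check.
    return json.loads(result.stdout.decode("utf-8", errors="replace"))[0]


def as_text(value):
    """Normalise a tag value (string, number or list) to a plain string."""
    if isinstance(value, list):
        return ", ".join(str(item) for item in value)
    return str(value)


def find_mismatches(tags):
    """Return (info_key, xmp_key, info_value, xmp_value) for differing pairs."""
    problems = []
    for info_key, xmp_key in PAIRS.items():
        if info_key not in tags or xmp_key not in tags:
            continue  # nothing to compare
        info, xmp = as_text(tags[info_key]), as_text(tags[xmp_key])
        if info != xmp:
            problems.append((info_key, xmp_key, info, xmp))
    return problems


# Usage (requires exiftool):
#   for problem in find_mismatches(read_tags("1309.4626-PDFA.pdf")):
#       print(problem)
```

With the converted file above, this should flag the `PDF:Creator` / `XMP-xmp:CreatorTool` pair, since only the XMP copy contains the replacement character.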

By using

./pdf2archive --cleanmetadata --title="用" --debug --validate 1309.4626.pdf 

GS does not crash, but the resulting file is still not a valid PDF/A-1B file. I suspect this is because I cannot really type Asian characters in the terminal (I have the wrong locale), or maybe they are just not parsed correctly. In fact, the result is:

[ExifTool]      ExifTool Version Number         : 10.80
[System]        File Name                       : 1309.4626-PDFA.pdf
[System]        Directory                       : .
[System]        File Size                       : 442 kB
[System]        File Modification Date/Time     : 2018:03:12 15:35:49+01:00
[System]        File Access Date/Time           : 2018:03:12 15:35:52+01:00
[System]        File Inode Change Date/Time     : 2018:03:12 15:35:49+01:00
[System]        File Permissions                : rw-r--r--
[File]          File Type                       : PDF
[File]          File Type Extension             : pdf
[File]          MIME Type                       : application/pdf
[PDF]           PDF Version                     : 1.4
[PDF]           Linearized                      : No
[PDF]           Page Count                      : 12
[PDF]           Producer                        : GPL Ghostscript 9.22
[PDF]           Create Date                     : 2018:03:12 15:35:47+01:00
[PDF]           Modify Date                     : 2018:03:12 15:35:47+01:00
[PDF]           Author                          : 
[PDF]           Creator                         : 
[PDF]           Title                           : çfl¨
[PDF]           Subject                         : 
[PDF]           Trapped                         : 
[XMP-x]         XMP Toolkit                     : XMP toolkit 2.9.1-13, framework 1.6
[XMP-pdf]       Producer                        : GPL Ghostscript 9.22
[XMP-pdf]       Keywords                        : 
[XMP-xmp]       Modify Date                     : 2018:03:12 15:35:47+01:00
[XMP-xmp]       Create Date                     : 2018:03:12 15:35:47+01:00
[XMP-xmp]       Creator Tool                    : 
[XMP-xmpMM]     Document ID                     : uuid:33d4f44d-5e1f-11f3-0000-eb0c71bba29b
[XMP-dc]        Format                          : application/pdf
[XMP-dc]        Title                           : �
[XMP-dc]        Creator                         : 
[XMP-dc]        Description                     : 
[XMP-pdfaid]    Part                            : 1
[XMP-pdfaid]    Conformance                     : B
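
The `çfl¨` in the Info Title above is at least consistent with the second guess: the three UTF-8 bytes of 用 (U+7528) appear to have been written into the PDF as-is and then read back one byte at a time as PDFDocEncoding. The sketch below reproduces that reading and shows the alternative the PDF specification allows for text strings, UTF-16BE with a leading `FE FF` byte order mark. It is only an illustration of the encoding issue, not what pdf2archive or Ghostscript actually does; the helper names and the one-entry override table are assumptions made for this example.

```python
# Partial PDFDocEncoding table: only the byte needed here that differs
# from Latin-1. In PDFDocEncoding, 0x94 is the "fl" ligature (U+FB02).
PDFDOC_OVERRIDES = {0x94: "\ufb02"}


def decode_as_pdfdoc(raw: bytes) -> str:
    """Decode bytes one at a time, Latin-1 plus the overrides above.

    This is NOT a full PDFDocEncoding codec; it covers just enough
    to reproduce the mojibake seen in the exiftool output.
    """
    return "".join(
        PDFDOC_OVERRIDES.get(byte, bytes([byte]).decode("latin-1"))
        for byte in raw
    )


def pdf_text_string_hex(text: str) -> str:
    """Encode text as a PDF hex string in UTF-16BE with a FE FF BOM."""
    payload = b"\xfe\xff" + text.encode("utf-16-be")
    return "<" + payload.hex().upper() + ">"


title = "用"
utf8_bytes = title.encode("utf-8")

print(utf8_bytes.hex())              # e794a8
print(decode_as_pdfdoc(utf8_bytes))  # çﬂ¨ (the mojibake from the Info Title)
print(pdf_text_string_hex(title))    # <FEFF7528>
```

If this reading is right, the terminal locale is probably not the culprit: the bytes reached the file intact, but a non-Latin title would need to be written as a UTF-16BE string (for example `<FEFF7528>`) rather than as raw UTF-8.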

matteosecli, Mar 12 '18 14:03