[enhancement] Add color rendering annotations for PDF/A conformance

Open zackw opened this issue 3 years ago • 12 comments

As of 1.78, simple acmart documents are almost fully PDF/A conformant if you turn PDF/A mode on in hyperref, except that neither hyperref nor hyperxmp currently adds color rendering annotations. Concretely, this simple document

\PassOptionsToPackage{pdfa,pdfapart=1,pdfaconformance=b}{hyperref}
\documentclass{acmart}
\begin{document}
\title{DeviceGray used without output rendering intent}
\author{Zack Weinberg}
\email{[email protected]}
\affiliation{\institution{Bug Finders Anonymous}\city{Pittsburgh}\state{PA}\country{USA}}
\maketitle
lorem ipsum dolor sit amet
\end{document}

makes verapdf spew errors about color rendering:

<?xml version="1.0" encoding="utf-8"?>
<report>
  <buildInformation>
    <releaseDetails id="core" version="1.16.1" buildDate="2020-05-12T00:43:00-04:00"></releaseDetails>
    <releaseDetails id="validation-model" version="1.16.1" buildDate="2020-05-12T00:46:00-04:00"></releaseDetails>
    <releaseDetails id="gui" version="1.16.1" buildDate="2020-05-12T00:59:00-04:00"></releaseDetails>
  </buildInformation>
  <jobs>
    <job>
      <item size="277030">
        <name>bad.pdf</name>
      </item>
      <validationReport profileName="PDF/A-1B validation profile" statement="PDF file is not compliant with Validation Profile requirements." isCompliant="false">
        <details passedRules="101" failedRules="2" passedChecks="2623" failedChecks="36">
          <rule specification="ISO 19005-1:2005" clause="6.5.3" testNumber="3" status="failed" passedChecks="0" failedChecks="3">
            <description>An annotation dictionary shall not contain the C array or the IC array unless the colour space of the
                        DestOutputProfile in the PDF/A-1 OutputIntent dictionary, defined in 6.2.2, is RGB</description>
            <object>PDAnnot</object>
            <test>(C_size == 0 &amp;&amp; IC_size == 0) || gOutputCS == "RGB "</test>
            <check status="failed">
              <context>root/document[0]/pages[0](6 0 obj PDPage)/annots[2](18 0 obj PDAnnot)</context>
            </check>
            <!-- more check blocks elided -->
          </rule>
          <rule specification="ISO 19005-1:2005" clause="6.2.3" testNumber="4" status="failed" passedChecks="0" failedChecks="33">
            <description>If an uncalibrated colour space is used in a file then that file shall contain a PDF/A-1 OutputIntent, as defined in 6.2.2</description>
            <object>PDDeviceGray</object>
            <test>gOutputCS != null</test>
            <check status="failed">
              <context>root/document[0]/pages[0](6 0 obj PDPage)/contentStream[0](11 0 obj PDContentStream)/operators[83]/colorSpace[0]</context>
            </check>
            <!-- many more check blocks elided -->
          </rule>
        </details>
      </validationReport>
      <duration start="1623338263750" finish="1623338264127">00:00:00.377</duration>
    </job>
  </jobs>
  <batchSummary totalJobs="1" failedToParse="0" encrypted="0">
    <validationReports compliant="0" nonCompliant="1" failedJobs="0">1</validationReports>
    <featureReports failedJobs="0">0</featureReports>
    <repairReports failedJobs="0">0</repairReports>
    <duration start="1623338263704" finish="1623338264141">00:00:00.437</duration>
  </batchSummary>
</report>

For this simple document, all of these errors can be fixed by adding this glob of magic to the preamble:

% From <https://tex.stackexchange.com/a/535754>.
\bgroup
\immediate\pdfobj stream attr{/N 3} file{sRGB.icc}
\edef\iccobj{\the\pdflastobj}
\pdfcatalog{%
  /OutputIntents [
    <<
      /Type /OutputIntent
      /S /GTS_PDFA1
      /DestOutputProfile \iccobj\space 0 R
      /OutputConditionIdentifier (sRGB)
      /Info (sRGB)
    >>
  ]
}
\egroup

Documents with more complex use of color (either directly or via \includegraphics) might need more sophisticated handling.

zackw avatar Jun 10 '21 15:06 zackw

This code assumes you want an RGB profile. It might wreak havoc with production.
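To make the concern concrete: the snippet above hard-codes a three-component profile (`/N 3`, `sRGB.icc`). A CMYK counterpart might look roughly like the sketch below. The profile file name and the identifier strings are assumptions for illustration only, and should be replaced by whatever the production workflow actually requires.

```latex
% Sketch only: CMYK counterpart of the sRGB snippet (pdfTeX primitives).
% ASSUMPTION: FOGRA39L_coated.icc is available on the TeX search path;
% the identifier strings below are placeholders, not verified values.
\bgroup
\immediate\pdfobj stream attr{/N 4} file{FOGRA39L_coated.icc}% 4 components = CMYK
\edef\iccobj{\the\pdflastobj}
\pdfcatalog{%
  /OutputIntents [
    <<
      /Type /OutputIntent
      /S /GTS_PDFA1
      /DestOutputProfile \iccobj\space 0 R
      /OutputConditionIdentifier (FOGRA39)
      /Info (Coated FOGRA39)
    >>
  ]
}
\egroup
```

Note that the clause 6.5.3 rule quoted in the veraPDF report allows the annotation `C` and `IC` arrays only when the output profile is RGB, so with a CMYK intent the first group of failures would presumably remain unless the link annotations are changed as well.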

borisveytsman avatar Jun 10 '21 17:06 borisveytsman

Yeah, I don't mean to suggest that acmart should adopt the "glob of magic" verbatim. pdfx seems to have thought this aspect of PDF/[AX] conformance through somewhat more than hyperxmp.

zackw avatar Jun 10 '21 17:06 zackw

As a side-but-related note, NSF now requires camera-ready versions of papers whose work was funded by an NSF grant to be uploaded to a public-access repository (NSF PAR), and it only accepts PDF/A format. IMHO, acmart should make it extremely easy to create such PDFs.

rionda avatar Sep 22 '21 18:09 rionda

(continuing from above). For example, I wonder whether the options pdfa,pdfapart=1,pdfaconformance=b to hyperref could be passed by default, which seems sufficient to satisfy the NSF PAR checker.

rionda avatar Sep 22 '21 18:09 rionda

Generating conformant PDF/A is a non-trivial task. Passing these values will make it look as if a document is PDF/A when it most probably isn't, and will generate a lot of complaints…

krono avatar Sep 22 '21 18:09 krono

I am worried about that. It can be solved with documentation about this fact. It is still a step in the right direction, IMHO.

rionda avatar Sep 22 '21 18:09 rionda

I generated a PDF/A from LaTeX; it was a three-week endeavor. Although there is current work being done at https://github.com/latex3/pdfresources, I don't think this will be ready within the next 1.5 years or so.

To me, acmart seems not the right place to tackle this problem at this point in time…

krono avatar Sep 22 '21 18:09 krono

OK. =)

rionda avatar Sep 22 '21 21:09 rionda

Hi Tobias, Boris and others.

> I generated a PDF/A from LaTeX, it was a three weeks endeavor.

Is there a misunderstanding here? Were you trying to generate PDF/UA or PDF/A-1a rather than one of PDF/A-1b, -2b, -2u, -3b, -3u?

This can be done almost instantaneously using \usepackage[]{pdfx} with just a very small amount of extra work in a couple of areas:

  1. meaningful MetaData (which is highly desirable anyway)
  2. ensuring all logos, images, coloured text all using the same Color Model.

It is #2 that the original posting is about. Read more about this in the 1st section of an article I wrote 3 years ago:

    https://www.tug.org/TUGboat/tb39-2/tb122moore-pdf.pdf

Any good graphics editing software allows you to make all your images use the same color model; e.g. just re-save as RGB rather than CMYK. Then pdfx.sty will just do the right thing. I’ve been working this way for years.

> Although there is current work being done at https://github.com/latex3/pdfresources, I don't think this will be ready within the next 1.5 years or so.

This is more about Tagged PDF and PDF/UA, which is indeed a much harder thing to achieve.

> To me, acmart seems not the right place to tackle this problem at this point in time…

Really? If NSF is requiring it, then there are many authors who will need to learn how to use techniques, such as pdfx provides. Doing things differently for articles directed at different publishing agencies is just not sustainable into the future.

The very last sentence in this recent article by a former ACM president sums it up: https://dl.acm.org/doi/pdf/10.1145/3483539

“...one would expect ACM to look to cutting-edge ideas for the benefit of its members.”

All the best.

Ross

Dr Ross Moore Department of Mathematics and Statistics 12 Wally’s Walk, Level 7, Room 734 Macquarie University, NSW 2109, Australia T: +61 2 9850 8955 | F: +61 2 9850 8114 M:+61 407 288 255 | E: @.*** http://www.maths.mq.edu.au

ozross avatar Sep 22 '21 22:09 ozross

> Hi Tobias, Boris and others.
>
> Is there a misunderstanding here? Were you trying to generate PDF/UA or PDF/A-1a rather than one of PDF/A-1b, -2b, -2u, -3b, -3u?

PDF/A-2u.

> This can be done almost instantaneously using \usepackage[]{pdfx} with just a very small amount of extra work in a couple of areas:

pdfx turned out to be completely incompatible for me, as I had to use hyperxmp, and the two did not like each other.

Also, an external xml file is a no-go for me.

>   1. meaningful MetaData (which is highly desirable anyway)
>   2. ensuring all logos, images, coloured text all using the same Color Model.
>
> It is #2 that the original posting is about. Read more about this in the 1st section of an article I wrote 3 years ago: https://www.tug.org/TUGboat/tb39-2/tb122moore-pdf.pdf Any good graphics editing software allows you to make all your images use the same color model; e.g. just re-save as RGB rather than CMYK. Then pdfx.sty will just do the right thing. I’ve been working this way for years.

Glad it worked out for you. Thing is, PDF/A forbids untagged (Device)CMYK and untagged (Device)RGB in the same PDF. The color package contains definitions in both colorspaces. While you can force one space, the conversion is lossy and subpar, which the package notes. So, if you happen to mention two colors of different spaces (which the color names do not easily give away), it's not PDF/A anymore.
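A minimal sketch of how this can happen without the author noticing, assuming pdfLaTeX and the stock color package, where `red` is predefined in the rgb model and `cyan` in the cmyk model:

```latex
% Sketch: two innocent-looking color names that end up in different
% device color spaces in the output PDF.
\documentclass{article}
\usepackage{color}
\begin{document}
\textcolor{red}{red is defined in rgb},   % written with DeviceRGB operators
\textcolor{cyan}{cyan is defined in cmyk}, % written with DeviceCMYK operators
and the default black text is set in gray. % DeviceGray, as in the veraPDF report
\end{document}
```

Nothing in the source hints that the two names live in different models, which is the point being made here.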

The same goes for any other graphic that might come your way.

Turns out I got a graphics file with spot colors, which alone took me a day to find out that

  1. PDF/A forbids them iirc
  2. they are not so easy to remove if you want to keep the sanity of your image files.

Also a lot of generated imagery is definitely not easy to handle, think plots from R, Python matplotlib, and so on.

The average paper writer will not care about that nor should they.

PS: Ah, and I needed to patch colorspaces in my case…

krono avatar Sep 22 '21 22:09 krono

Hi Tobias, and others

>> Hi Tobias, Boris and others.
>>
>> Is there a misunderstanding here? Were you trying to generate PDF/UA or PDF/A-1a rather than one of PDF/A-1b, -2b, -2u, -3b, -3u?
>
> PDF/A-2u.

To include images, right?

>> This can be done almost instantaneously using \usepackage[]{pdfx} with just a very small amount of extra work in a couple of areas:
>
> pdfx turned out to be completely incompatible for me, as I had to use hyperxmp, and that did not like each other.

What is the use case that requires hyperxmp?

> Also, an external xml file is a no-go for me.

The XMP packet can be constructed on-disk automatically from the LaTeX source; that’s the way I generally use it.
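As a rough sketch of that workflow: pdfx reads its metadata from a `\jobname.xmpdata` file, and that file can be written from the LaTeX source itself with `filecontents*`. The example assumes pdfLaTeX, a LaTeX kernel recent enough to accept the `[overwrite]` option, and the plain `article` class; whether pdfx loads cleanly alongside acmart is not tested here.

```latex
% Sketch: the metadata lives in the .tex source and is written to disk
% on every run; pdfx then builds the XMP packet from it.
\begin{filecontents*}[overwrite]{\jobname.xmpdata}
  \Title{DeviceGray used without output rendering intent}
  \Author{Zack Weinberg}
  \Language{en-US}
  \Keywords{PDF/A\sep color\sep output intent}
\end{filecontents*}
\documentclass{article}
\usepackage[a-2u]{pdfx} % requests PDF/A-2u; loads hyperref and xcolor itself
\begin{document}
lorem ipsum dolor sit amet
\end{document}
```

The result should still be checked with a validator such as veraPDF, since included graphics can break conformance independently of the metadata.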

An externally-generated XML file is not required at all, but it can be incorporated if that is what you want to do. (I’d expect that ACM would do it this way in final production, since they can then add Metadata that isn’t available to the author while preparing their manuscript.)

For this to have been an issue for you, I’d have to wonder how old was the version of the pdfx package you were using?

Since I took over its maintenance, back around 2014, a more flexible approach to the handling of the XMP was one of the first things that I did.

>>   1. meaningful MetaData (which is highly desirable anyway)
>>   2. ensuring all logos, images, coloured text all using the same Color Model.
>>
>> It is #2 that the original posting is about. Read more about this in the 1st section of an article I wrote 3 years ago: https://www.tug.org/TUGboat/tb39-2/tb122moore-pdf.pdf Any good graphics editing software allows you to make all your images use the same color model; e.g. just re-save as RGB rather than CMYK. Then pdfx.sty will just do the right thing. I’ve been working this way for years.

> Glad it worked out for you. Thing is, PDF/A forbids untagged (Device)CMYK and untagged (Device)RGB in the same PDF.

Yes. PDF/A expects just a single uniform ColorSpace. At least that’s how it was with PDF/A-1, -2, -3. (It may be different in PDF/A-4, which I’ve not yet had the time to study and support.)

So it is the format, not how it is supported in LaTeX, that is the issue?

My article, as mentioned previously, effectively makes this point. As said there, you would need to do color conversions in at least some of the images first.

That is, decide which you want RGB or CMYK, or try separate versions with each.

> The color package contains definitions in both colorspaces. While you can force one space, the conversion is lossy and subpar, which the package notes. So, if you happen to mention two colors of different spaces (which the color names do not easily give away), it's not PDF/A anymore.

Did you ever try the xcolor package? This is what pdfx.sty uses, loading it early once it knows which Color Space you want. Then all LaTeX color commands are converted automatically into that space, whichever way you have chosen to define the color.
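A minimal sketch of the xcolor behaviour described here, loading xcolor directly with a target-model option rather than through pdfx. The color name `brandcyan` is made up for illustration.

```latex
% Sketch: with a target color model option, xcolor converts every
% color to that model, however the color was originally specified.
\documentclass{article}
\usepackage[rgb]{xcolor}               % target model: rgb
\definecolor{brandcyan}{cmyk}{1,0,0,0} % hypothetical color, specified in cmyk
\begin{document}
\textcolor{red}{red}, \textcolor{cyan}{cyan} and
\textcolor{brandcyan}{brand cyan} are all written as rgb.
\end{document}
```

This only covers colors set through LaTeX color commands. Included graphics keep whatever color spaces they were saved with, and the default black text may still be written in gray, so an output intent is still needed.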

> The same goes for any other graphic that might come your way.
>
> Turns out I got a graphics file with spot colors, which alone took me a day to find out that
>
>   1. PDF/A forbids them iirc
>   2. they are not so easy to remove if you want to keep the sanity of your image files.

This would require graphics conversion software to be used, if you want it to be part of the PDF pages – but see the next paragraph. I’m not an expert on advanced usage of conversion software.

But if it’s really important to convey a high-quality graphic, this can be done by including it as an attachment, or as an Associated File. The main PDF visual area can show an RGB (or CMYK) preview, and provide a link to the attached file.

This is perfectly legal PDF/A-2, whatever color spaces are used in the attachments.

> Also a lot of generated imagery is definitely not easy to handle, think plots from R, Python matplotlib, and so on.

I handle some of these almost daily. My strategy is to import them into Acrobat, which handles most graphics formats and supplies a PDF wrapper, and then export as PDF/A-2b with the required RGB or CMYK. (Or as PDF/A-2u if there are potential font issues within the image.)

Then use this wrapped version with LaTeX to build the full PDF document.

> The average paper writer will not care about that nor should they.

Yes, but a publisher will; and authors need to interact with editors and/or publishers.

That’s exactly why my paper (referred to above) suggests that publishers provide an image-conversion service so that:

  1. authors can see in advance what would happen to their images if they do nothing;
  2. authors can work with the converted version, just as will happen in the final PDF.

Furthermore, if the author has done any required conversions already, there’s no need for the publisher to repeat this work, thus keeping production expenses down a tad.

Image-handling has always been an issue, in particular with book-printing. Authors had to treat images separately. It’s not that long ago that it was customary to include full-page hi-res versions of images at the end of a manuscript.

Printing technologies have advanced so that this isn’t needed as much any more; but the cost is that usually you are actually getting a lower-res version in the body of the PDF.

> PS: Ah, and I needed to patch colorspaces in my case…

Please send me this kind of patching.

The colorspaces package was spawned off pdfx.sty, specifically to allow this aspect to be treated independently.

I’m very curious to see what you had to do and why.

Would you please send me a link to get an example document that you are willing to share? I’m always on the lookout for ways to improve how LaTeX can be leveraged to handle specific aspects of PDF generation.

All the best.

Ross

ozross avatar Sep 23 '21 00:09 ozross

[I think we went a tad astray from the issue at hand :) I'll reply directly via email]

krono avatar Sep 23 '21 05:09 krono