Convert to PDF/A

Open releandro15 opened this issue 2 years ago • 5 comments

Describe your idea

An option to convert a PDF to PDF/A

How could this be implemented?

A function

What problem are you trying to solve?

Converting a PDF to PDF/A

Why does this matter to you?

.

Would others find this helpful?

Yes

Are you interested in implementing your proposal?

Yes

Why are you submitting a proposal?

Because you don't have it yet

Additional Notes

No response

releandro15 avatar Mar 03 '22 18:03 releandro15

PDF/A is already possible with pdf-lib. See #230 for reference. There is no built-in function, but it can easily be achieved following the comments.

Simolation avatar Mar 07 '22 16:03 Simolation

The solution presented there is very hard to follow. Could there be documentation that gets straight to the point of how to do it?

releandro15 avatar Mar 08 '22 18:03 releandro15

?

releandro15 avatar Mar 28 '22 13:03 releandro15

To produce a document that follows the PDF/A standard:

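  • Set the document ID
import crypto from 'crypto';
import { PDFHexString, PDFName } from 'pdf-lib'; // PDFName is used further down

const documentId = crypto.randomBytes(16).toString('hex');
const id = PDFHexString.of(documentId);
pdfDoc.context.trailerInfo.ID = pdfDoc.context.obj([id, id]);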
  • Embed every font the document uses (check here)
  • Set the print profile (check this PR #1512)
  • Set the trim box on each page: page.setTrimBox(0, 0, width, height)
  • Use only colors that match the print profile. If the print profile is RGB, use RGB throughout the whole document.
  • Add the correct XMP metadata (see the example below)
  • The XMP metadata must contain the same values as the document information
// Set the document information
const createDate = new Date();
pdfDoc.setTitle(title);
pdfDoc.setAuthor(author);
pdfDoc.setProducer(producer);
pdfDoc.setCreator(creator);
pdfDoc.setCreationDate(createDate);
pdfDoc.setModificationDate(createDate);
// Write XMP metadata with the same values as the document information
_addMetadata(pdfDoc, createDate, documentId, title, author, producer, creator);

function _addMetadata(pdfDoc, date, documentId, title, author, producer, creator) {
    const metadataXML = `
    <?xpacket begin="" id="${documentId}"?>
      <x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 5.2-c001 63.139439, 2010/09/27-13:37:26        ">
        <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">

          <rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/">
            <dc:format>application/pdf</dc:format>
            <dc:creator>
              <rdf:Seq>
                <rdf:li>${author}</rdf:li>
              </rdf:Seq>
            </dc:creator>
            <dc:title>
               <rdf:Alt>
                  <rdf:li xml:lang="x-default">${title}</rdf:li>
               </rdf:Alt>
            </dc:title>
          </rdf:Description>

          <rdf:Description rdf:about="" xmlns:xmp="http://ns.adobe.com/xap/1.0/">
            <xmp:CreatorTool>${creator}</xmp:CreatorTool>
            <xmp:CreateDate>${_formatDate(date)}</xmp:CreateDate>
            <xmp:ModifyDate>${_formatDate(date)}</xmp:ModifyDate>
            <xmp:MetadataDate>${_formatDate(date)}</xmp:MetadataDate>
          </rdf:Description>

          <rdf:Description rdf:about="" xmlns:pdf="http://ns.adobe.com/pdf/1.3/">
            <pdf:Producer>${producer}</pdf:Producer>
          </rdf:Description>

          <rdf:Description rdf:about="" xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">
            <pdfaid:part>1</pdfaid:part>
            <pdfaid:conformance>B</pdfaid:conformance>
          </rdf:Description>
        </rdf:RDF>
      </x:xmpmeta>
    <?xpacket end="w"?>
    `.trim();

    const metadataStream = pdfDoc.context.stream(metadataXML, {
      Type: 'Metadata',
      Subtype: 'XML',
      Length: metadataXML.length,
    });
    const metadataStreamRef = pdfDoc.context.register(metadataStream);
    pdfDoc.catalog.set(PDFName.of('Metadata'), metadataStreamRef);
  }
  
  // Format the date as ISO 8601 without milliseconds
  function _formatDate(date) {
    return date.toISOString().split('.')[0] + 'Z';
  }
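
One caveat with the template above: the title, author, producer, and creator are interpolated straight into XML, so a value containing & or < would make the packet malformed. A minimal sketch of an escaping step follows; escapeXml is a hypothetical helper name, not part of pdf-lib.

```javascript
// Hypothetical helper (not part of pdf-lib): escape the five XML special
// characters before a value is interpolated into the XMP template.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, '&amp;') // must run first so later entities are not re-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}
```

In the template, ${title} would then become ${escapeXml(title)}, and likewise for the other interpolated strings.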

Don't forget to change the PDF/A part and conformance level in the metadata to match the version you are targeting.

This example is for PDF/A-1b:

 <pdfaid:part>1</pdfaid:part>
 <pdfaid:conformance>B</pdfaid:conformance>
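
To avoid hand-editing the identifier for other flavours, the block can be generated. This is only a sketch: pdfaIdDescription and PDFA_LEVELS are hypothetical names, and it covers parts 1 to 3 only. Changing the identifier does not by itself make a file conform to that part; the other requirements still have to be met.

```javascript
// Hypothetical helper: build the pdfaid block for a given PDF/A flavour.
// Part 1 defines levels A and B; parts 2 and 3 also define level U.
const PDFA_LEVELS = { 1: ['A', 'B'], 2: ['A', 'B', 'U'], 3: ['A', 'B', 'U'] };

function pdfaIdDescription(part, conformance) {
  const level = String(conformance).toUpperCase();
  const allowed = PDFA_LEVELS[part];
  if (!allowed || !allowed.includes(level)) {
    throw new RangeError(`Unsupported PDF/A flavour: ${part}${conformance}`);
  }
  return [
    '<rdf:Description rdf:about="" xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">',
    `  <pdfaid:part>${part}</pdfaid:part>`,
    `  <pdfaid:conformance>${level}</pdfaid:conformance>`,
    '</rdf:Description>',
  ].join('\n');
}
```

For example, pdfaIdDescription(2, 'B') produces the identifier block for PDF/A-2b.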

PDF/A-1 is based on PDF 1.4, which predates object streams (introduced in PDF 1.5). So when you save the document, disable useObjectStreams:

// save() returns a Promise that resolves to the PDF bytes
const pdfBytes = await pdfDoc.save({
  useObjectStreams: false,
});
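
As a quick sanity check on the saved bytes, one could look for the /ObjStm type name that object streams carry. This is a rough heuristic under stated assumptions, not a PDF/A validator: mentionsObjectStreams is a hypothetical helper, it assumes Node.js (it uses Buffer), and a byte sequence inside a compressed stream could in principle match by accident.

```javascript
// Hypothetical helper: report whether raw PDF bytes contain the "/ObjStm"
// name, which marks an object stream. Heuristic only; it does not parse the PDF.
function mentionsObjectStreams(pdfBytes) {
  return Buffer.from(pdfBytes).includes('/ObjStm', 0, 'latin1');
}
```

The bytes returned by pdfDoc.save can be passed to it directly, since save resolves to a Uint8Array.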

To read the metadata from an existing document:

const metadata = pdfDoc.catalog.lookup(PDFName.of('Metadata'));
const textDecoder = new TextDecoder();
const text = textDecoder.decode(metadata.contents);
console.log(text);
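
Once the metadata text is available, the declared PDF/A flavour can be pulled out of it. A minimal sketch, assuming the conventional pdfaid prefix is used; readPdfaId is a hypothetical helper, and it relies on regular expressions instead of a real XML parser.

```javascript
// Hypothetical helper: extract the PDF/A part and conformance level from XMP
// text. Handles the element form shown in this thread and the attribute form
// (pdfaid:part="1") that some producers write. Returns null if either is absent.
function readPdfaId(xmpText) {
  const find = (name) => {
    const element = new RegExp(`<pdfaid:${name}>\\s*([^<\\s]+)\\s*</pdfaid:${name}>`);
    const attribute = new RegExp(`pdfaid:${name}\\s*=\\s*["']([^"']+)["']`);
    const match = element.exec(xmpText) || attribute.exec(xmpText);
    return match ? match[1] : null;
  };
  const part = find('part');
  const conformance = find('conformance');
  return part && conformance ? { part: Number(part), conformance } : null;
}
```

Calling readPdfaId(text) on the decoded string from the snippet above gives an object with part and conformance fields, or null when the document declares no PDF/A identifier.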

necessarylion avatar Aug 19 '23 18:08 necessarylion