pdf-lib

PDFString supports only one-byte characters

Open m-kemarskyi opened this issue 1 year ago • 2 comments

What were you trying to do?

I was trying to add a comment containing Cyrillic letters to a PDF.

How did you attempt to do it?

const commentAnnotRef = this.pdfDocument.context.register(
  this.pdfDocument.context.obj({
    Type: 'Annot',
    Subtype: 'Text',
    Open: true,
    Name: 'Comment', // Determines the icon to place in the document
    T: PDFString.of('abc абві äüöß'), // Comment title
    Contents: PDFString.of('abc абві äüöß'), // Comment main text
    // The position of the annotation
    Rect: [
      xCoordinate,
      pageHeight - yCoordinate,
      xCoordinate,
      pageHeight - yCoordinate,
    ],
  })
)

What actually happened?

It turned out that one byte per character is used under the hood (see the result in the screenshot: Screenshot 2024-07-04 at 13 45 17).
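To illustrate what "one byte per character" does to non-Latin text, here is a minimal sketch. It assumes the string is serialized by storing each UTF-16 code unit in a Uint8Array; copyOneBytePerChar is a hypothetical helper written for this example, not a pdf-lib function.

```typescript
// Hypothetical helper (not part of pdf-lib): writes one byte per UTF-16 code unit.
function copyOneBytePerChar(text: string): Uint8Array {
  const buffer = new Uint8Array(text.length);
  for (let i = 0; i < text.length; i++) {
    // A Uint8Array element keeps only the low 8 bits of the assigned value,
    // so any code unit above 0xFF loses its high byte.
    buffer[i] = text.charCodeAt(i);
  }
  return buffer;
}

const bytes = copyOneBytePerChar('aб');
// 'a' is U+0061 and survives as 0x61.
// 'б' is U+0431 and is cut down to 0x31, which is the ASCII digit '1'.
console.log(Array.from(bytes).map((b) => b.toString(16)));
```

Under that assumption, every Cyrillic letter collapses into an unrelated one-byte character, which would match the garbled output in the screenshot.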

What did you expect to happen?

I expected UTF-8 characters to work correctly.

How can we reproduce the issue?

Try to add a comment to a PDF file using the code I've provided.

Version

1.17.1

What environment are you running pdf-lib in?

Node

Checklist

  • [X] My report includes a Short, Self Contained, Correct (Compilable) Example.
  • [X] I have attached all PDFs, images, and other files needed to run my SSCCE.

Additional Notes

No response

m-kemarskyi, Jul 04 '24 11:07

I tried to come up with a custom PDFUnicodeString class, but it didn't work out:

export class PDFUnicodeString extends PDFObject {
  // The PDF spec allows newlines and parens to appear directly within a literal
  // string. These character _may_ be escaped. But they do not _have_ to be. So
  // for simplicity, we will not bother escaping them.
  static of = (value: string) => new PDFUnicodeString(value);

  private readonly value: string;

  private constructor(value: string) {
    super();
    this.value = value;
  }

  asBytes(): Uint8Array {
    return new TextEncoder().encode(this.value)
  }

  asString(): string {
    return this.value;
  }

  clone(): PDFUnicodeString {
    return PDFUnicodeString.of(this.value);
  }

  toString(): string {
    return `(${this.value})`;
  }

  sizeInBytes(): number {
    return new TextEncoder().encode(this.value).length + 2;
  }

  copyBytesInto(buffer: Uint8Array, offset: number): number {
    buffer[offset++] = 40;
    const encodedValue = new TextEncoder().encode(this.value);
    buffer.set(encodedValue, offset);
    offset += encodedValue.length;
    buffer[offset++] = 41;
    
    return encodedValue.length + 2;
  }
}

m-kemarskyi, Jul 04 '24 13:07

Update: the PDFHexString class solves the problem: PDFHexString.fromText(YOUR_TEXT)
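For context on why a hex string can carry this text: a PDF text string may be stored as UTF-16BE with a leading FE FF byte order mark, and a hex string writes those bytes as hex digits between angle brackets. The sketch below shows that encoding by hand. toUtf16BeHexString is a hypothetical helper for illustration only, and it is not claimed to be how pdf-lib implements PDFHexString.fromText.

```typescript
// Hypothetical helper for illustration; not pdf-lib's actual implementation.
function toUtf16BeHexString(text: string): string {
  // FEFF is the byte order mark that flags the string as UTF-16BE.
  let hex = 'FEFF';
  for (let i = 0; i < text.length; i++) {
    // JavaScript strings are already sequences of UTF-16 code units, so each
    // unit becomes four hex digits (high byte first). Surrogate pairs are
    // emitted as two consecutive units.
    const unit = text.charCodeAt(i).toString(16).toUpperCase();
    hex += ('0000' + unit).slice(-4);
  }
  // PDF hex strings are delimited by angle brackets.
  return `<${hex}>`;
}

const encoded = toUtf16BeHexString('aб');
console.log(encoded); // <FEFF00610431>
```

In the annotation from the original report, this corresponds to replacing PDFString.of(...) with PDFHexString.fromText(...) for both the T and Contents entries.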

m-kemarskyi, Jul 10 '24 07:07