Font encoding remapping "/Differences"

Open · gregorybrzeski opened this issue 6 years ago · 2 comments

The PDF built-in fonts have a nearly (?) complete character set (at least they contain all the ISO-8859-2 characters I need). However, only the characters in WinAnsiEncoding are exposed, so only those can be used.

The PDF specification allows custom character mappings, which map a character code to a glyph name in the font. Here is an example:

4 0 obj
<<
/Type /Encoding
/BaseEncoding /WinAnsiEncoding
/Differences [
  143 /Zacute
  159 /zacute
]
>>
endobj
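As a sketch of how a library could emit this object, the helpers below turn a { charCode: glyphName } map into a /Differences array. buildDifferences and buildEncodingDict are hypothetical names, not part of pdfjs. In a /Differences array, a number gives the code for the glyph name that follows it, and each further name takes the next consecutive code, so a run of consecutive codes needs only one leading number.

```javascript
// Hypothetical helper (not part of pdfjs): serialize a { charCode: glyphName }
// map into a PDF /Differences array.
// A number sets the code for the name that follows it; each further name is
// assigned to the next consecutive code.
function buildDifferences(overrides) {
  const codes = Object.keys(overrides)
    .map(Number)
    .sort((a, b) => a - b);

  const parts = [];
  let previous = null;
  for (const code of codes) {
    // Only emit the code when it does not directly follow the previous one.
    if (previous === null || code !== previous + 1) {
      parts.push(String(code));
    }
    parts.push('/' + overrides[code]);
    previous = code;
  }
  return '[ ' + parts.join(' ') + ' ]';
}

// Hypothetical helper: a complete /Encoding dictionary based on WinAnsiEncoding.
function buildEncodingDict(overrides) {
  return [
    '<<',
    '/Type /Encoding',
    '/BaseEncoding /WinAnsiEncoding',
    '/Differences ' + buildDifferences(overrides),
    '>>',
  ].join('\n');
}

console.log(buildEncodingDict({ 143: 'Zacute', 159: 'zacute' }));
```

For the two codes in the example, the array comes out as [ 143 /Zacute 159 /zacute ]; adjacent codes such as 140 and 141 would share one leading number.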

A font dictionary (/Type /Font) can then use this object as its encoding by referencing it in /Encoding:

/Type /Font
/Subtype /Type1
/Encoding 4 0 R
/BaseFont /Helvetica

This alone works for Courier, which is monospaced. For other fonts, the widths of the remapped characters must be adjusted as well, so the default widths from Helvetica.json need to be modified.
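A minimal sketch of why the widths matter, assuming simple { charCode: width } tables rather than the actual structure of Helvetica.json: text is measured by summing glyph widths (in thousandths of the font size), so a remapped code has to be measured with the width of its new glyph. textWidth is a hypothetical helper, and the width values used below are illustrative.

```javascript
// Hypothetical helper: measure a run of char codes when some codes are remapped.
// Widths are in units of 1/1000 of the font size, as in AFM metrics.
// `baseWidths` and `widthOverrides` are assumed { charCode: width } maps.
function textWidth(charCodes, baseWidths, widthOverrides, fontSize) {
  let units = 0;
  for (const code of charCodes) {
    // A remapped code must use the width of its new glyph, not the base one.
    const width = code in widthOverrides ? widthOverrides[code] : baseWidths[code];
    if (width === undefined) {
      throw new Error('No width known for char code ' + code);
    }
    units += width;
  }
  return (units * fontSize) / 1000;
}

// Illustrative widths for 'Z' (90) and 'a' (97); code 143 has no base width.
const helveticaWidths = { 90: 611, 97: 556 };
// Measuring "Źa" at 12pt, with 143 remapped to Zacute (width 611).
console.log(textWidth([143, 97], helveticaWidths, { 143: 611 }, 12)); // 14.004
```

With Courier every glyph is 600 units wide, so the override carries the same value the code would have had anyway, which is why only the proportional fonts need new widths.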

These encoding mappings could be passed as an option when creating a new Font() instance, for example:

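// Proposed option (not an existing pdfjs API): map a Unicode character to
// the char code used in /Differences and to its glyph width.
const encodingMapping = {
  '\u0179': {       // "Ź" (Zacute), matching 143 /Zacute in the example above
    charCode: 143,
    width: 611
  }
};
new Font(require('Helvetica.json'), { encoding: encodingMapping });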

The Font implementation could then apply this encoding mapping.
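A hypothetical sketch of how such an option might be applied when converting a character to a char code. createCharCodeResolver and its fallback are illustrative, not existing pdfjs code. Characters found in the mapping use their configured code and width; other characters fall back to the base encoding, which for brevity only covers printable ASCII here (codes 32 to 126, which WinAnsiEncoding maps to the same characters).

```javascript
// Hypothetical sketch of the proposed `encoding` option (not an existing pdfjs API).
// `encodingMapping` has the shape { char: { charCode, width } };
// `baseWidths` is an assumed { charCode: width } table for the base encoding.
function createCharCodeResolver(encodingMapping, baseWidths) {
  return function charCodeFor(char) {
    // 1. Remapped characters take precedence over the base encoding.
    const custom = encodingMapping[char];
    if (custom) {
      return { code: custom.charCode, width: custom.width };
    }
    // 2. Simplified fallback: printable ASCII has identical codes in WinAnsiEncoding.
    const cp = char.codePointAt(0);
    if (cp >= 0x20 && cp <= 0x7e) {
      return { code: cp, width: baseWidths[cp] };
    }
    // 3. Anything else cannot be encoded without a remapping.
    return null;
  };
}

// U+0179 is "Ź" (Zacute); 556 is an illustrative width for 'a'.
const exampleMapping = {
  '\u0179': { charCode: 143, width: 611 },
};
const charCodeFor = createCharCodeResolver(exampleMapping, { 97: 556 });
console.log(charCodeFor('\u0179')); // { code: 143, width: 611 }
console.log(charCodeFor('a'));      // { code: 97, width: 556 }
```

Keeping the custom mapping separate from the base table leaves the default encoding untouched for users who do not pass the option.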

I was planning to prepare a PR, but while looking at the code I couldn't figure out how to easily obtain an id for a newly created object. Calling doc._writeObject(....) doesn't return an id; when object creation is queued, it just returns an empty resolved Promise.

Any ideas on how to easily obtain an id for a created object? The id is needed to reference the new /Encoding object from the /Font object.

gregorybrzeski, Jul 31 '17 17:07

Cool, I wasn't aware that this is possible for AFM fonts (probably because I only added the AFM fonts for ease of initial use, not for my own usage of the library). I wonder whether it would be enough to simply extend the following mapping: https://github.com/rkusa/pdfjs/blob/master/lib/font/afm.js#L112-L140.

You can obtain an id using doc._registerObject(obj). This allows using the object as a reference (obj.toReference()) before actually writing it.
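To illustrate the register-then-write idea in isolation, here is a self-contained sketch. PdfObject and ObjectRegistry are hypothetical stand-ins, not pdfjs internals. Registering assigns the object number up front, so the font dictionary can embed a reference to the encoding object before either object is written.

```javascript
// Hypothetical classes illustrating "register first, write later";
// they are not pdfjs internals.
class PdfObject {
  constructor(body) {
    this.id = null; // assigned on registration
    this.body = body;
  }
  toReference() {
    if (this.id === null) throw new Error('object not registered yet');
    return `${this.id} 0 R`;
  }
}

class ObjectRegistry {
  constructor() {
    this.nextId = 1;
    this.objects = [];
  }
  register(obj) {
    // The id is fixed immediately, so other objects can reference it
    // even though nothing has been written yet.
    obj.id = this.nextId++;
    this.objects.push(obj);
    return obj;
  }
  serialize() {
    // Writing happens later, in one pass over the registered objects.
    return this.objects
      .map((o) => `${o.id} 0 obj\n${o.body}\nendobj`)
      .join('\n');
  }
}

const registry = new ObjectRegistry();
const encoding = registry.register(new PdfObject(
  '<< /Type /Encoding /BaseEncoding /WinAnsiEncoding /Differences [ 143 /Zacute 159 /zacute ] >>'));
const font = registry.register(new PdfObject(
  `<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica /Encoding ${encoding.toReference()} >>`));
console.log(registry.serialize());
```

Separating id assignment from writing is what lets the reference exist before the queued write completes.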

I am glad to see your interest in this library!

rkusa, Aug 01 '17 07:08

Extending that mapping would break the default encoding, and I believe the module should be generic but flexible, giving users the option to remap characters as required. I was planning to handle this in the _charCodeFor() function.

Thank you, I will look at both functions you mentioned.

gregorybrzeski, Aug 01 '17 08:08