fonttools Invalid character in name field

ttx crashes when importing an XML file that contains a reference to an invalid character.

One font produces this name record

    <namerecord nameID="5" platformID="3" platEncID="1" langID="0x409">
      Converter: Windows Type 1 Installer V1.0d.&#65535;Font: V1.0
    </namerecord>

when passed through ttx. When the resulting XML is fed back to ttx, it crashes with an Expat error: "xml.parsers.expat.ExpatError: reference to invalid character number"
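The failure can be reproduced without ttx, using only the standard library's expat bindings. The snippet below is a minimal sketch: `XML_TEXT` is a trimmed copy of the record above, and `try_parse` is a helper name made up for this example, not part of fontTools.

```python
import xml.parsers.expat

# Trimmed copy of the name record above; &#65535; is a reference to U+FFFF.
XML_TEXT = (
    '<namerecord nameID="5" platformID="3" platEncID="1" langID="0x409">'
    'Converter: Windows Type 1 Installer V1.0d.&#65535;Font: V1.0'
    '</namerecord>'
)


def try_parse(text):
    """Hypothetical helper: return None if expat accepts the text,
    otherwise return the ExpatError message as a string."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(text, True)  # True marks this as the final chunk
    except xml.parsers.expat.ExpatError as exc:
        return str(exc)
    return None


print(try_parse(XML_TEXT))
```

The printed message should include "reference to invalid character number" along with a line and column, and the same record parses cleanly once the `&#65535;` reference is removed.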

Jun 12 '14 14:06 olivierberten

Ugh. Not sure there's much we can do here. Unfortunately, some systems read parts of the Unicode standard too strictly, i.e. in this case, disallowing U+FFFF although there's no practical reason to do that.

Jun 12 '14 22:06 behdad

At least fontTools could catch the error instead of crashing... or change to another XML parsing lib ;-)

Jun 13 '14 07:06 olivierberten

Perhaps we should catch this before writing the XML out, and write a binary encoded version. I have a patchset for that. Will dig it out.
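To make the idea concrete, here is a rough sketch of such a pre-write check. It is not the patchset mentioned above and not fontTools API: `is_xml_char`, `dump_name_string`, and the `("text", ...)` / `("hex", ...)` return convention are all invented for illustration. It assumes the name string is UTF-16BE (as for platformID 3, platEncID 1) and tests each character against the XML 1.0 `Char` production, which excludes U+FFFE and U+FFFF.

```python
def is_xml_char(ch):
    """True if ch matches the XML 1.0 Char production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def dump_name_string(raw, encoding="utf_16_be"):
    """Hypothetical helper: pick a representation that survives XML.

    Returns ("text", decoded_string) when every character is legal in
    XML 1.0, otherwise ("hex", hex_string_of_the_raw_bytes).
    """
    try:
        text = raw.decode(encoding)
    except UnicodeDecodeError:
        return "hex", raw.hex()
    if all(is_xml_char(ch) for ch in text):
        return "text", text
    return "hex", raw.hex()


good = "Font: V1.0".encode("utf_16_be")
bad = "V1.0d.\uffffFont".encode("utf_16_be")
print(dump_name_string(good))
print(dump_name_string(bad))
```

On the reading side, the hex form can be turned back into the original bytes with `bytes.fromhex`, so the record round-trips even when the text form would be rejected by the parser.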

Jul 21 '14 17:07 behdad

Ugh. Python's own Unicode routines don't have any problem with decoding to U+FFFF:

    $ python3
    Python 3.2.3 (default, Feb 27 2014, 21:31:18)
    [GCC 4.6.3] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> b'\xef\xbf\xbf'.decode('utf-8')
    '\uffff'

We really have to go out of our way to fix this. I'm going to add an option to dump name entries "raw", which should make them never fail roundtripping.

Jul 21 '14 17:07 behdad

> We really have to go out of our way to fix this. I'm going to add an option to dump name entries "raw", which should make them never fail roundtripping.

@behdad I'm currently experiencing the same problem. I can't seem to find the option to dump the name entries "raw", did this option ever make it?

Feb 22 '23 03:02 junjie

> did this option ever make it?

It doesn't look like it, unfortunately.

Feb 22 '23 16:02 behdad

#3868 allows namerecords containing U+FFFF to pass, although it may not be considered a fix of this issue.

Jul 16 '25 08:07 knutnergaard