sfntly
missing code point
When I use sfntly to extract a subset of a font, some Unicode code points are resolved correctly, but others are not. I am a little confused; could you please take a look?
```java
public static void main(String[] args) throws Exception {
    String codes = "\\u5e7e\\u8EAB\\ue85d\\ue85e\\u21deb\\u21df8\\u347e\\u347F";
    File srcFontFile = new File("D:\\wanghonghui\\Desktop\\mytest.ttf");
    File disFontFile = new File("D:\\wanghonghui\\Desktop\\test.ttf");
    getSubFont(codes, srcFontFile, disFontFile);
}

public static void getSubFont(String ucodes, File srcFontFile, File disFontFile) throws Exception {
    long start = System.currentTimeMillis();
    Font font = FontUtils.getFonts(new FileInputStream(srcFontFile))[0];
    Set<Integer> glyphs = new LinkedHashSet<Integer>();
    CMapTable cMapTable = font.getTable(Tag.cmap);
    CMap cmap = cMapTable.cmap(Font.PlatformId.Windows.value(), Font.WindowsEncodingId.UnicodeUCS4.value());
    System.err.println(cmap);
    int glyphId = 0;
    for (String ucode : ucodes.split("\\\\u")) {
        if (StringUtils.isEmpty(ucode)) continue;
        glyphId = cmap.glyphId(Integer.parseInt(ucode, 16));
        if (glyphId != 0) {
            glyphs.add(glyphId);
        } else {
            System.err.println("code:" + ucode + ",not found");
        }
    }
    FontFactory fontFactory = FontFactory.getInstance();
    Subsetter subsetter = new RenumberingSubsetter(font, fontFactory);
    List<Integer> glyphList = new ArrayList<Integer>(glyphs);
    subsetter.setGlyphs(glyphList);
    Font newFont = subsetter.subset().build();
    FileOutputStream fos = new FileOutputStream(disFontFile);
    fontFactory.serializeFont(newFont, fos);
    long used = System.currentTimeMillis() - start;
    System.err.println("time: " + used + "ms");
}
```
Hello whh,
String codes = "\u5e7e\u8EAB\ue85d\ue85e\u21deb\u21df8\u347e\u347F";
Some of the `\u` sequences look as if they contain 5 hex digits, for example `\\u21df8`. Did you really intend to include the code points U+21DF "DOWNWARDS ARROW WITH DOUBLE STROKE" and U+0038 "DIGIT EIGHT"?
Thank you very much for your reply. `\u21df8` is meant as a single Unicode code point, which actually corresponds to a Chinese character; please see https://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=21df8&useutf8=true. Does using 5 hex digits cause problems?
In Java, the character sequence `\u21df8` is interpreted as U+21DF followed by U+0038. That's how it is: Java doesn't support `\u` with more than 4 hexadecimal digits. See JLS 17, sections 3.1 to 3.3.
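To make this concrete, here is a minimal sketch of how the compiler treats such an escape inside a string literal. The class and method names (`UnicodeEscapeDemo`, `sample`) are made up for illustration only.

```java
class UnicodeEscapeDemo {
    /** A literal written with what looks like a 5-digit escape. */
    static String sample() {
        // Only the first four hex digits belong to the escape;
        // the trailing 8 is an ordinary character.
        return "\u21df8";
    }

    public static void main(String[] args) {
        String s = sample();
        System.out.println(s.length());                        // prints 2
        System.out.println(Integer.toHexString(s.charAt(0)));  // prints 21df
        System.out.println(s.charAt(1));                       // prints 8
    }
}
```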
If you encode your desired code points in UTF-16, this may already solve your problem.
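As a sketch of what "encode in UTF-16" could look like for U+21DF8: the code point is written as a surrogate pair, either by hand with two 4-digit escapes or by letting `Character.toChars` compute it. The class and method names (`SurrogatePairDemo`, `fromCodePoint`) are hypothetical.

```java
class SurrogatePairDemo {
    /** Returns the UTF-16 form of a code point (one or two char values). */
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        // U+21DF8 as a surrogate pair: high surrogate D847, low surrogate DDF8.
        String viaEscapes = "\uD847\uDDF8";
        String viaApi = fromCodePoint(0x21DF8);

        System.out.println(viaEscapes.equals(viaApi));                   // prints true
        System.out.println(viaApi.length());                             // prints 2
        System.out.println(Integer.toHexString(viaApi.codePointAt(0)));  // prints 21df8
    }
}
```

Note that `length()` counts UTF-16 code units, so the string has length 2 even though it holds a single code point; `codePointAt(0)` recovers the full value.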
Unlike Java's `\u` escape, Unicode notation allows 5 or 6 hex digits when referring to a code point such as U+21DF8. Keep this difference between Unicode and Java in mind.
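One way to keep the Unicode-style notation without relying on Java escapes is to store the code points as plain text and parse the hex digits at run time. The following is only a sketch under that assumption; `CodePointParser` and `parse` are illustrative names, not part of sfntly.

```java
import java.util.ArrayList;
import java.util.List;

class CodePointParser {
    /** Parses whitespace-separated tokens such as "U+21DF8" into code points. */
    static List<Integer> parse(String text) {
        List<Integer> codePoints = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) continue;
            // Accept both "U+21DF8" and bare hex such as "21DF8".
            String hex = token.startsWith("U+") ? token.substring(2) : token;
            int cp = Integer.parseInt(hex, 16);
            if (!Character.isValidCodePoint(cp)) {
                throw new IllegalArgumentException("Not a Unicode code point: " + token);
            }
            codePoints.add(cp);
        }
        return codePoints;
    }

    public static void main(String[] args) {
        for (int cp : parse("U+5E7E U+21DF8 U+347F")) {
            // charCount is 1 for the Basic Multilingual Plane, 2 for supplementary code points.
            System.out.printf("U+%04X needs %d UTF-16 code unit(s)%n", cp, Character.charCount(cp));
        }
    }
}
```

The resulting integers can be passed directly to any API that takes a code point as an `int`, so no surrogate handling is needed at that stage.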