make use of info.cluster

Open replabrobin opened this issue 1 year ago • 3 comments

I wanted to set the buffer info cluster values before shaping so that I could use the returned cluster numbers as a guide to the input colours and similar attributes. I had to add a setter to make this possible:

diff --git a/src/uharfbuzz/_harfbuzz.pyx b/src/uharfbuzz/_harfbuzz.pyx
index 5adf637..ead947e 100644
--- a/src/uharfbuzz/_harfbuzz.pyx
+++ b/src/uharfbuzz/_harfbuzz.pyx
@@ -69,6 +69,10 @@ cdef class GlyphInfo:
     def cluster(self) -> int:
         return self._hb_glyph_info.cluster
 
+    @cluster.setter
+    def cluster(self,v) -> None:
+        self._hb_glyph_info.cluster = v
+
     @property
     def flags(self) -> GlyphFlags:
         return GlyphFlags(self._hb_glyph_info.mask & HB_GLYPH_FLAG_DEFINED)

Although I can now set the cluster values prior to shaping, the returned clusters are all zero.

This code

#!/usr/bin/env python
import uharfbuzz as hb

if False:
	import sys
	fontfile = sys.argv[1]
	text = sys.argv[2]
else:
	fontfile = '/home/robin/devel/reportlab/REPOS/reportlab/tmp/NotoSansKhmer/NotoSansKhmer-Regular.ttf'
	#1786 Khmer Letter Cha
	#17D2 Khmer Sign Coeng
	#1793 Khmer Letter No
	#17B6 Khmer Vowel Sign Aa
	#17C6 Khmer Sign Nikahit
	text = '\u1786\u17D2\u1793\u17B6\u17C6'

blob = hb.Blob.from_file_path(fontfile)
face = hb.Face(blob)
font = hb.Font(face)

buf = hb.Buffer()
buf.add_str(text)
infos = buf.glyph_infos
print(f'initial {len(infos)=}')
for i,info in enumerate(infos):
	info.cluster=i
buf.guess_segment_properties()
infos = buf.glyph_infos
print(f'guessed {len(infos)=} {[info.cluster for info in infos]}')

features = {"kern": True, "liga": True}
hb.shape(font, buf, features)

infos = buf.glyph_infos
positions = buf.glyph_positions

for info, pos in zip(infos, positions):
	gid = info.codepoint
	glyph_name = font.glyph_to_string(gid)
	cluster = info.cluster
	x_advance = pos.x_advance
	x_offset = pos.x_offset
	y_offset = pos.y_offset
	print(f"{glyph_name} gid{gid}={cluster}@{x_advance},{y_offset}+{x_advance}")

produces this output

$ tmp/tuharfbuzz 
initial len(infos)=5
guessed len(infos)=5 [0, 1, 2, 3, 4]
uni178617B6 gid248=0@923,0+923
uni17D21793 gid209=0@0,-26+0
uni17C6 gid137=0@0,-29+0

All the returned clusters are zero.

If I set buf.cluster_level = 1 after creating the buffer, the clusters do differ: gid137 gets cluster value 4.

initial len(infos)=5
guessed len(infos)=5 [0, 1, 2, 3, 4]
uni178617B6 gid248=0@923,0+923
uni17D21793 gid209=0@0,-26+0
uni17C6 gid137=4@0,-29+0

replabrobin, Jun 10 '24 11:06

I don't think you are ever supposed to set the cluster manually. HarfBuzz does that for you, but there are three "levels" of operation, giving different results:

  • hb.BufferClusterLevel.DEFAULT aka hb.BufferClusterLevel.MONOTONE_GRAPHEMES
  • hb.BufferClusterLevel.MONOTONE_CHARACTERS
  • hb.BufferClusterLevel.CHARACTERS

https://harfbuzz.github.io/working-with-harfbuzz-clusters.html

In the context of your example, you would set the level like this:

buf.cluster_level = hb.BufferClusterLevel.CHARACTERS
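Once shaping returns usable cluster values, they can be mapped back to ranges of the input text, which is what per-character attributes such as colours need. The following is a minimal sketch in plain Python, not part of uharfbuzz. `cluster_spans` is a hypothetical helper, and it assumes that cluster values are indices into the Python string passed to `add_str` and that each cluster covers the input up to the next larger cluster value. The glyph names and clusters are copied from the `cluster_level = 1` output earlier in this thread.

```python
def cluster_spans(clusters, text_length):
    """Map each distinct cluster value to the half-open range of
    input indices it is assumed to cover (hypothetical helper)."""
    starts = sorted(set(clusters))
    spans = {}
    # Each cluster runs from its own value up to the next larger one,
    # and the last cluster runs to the end of the text.
    for start, end in zip(starts, starts[1:] + [text_length]):
        spans[start] = (start, end)
    return spans


text = '\u1786\u17D2\u1793\u17B6\u17C6'
# (glyph name, cluster) pairs taken from the cluster_level = 1 output above
shaped = [("uni178617B6", 0), ("uni17D21793", 0), ("uni17C6", 4)]

spans = cluster_spans([cluster for _, cluster in shaped], len(text))
for name, cluster in shaped:
    start, end = spans[cluster]
    chars = [f"U+{ord(ch):04X}" for ch in text[start:end]]
    print(name, (start, end), chars)
```

Glyphs that share a cluster value map to the same input range, so any attribute that varies inside that range cannot be assigned to an individual glyph.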

justvanrossum, Jun 10 '24 13:06

Thanks for that info. I don't need cluster.setter then. I really don't want to get into the horrid details of harfbuzz. The layout problems that result from using a shaper are enough. I suppose reportlab will need a new kind of font to allow input shaping and after line breaking the line drawing will need additional positioning. I doubt that we will end up with just one way to do it :(

replabrobin, Jun 11 '24 07:06

Setting clusters on the buffer is sometimes useful. For example, in hb-view we reset them to Unicode character indices instead of UTF-8 byte indices.
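To illustrate the difference between the two kinds of index, here is a minimal sketch in plain Python. It is not hb-view's code, and `utf8_offset_to_char_index` is a hypothetical helper. It assumes the cluster values are byte offsets into the UTF-8 encoding of the text and converts them to character indices.

```python
def utf8_offset_to_char_index(text):
    """Map the UTF-8 byte offset at which each character starts
    to that character's index in the string (hypothetical helper)."""
    mapping = {}
    offset = 0
    for index, ch in enumerate(text):
        mapping[offset] = index
        offset += len(ch.encode("utf-8"))
    return mapping


text = '\u1786\u17D2\u1793\u17B6\u17C6'
byte_to_char = utf8_offset_to_char_index(text)

# Each of these Khmer characters takes 3 bytes in UTF-8, so the
# byte-offset clusters of the unshaped text would be 0, 3, 6, 9, 12.
byte_clusters = [0, 3, 6, 9, 12]
print([byte_to_char[c] for c in byte_clusters])  # [0, 1, 2, 3, 4]
```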

behdad, Jun 11 '24 20:06