icu4x
icu4x copied to clipboard
Additional FFI APIs in Bidi
Followups from https://github.com/unicode-org/icu4x/issues/1735:
- Level APIs should be at the top level on ICU4XBidiInfo
- Do we need the ability to pass in a default level to ICU4XBidiInfo?
- We probably want whole-text APIs that don't operate on paragraphs. What shoudl they look like?
Happy to write these, but would like input on @sffc on Skia's needs for the second two bullet points
cc @younies
happy to mentor the addition of such APIs too, once the classes are defined it's quite easy to add and remove diplomat methods
Skia's code is here: https://skia.googlesource.com/skia/+/refs/heads/main/modules/skunicode/src/SkUnicode_icu.cpp
Key call sites:
- [x]
sk_ubidi_setPara(bidi.get(), (const UChar*)utf16, utf16Units, bidiLevel, nullptr, &status);
- [ ]
sk_ubidi_reorderVisual(runLevels, levelsCount, logicalFromVisual);
- [x]
sk_ubidi_getLevelAt(fBidi.get(), pos);
- [x]
sk_ubidi_getLength(bidi.get());
- [x]
sk_ubidi_getDirection(bidi.get())
Note that these functions are all called on the top-level UBidi type, which is equivalent to BidiInfo in Rust.
sk_ubidi_reorderVisual(runLevels, levelsCount, logicalFromVisual);
Looking at this closer and I'm not convinced that unicode-bidi has this functionality already. Specifically, unicode-bidi has the ability to reorder a line for a single paragraph and return a string, and it has the ability to return a level array per character (or byte), but I don't see anything about constructing reordering index maps. I figure the crate has the code for this, just not the APIs.
@younies or @sffc, can you please check? I'm happy to make FFI wrappers but I'd rather not get into the weeds of figuring out implementation stuff. it's also possible that Skia's needs can be satisfied with the existing unicode-bidi APIs just used differently.
Got a PR for most of these in https://github.com/unicode-org/icu4x/pull/2026 (i've ticked them off in the list). For one of them I need to make some minor upstream fixes, and for the other I'd like one of y'all to see what changes need ot be made.
@Manishearth , by looking to this doc: https://bit.ly/icu4x-bidi , and looking for the section: "Detailed design for BiDix Using Unicode BiDi crate" (API subsection), we can see the listed APIs needed by Skia
Therefore, my answer for the questions as following:
Do we need the ability to pass in a default level to ICU4XBidiInfo?
Yes, as we can see in their use sk_ubidi_setPara(bidi.get(),(const UChar*)utf16.c_str(), utf16.size(), bidiLevel, nullptr, &status)
, they are passing the bidiLevel.
We probably want whole-text APIs that don't operate on paragraphs. What should they look like? You could not operate on the whole text, you need to split the text to paragraphs then apply the BIDI algorithms. sometimes, the text is just one paragraph.
NOTE: UBIDI in icu4c refers to one paragraph and not the whole text.
For sk_ubidi_reorderVisual(runLevels, levelsCount, logicalFromVisual);
while our discussion, we decided for now to use the reorder
function on a paragraph instead of using reorder
on just the levels and the count. Till now, I am not sure why Skia uses this API. we can sync offline though.
@Manishearth , we can have a quick meeting to discuss those cases.
In that case I think the main thing remaining is this part:
and it has the ability to return a level array per character (or byte), but I don't see anything about constructing reordering index maps.
Would you be able to add these APIs to unicode-bidi?
Do we need the ability to pass in a default level to ICU4XBidiInfo?
Also, this part is already done.
In that case I think the main thing remaining is this part:
and it has the ability to return a level array per character (or byte), but I don't see anything about constructing reordering index maps.
Would you be able to add these APIs to unicode-bidi?
Okay, I will look to it tomorrow, I think it should be easy
For
sk_ubidi_reorderVisual(runLevels, levelsCount, logicalFromVisual);
while our discussion, we decided for now to use thereorder
function on a paragraph instead of usingreorder
on just the levels and the count. Till now, I am not sure why Skia uses this API. we can sync offline though.
I don't understand this. Skia uses the ubidi_reorderVisual
function. What are you proposing?
- Add the function to unicode-bidi and icu4x
- Change Skia so that it doesn't need to use that function any more
For
sk_ubidi_reorderVisual(runLevels, levelsCount, logicalFromVisual);
while our discussion, we decided for now to use thereorder
function on a paragraph instead of usingreorder
on just the levels and the count. Till now, I am not sure why Skia uses this API. we can sync offline though.I don't understand this. Skia uses the
ubidi_reorderVisual
function. What are you proposing?
- Add the function to unicode-bidi and icu4x
- Change Skia so that it doesn't need to use that function any more
No.2, change Skia. This function just took the levels and the level count which already in the ParagraphInfo
, why do they need to extract those info from the ParagraphInfo
and feed it to the reorder
function instead of using it directly from ParagraphInfo
@Manishearth , the function that returns the level for a certain character position is here:
pub fn level_at(&self, pos: usize) -> Level {
let actual_position = self.para.range.start + pos;
self.info.levels[actual_position]
}
The input is the position of the character not the character itself.
The input is the position of the character not the character itself.
Yes, that's how it works in ICU, no?
The input is the position of the character not the character itself.
Yes, that's how it works in ICU, no?
Yes, you can see the Skia usage in the document
sk_ubidi_getLevelAt(bidi.get(), position)
Right, that's what we do in the code too
@sffc and @Manishearth , after trying to fix the skia code (by following this link: https://skia.org/docs/dev/contrib/submit/), I found two things:
- For this function
ubidi_reorderVisual
, Skia uses it to have a big array that maps fromvisual order
-->logical order
for all the text. This means, that if they want to use ourreorder
function, they need to keep another array for the reordered text. (I really do not understand why the users need to have a mapper array instead of just a copy of the text in the correct order, but this we can discuss it with them or I can go ahead and implementubidi_reorderVisual
function (if servo approved implementing such a function)). - How to import the C++ headers from ICU4X?
Yeah, that's what I was referring to as "index maps" earlier. I think we should be fine to add them, but it should be a decent API. We might want to make a separate IndexMap type that wraps a vector or something.
As for the C++ headers, they're all in ffi/diplomat/cpp/include.
For (2), please change Skia to continue using ICU4C APIs, just a smaller set that are compatible with ICU4X. Actually changing Skia to ICU4X is a bigger change.
Thought: since the ubidi_reorderVisual
function is stateless (requires no data), and it is only about a dozen lines long, we could just implement the function in Skia in the ICU4X wrapper and not worry about crossing over FFI for it.
@sffc we still need some kind of intermediate API exposed from unicode_bidi for this, yes? any idea what it should look like?
@sffc we still need some kind of intermediate API exposed from unicode_bidi for this, yes? any idea what it should look like?
What I mean is that this particular function is not too terribly large, and what it does is not really a deep part of the bidi algorithm, so we can basically copy the code out of ICU (or cleanroom it if necessary) into Skia C++
ah, i see. wouldn't that involve querying the bidi data over ffi in a loop, though? might have perf issues.
The function doesn't cross FFI; it operates only on the two arrays that were passed in (one mutable, one immutable)
https://github.com/unicode-org/icu/blob/e69f337f3cf3cfa8504513cdfdd72dd67e071a0e/icu4c/source/common/ubidiln.cpp#L813
Right, but we still need lower level APIs that produce those level maps, yes?
Though it's possible that's already exposed by unicode-bidi by virtue of most fields being public
It's none of my business how Skia gets the level maps; they must have code somewhere that generates them. Maybe they come from Harfbuzz or elsewhere.
Ah, I imagined that they got it from ubidi apis
@Manishearth , After merging this PR: https://github.com/servo/unicode-bidi/pull/80, we can close this issue.
No, this issue is not done until we also expose reorder_visual
in the FFI. That is literally the title of the issue :)
Ah, I was under the impression Younies had done that too (from his comment) and I just hadn't seen the PR. Should have looked closer at the PR link!