icu4x icon indicating copy to clipboard operation
icu4x copied to clipboard

Additional FFI APIs in Bidi

Open Manishearth opened this issue 2 years ago • 30 comments

Followups from https://github.com/unicode-org/icu4x/issues/1735:

  • Level APIs should be at the top level on ICU4XBidiInfo
  • Do we need the ability to pass in a default level to ICU4XBidiInfo?
  • We probably want whole-text APIs that don't operate on paragraphs. What shoudl they look like?

Happy to write these, but would like input on @sffc on Skia's needs for the second two bullet points

Manishearth avatar May 27 '22 22:05 Manishearth

cc @younies

happy to mentor the addition of such APIs too, once the classes are defined it's quite easy to add and remove diplomat methods

Manishearth avatar May 27 '22 22:05 Manishearth

Skia's code is here: https://skia.googlesource.com/skia/+/refs/heads/main/modules/skunicode/src/SkUnicode_icu.cpp

Key call sites:

  • [x] sk_ubidi_setPara(bidi.get(), (const UChar*)utf16, utf16Units, bidiLevel, nullptr, &status);
  • [ ] sk_ubidi_reorderVisual(runLevels, levelsCount, logicalFromVisual);
  • [x] sk_ubidi_getLevelAt(fBidi.get(), pos);
  • [x] sk_ubidi_getLength(bidi.get());
  • [x] sk_ubidi_getDirection(bidi.get())

Note that these functions are all called on the top-level UBidi type, which is equivalent to BidiInfo in Rust.

sffc avatar May 28 '22 00:05 sffc

  • sk_ubidi_reorderVisual(runLevels, levelsCount, logicalFromVisual);

Looking at this closer and I'm not convinced that unicode-bidi has this functionality already. Specifically, unicode-bidi has the ability to reorder a line for a single paragraph and return a string, and it has the ability to return a level array per character (or byte), but I don't see anything about constructing reordering index maps. I figure the crate has the code for this, just not the APIs.

@younies or @sffc, can you please check? I'm happy to make FFI wrappers but I'd rather not get into the weeds of figuring out implementation stuff. it's also possible that Skia's needs can be satisfied with the existing unicode-bidi APIs just used differently.

Manishearth avatar Jun 06 '22 23:06 Manishearth

Got a PR for most of these in https://github.com/unicode-org/icu4x/pull/2026 (i've ticked them off in the list). For one of them I need to make some minor upstream fixes, and for the other I'd like one of y'all to see what changes need ot be made.

Manishearth avatar Jun 08 '22 18:06 Manishearth

@Manishearth , by looking to this doc: https://bit.ly/icu4x-bidi , and looking for the section: "Detailed design for BiDix Using Unicode BiDi crate" (API subsection), we can see the listed APIs needed by Skia

Therefore, my answer for the questions as following: Do we need the ability to pass in a default level to ICU4XBidiInfo? Yes, as we can see in their use sk_ubidi_setPara(bidi.get(),(const UChar*)utf16.c_str(), utf16.size(), bidiLevel, nullptr, &status), they are passing the bidiLevel.

We probably want whole-text APIs that don't operate on paragraphs. What should they look like? You could not operate on the whole text, you need to split the text to paragraphs then apply the BIDI algorithms. sometimes, the text is just one paragraph.

NOTE: UBIDI in icu4c refers to one paragraph and not the whole text.

For sk_ubidi_reorderVisual(runLevels, levelsCount, logicalFromVisual); while our discussion, we decided for now to use the reorder function on a paragraph instead of using reorder on just the levels and the count. Till now, I am not sure why Skia uses this API. we can sync offline though.

younies avatar Jun 27 '22 16:06 younies

@Manishearth , we can have a quick meeting to discuss those cases.

younies avatar Jun 27 '22 16:06 younies

In that case I think the main thing remaining is this part:

and it has the ability to return a level array per character (or byte), but I don't see anything about constructing reordering index maps.

Would you be able to add these APIs to unicode-bidi?

Manishearth avatar Jun 27 '22 16:06 Manishearth

Do we need the ability to pass in a default level to ICU4XBidiInfo?

Also, this part is already done.

Manishearth avatar Jun 27 '22 17:06 Manishearth

In that case I think the main thing remaining is this part:

and it has the ability to return a level array per character (or byte), but I don't see anything about constructing reordering index maps.

Would you be able to add these APIs to unicode-bidi?

Okay, I will look to it tomorrow, I think it should be easy

younies avatar Jun 27 '22 22:06 younies

For sk_ubidi_reorderVisual(runLevels, levelsCount, logicalFromVisual); while our discussion, we decided for now to use the reorder function on a paragraph instead of using reorder on just the levels and the count. Till now, I am not sure why Skia uses this API. we can sync offline though.

I don't understand this. Skia uses the ubidi_reorderVisual function. What are you proposing?

  1. Add the function to unicode-bidi and icu4x
  2. Change Skia so that it doesn't need to use that function any more

sffc avatar Jun 28 '22 00:06 sffc

For sk_ubidi_reorderVisual(runLevels, levelsCount, logicalFromVisual); while our discussion, we decided for now to use the reorder function on a paragraph instead of using reorder on just the levels and the count. Till now, I am not sure why Skia uses this API. we can sync offline though.

I don't understand this. Skia uses the ubidi_reorderVisual function. What are you proposing?

  1. Add the function to unicode-bidi and icu4x
  2. Change Skia so that it doesn't need to use that function any more

No.2, change Skia. This function just took the levels and the level count which already in the ParagraphInfo, why do they need to extract those info from the ParagraphInfo and feed it to the reorder function instead of using it directly from ParagraphInfo

younies avatar Jun 28 '22 12:06 younies

@Manishearth , the function that returns the level for a certain character position is here:

    pub fn level_at(&self, pos: usize) -> Level {
        let actual_position = self.para.range.start + pos;
        self.info.levels[actual_position]
    }

The input is the position of the character not the character itself.

younies avatar Jun 28 '22 13:06 younies

The input is the position of the character not the character itself.

Yes, that's how it works in ICU, no?

Manishearth avatar Jun 28 '22 15:06 Manishearth

The input is the position of the character not the character itself.

Yes, that's how it works in ICU, no?

Yes, you can see the Skia usage in the document

sk_ubidi_getLevelAt(bidi.get(), position)

younies avatar Jun 28 '22 16:06 younies

Right, that's what we do in the code too

Manishearth avatar Jun 28 '22 16:06 Manishearth

@sffc and @Manishearth , after trying to fix the skia code (by following this link: https://skia.org/docs/dev/contrib/submit/), I found two things:

  1. For this function ubidi_reorderVisual, Skia uses it to have a big array that maps from visual order --> logical order for all the text. This means, that if they want to use our reorder function, they need to keep another array for the reordered text. (I really do not understand why the users need to have a mapper array instead of just a copy of the text in the correct order, but this we can discuss it with them or I can go ahead and implement ubidi_reorderVisual function (if servo approved implementing such a function)).
  2. How to import the C++ headers from ICU4X?

younies avatar Jul 06 '22 11:07 younies

Yeah, that's what I was referring to as "index maps" earlier. I think we should be fine to add them, but it should be a decent API. We might want to make a separate IndexMap type that wraps a vector or something.

As for the C++ headers, they're all in ffi/diplomat/cpp/include.

Manishearth avatar Jul 06 '22 14:07 Manishearth

For (2), please change Skia to continue using ICU4C APIs, just a smaller set that are compatible with ICU4X. Actually changing Skia to ICU4X is a bigger change.

sffc avatar Jul 08 '22 01:07 sffc

Thought: since the ubidi_reorderVisual function is stateless (requires no data), and it is only about a dozen lines long, we could just implement the function in Skia in the ICU4X wrapper and not worry about crossing over FFI for it.

sffc avatar Jul 16 '22 01:07 sffc

@sffc we still need some kind of intermediate API exposed from unicode_bidi for this, yes? any idea what it should look like?

Manishearth avatar Jul 18 '22 18:07 Manishearth

@sffc we still need some kind of intermediate API exposed from unicode_bidi for this, yes? any idea what it should look like?

What I mean is that this particular function is not too terribly large, and what it does is not really a deep part of the bidi algorithm, so we can basically copy the code out of ICU (or cleanroom it if necessary) into Skia C++

sffc avatar Jul 18 '22 20:07 sffc

ah, i see. wouldn't that involve querying the bidi data over ffi in a loop, though? might have perf issues.

Manishearth avatar Jul 18 '22 20:07 Manishearth

The function doesn't cross FFI; it operates only on the two arrays that were passed in (one mutable, one immutable)

https://github.com/unicode-org/icu/blob/e69f337f3cf3cfa8504513cdfdd72dd67e071a0e/icu4c/source/common/ubidiln.cpp#L813

sffc avatar Jul 18 '22 21:07 sffc

Right, but we still need lower level APIs that produce those level maps, yes?

Manishearth avatar Jul 18 '22 22:07 Manishearth

Though it's possible that's already exposed by unicode-bidi by virtue of most fields being public

Manishearth avatar Jul 18 '22 22:07 Manishearth

It's none of my business how Skia gets the level maps; they must have code somewhere that generates them. Maybe they come from Harfbuzz or elsewhere.

sffc avatar Jul 19 '22 00:07 sffc

Ah, I imagined that they got it from ubidi apis

Manishearth avatar Jul 19 '22 17:07 Manishearth

@Manishearth , After merging this PR: https://github.com/servo/unicode-bidi/pull/80, we can close this issue.

younies avatar Jul 29 '22 17:07 younies

No, this issue is not done until we also expose reorder_visual in the FFI. That is literally the title of the issue :)

sffc avatar Jul 31 '22 05:07 sffc

Ah, I was under the impression Younies had done that too (from his comment) and I just hadn't seen the PR. Should have looked closer at the PR link!

Manishearth avatar Jul 31 '22 16:07 Manishearth