lite-xl icon indicating copy to clipboard operation
lite-xl copied to clipboard

Don't calculate column offset per-char

Open Guldoman opened this issue 4 years ago • 7 comments

We were calling font:get_width_subpixel on each character. Now the entire token gets calculated at once. This should significantly improve performance in the presence of long lines.

Note that I also removed the call to common.utf8_chars to obtain characters. Should I reintroduce its functionality? If so, what should we do if we are given a column that is not at the end of a character? Should we consider to the end of the character or should we go back to the previous one? I guess the previous implementation stopped short, and also removed any invalid characters. Is this necessary?

~This also reverts #481.~

Guldoman avatar Sep 03 '21 19:09 Guldoman

Now this doesn't revert #481 anymore.

Guldoman avatar Sep 08 '21 02:09 Guldoman

Looking more closely that looks wrong to me because string.sub take a substring but using bytes instead of utf-8 characters. I think it was yourself that expressed the concern.

I think you should improve your modification to be sound for UTF-8 strings. I suggest to add a new function beside font:get_width_subpixel called font:get_width_subpixel_sub with the signature:

function Font:get_width_subpixel_sub(text, index_start, index_end)

where the index arguments defines the index of the substring we want to consider to compute the width. The important detail is that the index arguments should address UTF-8 character and not simple bytes. The function is meant to be implemented in C.

I think this go on the direction of being UTF-8 accurate and also improve performance by moving to the C side and avoiding creating a substring in the Lua side.

This could be also the opportunity to move on the C side a few UTF-8 critical functions.

franko avatar Sep 08 '21 09:09 franko

The important detail is that the index arguments should address UTF-8 character and not simple bytes.

The problem with this is that DocView:get_col_x_offset is given the col as byte index. This is because Doc exposes selections addressing bytes, not UTF-8 characters. I think that if we want to address UTF-8, we need to only expose UTF-8 indexes from Doc.

Guldoman avatar Sep 09 '21 14:09 Guldoman

The problem with this is that DocView:get_col_x_offset is given the col as byte index. This is because Doc exposes selections addressing bytes, not UTF-8 characters. I think that if we want to address UTF-8, we need to only expose UTF-8 indexes from Doc.

Or we accept byte-based indexes by knowing that they are aligned on UTF-8 boundaries.

franko avatar Sep 09 '21 15:09 franko

Or we accept byte-based indexes by knowing that they are aligned on UTF-8 boundaries.

In Font:get_width_subpixel_sub, what should we do if the provided index is not at the start/end of a UTF-8 character? If the start offset doesn't point to the start of a character, should I skip until the next? If the end offset is wrong should I stop short? Or should I just expect the index to be correct?

Also, what should we do about the col reported by statusview? It shows the byte offset.

Guldoman avatar Sep 09 '21 17:09 Guldoman

Or we accept byte-based indexes by knowing that they are aligned on UTF-8 boundaries.

In Font:get_width_subpixel_sub, what should we do if the provided index is not at the start/end of a UTF-8 character? If the start offset doesn't point to the start of a character, should I skip until the next? If the end offset is wrong should I stop short? Or should I just expect the index to be correct?

Also, what should we do about the col reported by statusview? It shows the byte offset.

I think you may give yourself a response to these questions, if you want. Otherwise if my proposition isn't good, which is possible, we may discard the idea.

franko avatar Sep 09 '21 19:09 franko

Is this no longer valid? so it can be closed?

jgmdev avatar May 31 '22 15:05 jgmdev