lite-xl
lite-xl copied to clipboard
Don't calculate column offset per-char
We were calling font:get_width_subpixel on each character. Now the entire token gets calculated at once.
This should significantly improve performance in the presence of long lines.
Note that I also removed the call to common.utf8_chars to obtain characters.
Should I reintroduce its functionality? If so, what should we do if we are given a column that is not at the end of a character? Should we consider to the end of the character or should we go back to the previous one?
I guess the previous implementation stopped short, and also removed any invalid characters. Is this necessary?
~This also reverts #481.~
Now this doesn't revert #481 anymore.
Looking more closely that looks wrong to me because string.sub take a substring but using bytes instead of utf-8 characters. I think it was yourself that expressed the concern.
I think you should improve your modification to be sound for UTF-8 strings. I suggest to add a new function beside font:get_width_subpixel called font:get_width_subpixel_sub with the signature:
function Font:get_width_subpixel_sub(text, index_start, index_end)
where the index arguments defines the index of the substring we want to consider to compute the width. The important detail is that the index arguments should address UTF-8 character and not simple bytes. The function is meant to be implemented in C.
I think this go on the direction of being UTF-8 accurate and also improve performance by moving to the C side and avoiding creating a substring in the Lua side.
This could be also the opportunity to move on the C side a few UTF-8 critical functions.
The important detail is that the index arguments should address UTF-8 character and not simple bytes.
The problem with this is that DocView:get_col_x_offset is given the col as byte index. This is because Doc exposes selections addressing bytes, not UTF-8 characters.
I think that if we want to address UTF-8, we need to only expose UTF-8 indexes from Doc.
The problem with this is that
DocView:get_col_x_offsetis given the col as byte index. This is becauseDocexposes selections addressing bytes, not UTF-8 characters. I think that if we want to address UTF-8, we need to only expose UTF-8 indexes fromDoc.
Or we accept byte-based indexes by knowing that they are aligned on UTF-8 boundaries.
Or we accept byte-based indexes by knowing that they are aligned on UTF-8 boundaries.
In Font:get_width_subpixel_sub, what should we do if the provided index is not at the start/end of a UTF-8 character?
If the start offset doesn't point to the start of a character, should I skip until the next?
If the end offset is wrong should I stop short?
Or should I just expect the index to be correct?
Also, what should we do about the col reported by statusview? It shows the byte offset.
Or we accept byte-based indexes by knowing that they are aligned on UTF-8 boundaries.
In
Font:get_width_subpixel_sub, what should we do if the provided index is not at the start/end of a UTF-8 character? If the start offset doesn't point to the start of a character, should I skip until the next? If the end offset is wrong should I stop short? Or should I just expect the index to be correct?Also, what should we do about the
colreported bystatusview? It shows the byte offset.
I think you may give yourself a response to these questions, if you want. Otherwise if my proposition isn't good, which is possible, we may discard the idea.
Is this no longer valid? so it can be closed?