UTF-8 support
Hi, I like your plugin a lot - it's exactly what I need at certain moments) But it doesn't work well when the buffer contains UTF-8 symbols that take 2 or more bytes (emojis, Cyrillic etc.). I made some changes to the source code (see the diff below). These changes are too small and raw to make a PR. They work as expected (not ideally, see the section after the diff):
diff --git a/lua/cellular-automaton/load.lua b/lua/cellular-automaton/load.lua
index c6de515..e9a053f 100644
--- a/lua/cellular-automaton/load.lua
+++ b/lua/cellular-automaton/load.lua
@@ -55,7 +55,7 @@ local get_usable_window_width = function()
]],
true
)
- return window_width
+ return tonumber(window_width)
end
M.load_base_grid = function(window, buffer)
@@ -81,12 +81,17 @@ M.load_base_grid = function(window, buffer)
-- update with buffer data
for i, line in ipairs(data) do
- for j = 1, window_width do
- local idx = horizontal_range.start + j
- if idx <= string.len(line) then
- grid[i][j].char = string.sub(line, idx, idx)
- grid[i][j].hl_group = get_dominant_hl_group(buffer, vertical_range.start + i, idx)
+ local j = 0
+ local idx = vim.fn.getpos(vertical_range.start + i - 1)[3]
+ for utf8_char in line:sub(idx, -1):gmatch("[\x01-\x7F\xC2-\xF4%z][\x80-\xBF]*") do
+ j = j + 1
+ if j > window_width then
+ break
end
+
+ grid[i][j].char = utf8_char
+ grid[i][j].hl_group = get_dominant_hl_group(buffer, vertical_range.start + i, horizontal_range.start + j)
+ idx = idx + #utf8_char
end
end
return grid
diff --git a/lua/cellular-automaton/ui.lua b/lua/cellular-automaton/ui.lua
index 06f79e7..d60ba4d 100644
--- a/lua/cellular-automaton/ui.lua
+++ b/lua/cellular-automaton/ui.lua
@@ -53,8 +53,20 @@ M.render_frame = function(grid)
-- update highlights
vim.api.nvim_buf_clear_namespace(buffnr, namespace, 0, -1)
for i, row in ipairs(grid) do
+ local extra_width = 0
for j, cell in ipairs(row) do
- vim.api.nvim_buf_add_highlight(buffnr, namespace, cell.hl_group or "", i - 1, j - 1, j)
+ local utf8_char_len = string.len(cell.char)
+ vim.api.nvim_buf_add_highlight(
+ buffnr,
+ namespace,
+ cell.hl_group or "",
+ i - 1,
+ j - 1 + extra_width,
+ j - 1 + utf8_char_len + extra_width
+ )
+ if utf8_char_len > 1 then
+ extra_width = extra_width + utf8_char_len - 1
+ end
end
end
-- swap buffers
With these changes some cells may contain multibyte symbols. As I said it's working as expected, but it's definitely not enough to make a PR.
- If I understand correctly, the original code assumes that every nvim cell contains 1 byte (kinda ASCII). My changes work well with both ASCII and UTF-8, but there are a lot of other encodings like UTF-16 etc. (might be useful to support them? idk)
- Emojis. They're problematic. For example, the 😀 emoji takes 4 bytes (f0 9f 98 80), but the terminal emulator (alacritty in my case, others might differ) shows it 2 cells wide. Because of that, a line containing this symbol will always be 1 cell wider than the usable buffer width and will be wrapped. But if the next UTF-8 symbol after the emoji is the non-printable U+FE0F (ef b8 8f, Variation Selector-16), then everything is OK, since together they fill 2 cells (2+0).
I heard a bit about functions like wcwidth(), which returns "the number of columns needed to represent the wide character c", but I never used it (nvim doesn't provide these, if I didn't miss something).
Handling all these edge cases is tricky, and maybe it's not worth wasting time on.
- Invalid UTF-8 sequences. TBH I didn't test this, but it might also be a source of problems)
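To illustrate the first and the last points above, here is a minimal plain-Lua sketch of splitting a line into UTF-8 characters by looking at lead bytes instead of taking one byte per cell. The helper names (utf8_seq_len, split_utf8) are hypothetical and not part of the plugin. Passing malformed bytes through one at a time is just one possible policy, and this is not a full validator (overlong encodings and surrogates are not rejected).

```lua
-- Length of a UTF-8 sequence as implied by its lead byte, or nil if the
-- byte cannot start a sequence (continuation byte or invalid lead byte).
local function utf8_seq_len(lead)
  if lead < 0x80 then return 1 end
  if lead >= 0xC2 and lead <= 0xDF then return 2 end
  if lead >= 0xE0 and lead <= 0xEF then return 3 end
  if lead >= 0xF0 and lead <= 0xF4 then return 4 end
  return nil
end

-- Split `line` into a list of characters. Stray continuation bytes and
-- truncated sequences are emitted as single bytes, so that
-- table.concat(result) always equals the input.
local function split_utf8(line)
  local chars, i, n = {}, 1, #line
  while i <= n do
    local len = utf8_seq_len(line:byte(i)) or 1
    -- the whole sequence must fit into the line ...
    local ok = i + len - 1 <= n
    -- ... and every following byte must be a continuation byte (0x80-0xBF)
    for k = i + 1, i + len - 1 do
      if not ok then break end
      local b = line:byte(k)
      ok = b >= 0x80 and b <= 0xBF
    end
    if not ok then len = 1 end
    chars[#chars + 1] = line:sub(i, i + len - 1)
    i = i + len
  end
  return chars
end

-- "aж😀" written as raw UTF-8 bytes (1 + 2 + 4 bytes)
local line = "a\208\182\240\159\152\128"
local chars = split_utf8(line)
print(#line, #chars) --> 7 3
```

Byte-wise slicing such as line:sub(2, 2) returns only the first half of "ж" here, which is the kind of breakage the original one-byte-per-cell loop runs into.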
The Nvim versions I checked it on were v0.10 (the release one) and v0.11.0-dev-226+g7215512100
Oh, there's a strwidth() function in vim (vim.fn.strwidth when calling from Lua).
:lua print(vim.inspect(vim.fn.strwidth("😀")))
returns 2 as expected
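A rough sketch of how such a width function could be used to cut a row by screen cells rather than by character count. The names clip_to_width and fake_width are hypothetical. The width function is passed in as a parameter so the snippet also runs outside Neovim; inside Neovim it falls back to vim.fn.strdisplaywidth. The stand-in below is deliberately crude and only an assumption for the demo.

```lua
-- Keep characters from `chars` while their total display width fits into
-- `max_width` cells. Returns the kept characters and the cells they use.
local function clip_to_width(chars, max_width, width_of)
  -- inside Neovim the real width function is used when none is passed in
  width_of = width_of or vim.fn.strdisplaywidth
  local kept, used = {}, 0
  for _, ch in ipairs(chars) do
    local w = width_of(ch)
    if used + w > max_width then
      break
    end
    kept[#kept + 1] = ch
    used = used + w
  end
  return kept, used
end

-- Crude stand-in for running outside Neovim (an assumption, not real
-- width data): 4-byte sequences count as 2 cells, everything else as 1.
local function fake_width(ch)
  if #ch == 4 then return 2 end
  return 1
end

-- "a", "😀" (as raw UTF-8 bytes), "b" in a window that is 3 cells wide
local row = { "a", "\240\159\152\128", "b" }
local kept, used = clip_to_width(row, 3, fake_width)
print(#kept, used) --> 2 3
```

With a 3-cell window the "b" is dropped, because the emoji already takes two of the three cells.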
Upd: made it work with emojis and non-printables. E.g. the code point U+FFFF (ef bf bf in UTF-8) occupies 6 cells and is displayed as <ffff>, and it's all handled fine via strdisplaywidth(). I'm still not sure whether I should make a PR or not: I wrote no tests and didn't test it carefully, I just watched some emoji-containing buffers falling in front of me) The final diff is:
diff --git a/lua/cellular-automaton/load.lua b/lua/cellular-automaton/load.lua
index c6de515..4cdca6b 100644
--- a/lua/cellular-automaton/load.lua
+++ b/lua/cellular-automaton/load.lua
@@ -55,7 +55,7 @@ local get_usable_window_width = function()
]],
true
)
- return window_width
+ return tonumber(window_width)
end
M.load_base_grid = function(window, buffer)
@@ -81,12 +81,22 @@ M.load_base_grid = function(window, buffer)
-- update with buffer data
for i, line in ipairs(data) do
- for j = 1, window_width do
- local idx = horizontal_range.start + j
- if idx <= string.len(line) then
- grid[i][j].char = string.sub(line, idx, idx)
- grid[i][j].hl_group = get_dominant_hl_group(buffer, vertical_range.start + i, idx)
+ local j = 0
+ local chars_displayed = 0
+ -- NOTE(libro): Since we need to iterate over (possibly)
+ -- multibyte symbols we need to know first column's byte index
+ local byte_pos = vim.fn.getpos(vertical_range.start + i - 1)[3]
+ for utf8_char in line:sub(byte_pos, -1):gmatch("[\x01-\x7F\xC2-\xF4%z][\x80-\xBF]*") do
+ chars_displayed = chars_displayed + vim.fn.strdisplaywidth(utf8_char)
+ if chars_displayed > window_width then
+ break
end
+
+ j = j + 1
+ byte_pos = byte_pos + #utf8_char
+
+ grid[i][j].char = utf8_char
+ grid[i][j].hl_group = get_dominant_hl_group(buffer, vertical_range.start + i, horizontal_range.start + j)
end
end
return grid
diff --git a/lua/cellular-automaton/ui.lua b/lua/cellular-automaton/ui.lua
index 06f79e7..412fee5 100644
--- a/lua/cellular-automaton/ui.lua
+++ b/lua/cellular-automaton/ui.lua
@@ -34,6 +34,7 @@ M.open_window = function(host_window)
return window_id, buffers
end
+---@param grid {char: string, hl_group: string}[][]
M.render_frame = function(grid)
-- quit if animation already interrupted
if window_id == nil or not vim.api.nvim_win_is_valid(window_id) then
@@ -44,7 +45,13 @@ M.render_frame = function(grid)
local lines = {}
for _, row in ipairs(grid) do
local chars = {}
+ local width = #row
+ local cells_displayed = 0
for _, cell in ipairs(row) do
+ cells_displayed = cells_displayed + vim.fn.strdisplaywidth(cell.char)
+ if cells_displayed > width then
+ break
+ end
table.insert(chars, cell.char)
end
table.insert(lines, table.concat(chars, ""))
@@ -52,9 +59,22 @@ M.render_frame = function(grid)
vim.api.nvim_buf_set_lines(buffnr, 0, vim.api.nvim_win_get_height(window_id), false, lines)
-- update highlights
vim.api.nvim_buf_clear_namespace(buffnr, namespace, 0, -1)
+
for i, row in ipairs(grid) do
+ local extra_width = 0
for j, cell in ipairs(row) do
- vim.api.nvim_buf_add_highlight(buffnr, namespace, cell.hl_group or "", i - 1, j - 1, j)
+ local utf8_char_len = string.len(cell.char)
+ vim.api.nvim_buf_add_highlight(
+ buffnr,
+ namespace,
+ cell.hl_group or "",
+ i - 1,
+ j - 1 + extra_width,
+ j - 1 + utf8_char_len + extra_width
+ )
+ if utf8_char_len > 1 then
+ extra_width = extra_width + utf8_char_len - 1
+ end
end
end
-- swap buffers
P.S. This code doesn't look optimized to me (strdisplaywidth() pre-computing?)
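One possible way to avoid repeated width lookups is a small cache keyed by the character itself. This is only a sketch: memoize is a hypothetical helper, and it assumes the width of a character does not change during an animation. That assumption does not hold for Tab, whose width depends on the column, or if width-related options change in the meantime.

```lua
-- Wrap a one-argument function so each distinct argument is computed once.
local function memoize(fn)
  local cache = {}
  return function(key)
    local value = cache[key]
    if value == nil then
      value = fn(key)
      cache[key] = value
    end
    return value
  end
end

-- Inside Neovim this could be used as:
--   local char_width = memoize(vim.fn.strdisplaywidth)
-- Here a counting stand-in shows that the wrapped function runs only once.
local calls = 0
local cached_len = memoize(function(s)
  calls = calls + 1
  return #s
end)

print(cached_len("ab"), cached_len("ab"), calls) --> 2 2 1
```

Another option would be to store the width in each grid cell once, when the grid is loaded, so render_frame never needs to ask for it again.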