cellular-automaton.nvim

UTF-8 support

Status: Open. librolibro opened this issue 1 year ago (2 comments).

Hi, I like your plugin a lot; it's exactly what I need at certain moments. But it doesn't work well when the buffer contains UTF-8 characters that take 2 or more bytes (emojis, Cyrillic, etc.). I made some changes to the source code (see the diff below). The changes are too small and rough to make a PR. They work as expected, though not ideally (see the section after the diff):

```diff
diff --git a/lua/cellular-automaton/load.lua b/lua/cellular-automaton/load.lua
index c6de515..e9a053f 100644
--- a/lua/cellular-automaton/load.lua
+++ b/lua/cellular-automaton/load.lua
@@ -55,7 +55,7 @@ local get_usable_window_width = function()
     ]],
     true
   )
-  return window_width
+  return tonumber(window_width)
 end

 M.load_base_grid = function(window, buffer)
@@ -81,12 +81,17 @@ M.load_base_grid = function(window, buffer)

   -- update with buffer data
   for i, line in ipairs(data) do
-    for j = 1, window_width do
-      local idx = horizontal_range.start + j
-      if idx <= string.len(line) then
-        grid[i][j].char = string.sub(line, idx, idx)
-        grid[i][j].hl_group = get_dominant_hl_group(buffer, vertical_range.start + i, idx)
+    local j = 0
+    local idx = vim.fn.getpos(vertical_range.start + i - 1)[3]
+    for utf8_char in line:sub(idx, -1):gmatch("[\x01-\x7F\xC2-\xF4%z][\x80-\xBF]*") do
+      j = j + 1
+      if j > window_width then
+        break
       end
+
+      grid[i][j].char = utf8_char
+      grid[i][j].hl_group = get_dominant_hl_group(buffer, vertical_range.start + i, horizontal_range.start + j)
+      idx = idx + #utf8_char
     end
   end
   return grid
diff --git a/lua/cellular-automaton/ui.lua b/lua/cellular-automaton/ui.lua
index 06f79e7..d60ba4d 100644
--- a/lua/cellular-automaton/ui.lua
+++ b/lua/cellular-automaton/ui.lua
@@ -53,8 +53,20 @@ M.render_frame = function(grid)
   -- update highlights
   vim.api.nvim_buf_clear_namespace(buffnr, namespace, 0, -1)
   for i, row in ipairs(grid) do
+    local extra_width = 0
     for j, cell in ipairs(row) do
-      vim.api.nvim_buf_add_highlight(buffnr, namespace, cell.hl_group or "", i - 1, j - 1, j)
+      local utf8_char_len = string.len(cell.char)
+      vim.api.nvim_buf_add_highlight(
+        buffnr,
+        namespace,
+        cell.hl_group or "",
+        i - 1,
+        j - 1 + extra_width,
+        j - 1 + utf8_char_len + extra_width
+      )
+      if utf8_char_len > 1 then
+        extra_width = extra_width + utf8_char_len - 1
+      end
     end
   end
   -- swap buffers
```
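To experiment with the idea outside Neovim, here is a minimal plain-Lua sketch of what the diff above does. It assumes a buffer line is just a Lua byte string. The sketch splits the line with the same character pattern as load.lua, written with decimal escapes so it also runs on Lua 5.1. It then computes each cell's 0-based, end-exclusive byte range, which is what the `j - 1 + extra_width` arithmetic in ui.lua passes to nvim_buf_add_highlight(). The helper names `split_utf8` and `highlight_ranges` are hypothetical and not part of the plugin.

```lua
-- Same byte ranges as the pattern in the diff, written with decimal escapes:
-- one lead byte (NUL via %z, ASCII 1-127, or a multibyte lead 194-244)
-- followed by any number of continuation bytes (128-191).
local UTF8_CHAR = "[%z\1-\127\194-\244][\128-\191]*"

-- Hypothetical helper: split a byte string into UTF-8 characters.
local function split_utf8(line)
  local chars = {}
  for ch in line:gmatch(UTF8_CHAR) do
    chars[#chars + 1] = ch
  end
  return chars
end

-- Hypothetical helper: 0-based, end-exclusive byte range of every cell,
-- i.e. the col_start/col_end the ui.lua loop passes to nvim_buf_add_highlight().
local function highlight_ranges(chars)
  local ranges, col = {}, 0
  for j, ch in ipairs(chars) do
    ranges[j] = { col, col + #ch }
    col = col + #ch
  end
  return ranges
end

local line = "Да!"
print(#line)                   -- 5: "Д" and "а" take 2 bytes each
print(line:sub(1, 1) == "Д")   -- false: byte slicing yields only a lead byte
local chars = split_utf8(line)
print(#chars)                  -- 3
for j, r in ipairs(highlight_ranges(chars)) do
  print(chars[j], r[1], r[2])  -- Д 0 2, then а 2 4, then ! 4 5
end
```

The running byte offset `col` equals `(j - 1) + extra_width` from the diff, because each earlier multibyte cell adds `#ch - 1` extra bytes.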

With these changes, some cells may contain multibyte characters. As I said, it works as expected, but it's definitely not ready for a PR:

  1. If I understand correctly, the original code assumes that every Nvim cell contains exactly 1 byte (essentially ASCII). My changes work with both ASCII and UTF-8, but there are many other encodings, such as UTF-16 (might it be useful to support them? I don't know).
  2. Emojis. They're problematic. For example, the 😀 emoji takes 4 bytes (f0 9f 98 80), but the terminal emulator (Alacritty in my case; others may differ) displays it in 2 cells. Because of that, a line containing this character is always one cell wider than the usable window width and gets wrapped. But if the next UTF-8 character after the emoji is the non-printable U+FE0F (ef b8 8f, Variation Selector-16), everything lines up: the pair occupies two grid cells and also 2 + 0 = 2 display cells.

I've heard a bit about functions like wcwidth(), which returns "the number of columns needed to represent the wide character c", but I've never used it (and Nvim doesn't seem to provide it, unless I missed something).

Handling all these edge cases is tricky, and maybe it's not worth spending time on.

  3. Invalid UTF-8 sequences. To be honest, I haven't tested them, but they might also be a source of problems.
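The invalid-sequence concern can be checked without Neovim. The plain-Lua experiment below uses the same character pattern as the diff, again with decimal escapes. The helper names are hypothetical. It suggests that the pattern does not validate UTF-8 at all:

- Bytes that can never start a character are silently skipped.
- A stray continuation byte is glued onto the preceding character.
- A truncated sequence comes back as one short "character".

Only the pattern's behavior is shown here; how Nvim itself displays such bytes is not covered.

```lua
-- Same character pattern as in the diff, written with decimal escapes.
local UTF8_CHAR = "[%z\1-\127\194-\244][\128-\191]*"

-- Hypothetical helper: split a byte string into pattern matches.
local function split_utf8(line)
  local chars = {}
  for ch in line:gmatch(UTF8_CHAR) do
    chars[#chars + 1] = ch
  end
  return chars
end

-- Hypothetical helper: byte length of every piece, joined with commas,
-- so raw invalid bytes are never printed to the terminal.
local function byte_lengths(chars)
  local lens = {}
  for i, ch in ipairs(chars) do
    lens[i] = #ch
  end
  return table.concat(lens, ",")
end

-- 1. Bytes that can never start a character (0xC0, 0xFF) are skipped.
print(byte_lengths(split_utf8("a\192b\255c"))) -- 1,1,1
-- 2. A stray continuation byte (0x80) is glued to the previous character.
print(byte_lengths(split_utf8("a\128b")))       -- 2,1
-- 3. A truncated 3-byte sequence (0xE2 0x82) is kept as one 2-byte piece.
print(byte_lengths(split_utf8("x\226\130")))    -- 1,2
```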

I tested this on Nvim v0.10 (release build) and v0.11.0-dev-226+g7215512100.

librolibro, Jul 10 '24 15:07

Oh, there's a strwidth() function in Vim (vim.fn.strwidth when calling from Lua): :lua print(vim.inspect(vim.fn.strwidth("😀"))) returns 2, as expected.

librolibro, Jul 10 '24 16:07

Update: I made it work with emojis and non-printable characters. For example, the code point U+FFFF (ef bf bf in UTF-8) is displayed as <ffff> and occupies 6 cells; strdisplaywidth() handles all of this fine. I'm still not sure whether I should make a PR. I haven't written any tests or tested it carefully; I just looked at a few emoji-containing buffers that happened to be in front of me. The final diff is:

```diff
diff --git a/lua/cellular-automaton/load.lua b/lua/cellular-automaton/load.lua
index c6de515..4cdca6b 100644
--- a/lua/cellular-automaton/load.lua
+++ b/lua/cellular-automaton/load.lua
@@ -55,7 +55,7 @@ local get_usable_window_width = function()
     ]],
     true
   )
-  return window_width
+  return tonumber(window_width)
 end

 M.load_base_grid = function(window, buffer)
@@ -81,12 +81,22 @@ M.load_base_grid = function(window, buffer)

   -- update with buffer data
   for i, line in ipairs(data) do
-    for j = 1, window_width do
-      local idx = horizontal_range.start + j
-      if idx <= string.len(line) then
-        grid[i][j].char = string.sub(line, idx, idx)
-        grid[i][j].hl_group = get_dominant_hl_group(buffer, vertical_range.start + i, idx)
+    local j = 0
+    local chars_displayed = 0
+    -- NOTE(libro): Since we need to iterate over (possibly)
+    --   multibyte symbols we need to know first column's byte index
+    local byte_pos = vim.fn.getpos(vertical_range.start + i - 1)[3]
+    for utf8_char in line:sub(byte_pos, -1):gmatch("[\x01-\x7F\xC2-\xF4%z][\x80-\xBF]*") do
+      chars_displayed = chars_displayed + vim.fn.strdisplaywidth(utf8_char)
+      if chars_displayed > window_width then
+        break
       end
+
+      j = j + 1
+      byte_pos = byte_pos + #utf8_char
+
+      grid[i][j].char = utf8_char
+      grid[i][j].hl_group = get_dominant_hl_group(buffer, vertical_range.start + i, horizontal_range.start + j)
     end
   end
   return grid
diff --git a/lua/cellular-automaton/ui.lua b/lua/cellular-automaton/ui.lua
index 06f79e7..412fee5 100644
--- a/lua/cellular-automaton/ui.lua
+++ b/lua/cellular-automaton/ui.lua
@@ -34,6 +34,7 @@ M.open_window = function(host_window)
   return window_id, buffers
 end

+---@param grid {char: string, hl_group: string}[][]
 M.render_frame = function(grid)
   -- quit if animation already interrupted
   if window_id == nil or not vim.api.nvim_win_is_valid(window_id) then
@@ -44,7 +45,13 @@ M.render_frame = function(grid)
   local lines = {}
   for _, row in ipairs(grid) do
     local chars = {}
+    local width = #row
+    local cells_displayed = 0
     for _, cell in ipairs(row) do
+      cells_displayed = cells_displayed + vim.fn.strdisplaywidth(cell.char)
+      if cells_displayed > width then
+        break
+      end
       table.insert(chars, cell.char)
     end
     table.insert(lines, table.concat(chars, ""))
@@ -52,9 +59,22 @@ M.render_frame = function(grid)
   vim.api.nvim_buf_set_lines(buffnr, 0, vim.api.nvim_win_get_height(window_id), false, lines)
   -- update highlights
   vim.api.nvim_buf_clear_namespace(buffnr, namespace, 0, -1)
+
   for i, row in ipairs(grid) do
+    local extra_width = 0
     for j, cell in ipairs(row) do
-      vim.api.nvim_buf_add_highlight(buffnr, namespace, cell.hl_group or "", i - 1, j - 1, j)
+      local utf8_char_len = string.len(cell.char)
+      vim.api.nvim_buf_add_highlight(
+        buffnr,
+        namespace,
+        cell.hl_group or "",
+        i - 1,
+        j - 1 + extra_width,
+        j - 1 + utf8_char_len + extra_width
+      )
+      if utf8_char_len > 1 then
+        extra_width = extra_width + utf8_char_len - 1
+      end
     end
   end
   -- swap buffers
```
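Here is a plain-Lua sketch of the width budget in the final load_base_grid loop, under one loud assumption. `display_width` below is a hypothetical stand-in for vim.fn.strdisplaywidth(). It hard-codes the widths mentioned in this thread: 2 for 😀, 0 for U+FE0F, and 1 for everything else. Characters are taken until the next one would overflow `window_width` display cells. This gives the same cut-off as incrementing `chars_displayed` and breaking once it exceeds the width. `fit_to_width` is also a hypothetical name.

```lua
-- Hypothetical stand-in for vim.fn.strdisplaywidth(): hard-coded widths for
-- the characters discussed in this thread, 1 cell for everything else.
local GRIN = "\240\159\152\128" -- U+1F600 😀 (f0 9f 98 80)
local VS16 = "\239\184\143"     -- U+FE0F Variation Selector-16 (ef b8 8f)
local WIDTHS = { [GRIN] = 2, [VS16] = 0 }

local function display_width(ch)
  return WIDTHS[ch] or 1
end

local UTF8_CHAR = "[%z\1-\127\194-\244][\128-\191]*"

-- Hypothetical helper mirroring the loop in load_base_grid: keep characters
-- until the next one would push the row past window_width display cells.
-- Returns the kept characters (one grid cell each) and the cells they use.
local function fit_to_width(line, window_width)
  local cells, used = {}, 0
  for ch in line:gmatch(UTF8_CHAR) do
    local w = display_width(ch)
    if used + w > window_width then
      break
    end
    used = used + w
    cells[#cells + 1] = ch
  end
  return cells, used
end

local line = "ab" .. GRIN .. "cd"
local cells, used = fit_to_width(line, 4)
print(#cells, used) -- 3 4: "a", "b" and 😀 fill exactly 4 cells
cells, used = fit_to_width(line, 3)
print(#cells, used) -- 2 2: 😀 would need cells 3-4, so the row stops early

-- 😀 followed by U+FE0F: two grid cells, 2 + 0 = 2 display cells.
cells, used = fit_to_width(GRIN .. VS16 .. "x", 2)
print(#cells, used) -- 2 2
```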

P.S. This code doesn't look optimized to me (maybe the strdisplaywidth() results could be precomputed?).
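On the P.S.: one way to avoid calling strdisplaywidth() for every cell on every frame is to memoize the width per character. Below is a minimal sketch. It assumes a character's width does not depend on the column where it starts. That is false for <Tab>, which is why strdisplaywidth() accepts an optional {col} argument. Cached results could also go stale if an option such as 'ambiwidth' changes. `memoize_width` is a hypothetical name, and the counting `fake_width` only stands in for vim.fn.strdisplaywidth.

```lua
-- Hypothetical helper: wrap a width function (inside Nvim, e.g.
-- vim.fn.strdisplaywidth) so each distinct character is measured only once.
local function memoize_width(width_fn)
  local cache = {}
  return function(ch)
    local w = cache[ch]
    if w == nil then
      w = width_fn(ch)
      cache[ch] = w
    end
    return w
  end
end

-- Stand-in width function that counts how often it is really called.
-- Its rule (multibyte => 2 cells) is a crude assumption for the demo only.
local calls = 0
local function fake_width(ch)
  calls = calls + 1
  return #ch > 1 and 2 or 1
end

local width = memoize_width(fake_width)
for _ = 1, 1000 do
  width("a")
  width("\240\159\152\128") -- 😀
end
print(calls)                                 -- 2: one real call per distinct character
print(width("a"), width("\240\159\152\128")) -- 1 2
```

Inside the plugin, such a wrapper would presumably be created once, for example `local cell_width = memoize_width(vim.fn.strdisplaywidth)`, and used in place of the direct calls in load.lua and ui.lua.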

librolibro, Jul 11 '24 15:07