lf icon indicating copy to clipboard operation
lf copied to clipboard

[BUG] Some non-ASCII characters get stuck in the preview pane

Open mhdna opened this issue 2 years ago • 11 comments

Hello,

Thanks for this amazing project. I use LF every day!

I think this bug is related to https://github.com/gokcehan/lf/issues/807, in which some symbols get stuck in the preview pane when I open a file containing non-ASCII characters or even when previewing it sometimes.

lf_bug.webm

mhdna avatar Aug 02 '23 08:08 mhdna

hmm, i cant reproduce this bug.

what version of lf are you currently on ? i currently use version 30 and my terminal emulator is Alacritty.

noornee avatar Aug 05 '23 07:08 noornee

@noornee I'm on the latest r30 too, and I use Alacritty as well, though, this doesn't matter as I can reproduce it in other terminal emulators.

You can reproduce it by putting these few lines into a file, and making sure to move this file into an empty directory for the bug to be apparent

🤭 face with hand over mouth
🫢 face with open eyes and hand over mouth
🫣 face with peeking eye
🫡 saluting face
🤐 zipper-mouth face
🫥 dotted line face

Then run lf --config - to use the default barebone lf config, and navigate to that file, open it in vim for example then exit it.

Weirdly enough, this bug doesn't take place with all emoji characters for example, but with some specific emojis and other non- ASCII characters, like the ones I provided above.

mhdna avatar Aug 05 '23 09:08 mhdna

You can reproduce it by putting these few lines into a file, and making sure to move this file into an empty directory for the bug to be apparent

whoaa, i just followed the steps you provided and it reproduced the bug!

Weirdly enough, this bug doesn't take place with all emoji characters for example, but with some specific emojis and other non- ASCII characters, like the ones I provided above.

ah, true. that was why i wasn't able to reproduce this because the bug didn't occur with the emoji characters i originally provided.

noornee avatar Aug 05 '23 10:08 noornee

It looks like the issue is coming from the go-runewidth library, which is used to determine whether a given character is single or double width. Normally you would expect that each of those emojis in the example given above are double width, but it turns out that runewidth.RuneWidth returns a width of 1 for some emojis and 2 for others.

I wrote a very quick program to read in text from standard input and print the widths of each rune:

package main

import (
	"bufio"
	"fmt"
	"os"

	"github.com/mattn/go-runewidth"
)

func main() {
	scanner := bufio.NewScanner(os.Stdin)
	for scanner.Scan() {
		for _, r := range []rune(scanner.Text()) {
			fmt.Printf("%s\t%x\t%d\n", string(r), r, runewidth.RuneWidth(r))
		}
	}
}

I only have basic knowledge of Unicode, but if anyone is able to play around with this, and also cross-reference the results with the table for double width characters (which I believe are scraped from https://unicode.org), that would be much appreciated.

joelim-work avatar Aug 05 '23 13:08 joelim-work

So I looked into this a bit more and at this point I think the issue is not to do with the lf code, but rather the tcell TUI library, which also depends on go-runewidth.

I was able to reproduce this kind of issue by writing a simple program that displays user-provided text before clearing it, and inputs like 🤭x clear properly but inputs like 🫢x don't.

package main

import (
	"os"
	"time"

	"github.com/gdamore/tcell"
	"github.com/mattn/go-runewidth"
)

func setText(screen tcell.Screen) {
	text := "hello world"
	if len(os.Args) > 1 {
		text = os.Args[1]
	}

	x := 0
	for _, c := range text {
		screen.SetContent(x, 0, c, nil, tcell.StyleDefault)
		x += runewidth.RuneWidth(c)
	}
}

func main() {
	screen, _ := tcell.NewScreen()
	screen.Init()

	screen.Clear()
	setText(screen)
	screen.Show()
	time.Sleep(time.Second)

	screen.Clear()
	screen.Show()
	time.Sleep(time.Second)

	screen.Fini()
}

By the way, the documentation for Screen.SetContent might be relevant:

// SetContent sets the contents of the given cell location. If // the coordinates are out of range, then the operation is ignored. // // The first rune is the primary non-zero width rune. The array // that follows is a possible list of combining characters to append, // and will usually be nil (no combining characters.) // // The results are not displayed until Show() or Sync() is called. // // Note that wide (East Asian full width and emoji) runes occupy two cells, // and attempts to place character at next cell to the right will have // undefined effects. Wide runes that are printed in the // last column will be replaced with a single width space on output.

joelim-work avatar Aug 06 '23 03:08 joelim-work

So I looked into this a bit more and at this point I think the issue is not to do with the lf code, but rather the tcell TUI library, which also depends on go-runewidth.

I was able to reproduce this kind of issue by writing a simple program that displays user-provided text before clearing it, and inputs like 🤭x clear properly but inputs like 🫢x don't.

wow, this is interesting!! great stuff you did to narrow the issue down to tcell. ^^

i suppose we are gonna have to open an issue at tcell repo for this to be hopefully resolved

noornee avatar Aug 06 '23 10:08 noornee

I would appreciate it if other people confirm this issue as well - just in case it turns out to be an issue specific to my machine.

joelim-work avatar Aug 06 '23 13:08 joelim-work

I was able to reproduce this kind of issue by writing a simple program that displays user-provided text before clearing it, and inputs like 🤭x clear properly but inputs like 🫢x don't.

I can't claim to have experience with Go, if any at all, but after grabbing the dependencies and running your code with these specific inputs I can confirm that the behavior is exactly as you have described.

mhdna avatar Aug 06 '23 14:08 mhdna

OK I figured it out, actually tcell also uses go-runewidth for calculating character widths, so the issue is really with the latter.

It turns out that the tables are generated using https://github.com/mattn/go-runewidth/blob/master/script/generate.go and uses https://unicode.org/Public/13.0.0/ucd/EastAsianWidth.txt as the source (W means wide character). Unfortunately the spec is out of date as new characters get introduced, which results in incorrect character widths.

Someone has created a PR to fix this though by updating the version to Unicode 14.0.0: https://github.com/mattn/go-runewidth/pull/56. Although the latest version is now 15.0.0, version 14.0.0 should still fix the issue, at least for these particular set of emojis.

Anyway, until that PR gets merged, there's not much more I can do here, since the issue isn't actually to do with the lf code.

joelim-work avatar Aug 07 '23 02:08 joelim-work

It turns out that the tables are generated using https://github.com/mattn/go-runewidth/blob/master/script/generate.go and uses https://unicode.org/Public/13.0.0/ucd/EastAsianWidth.txt

It's really interesting the way you were able to narrow down this issue piece by piece until you found the real reason behind it.

Someone has created a PR to fix this though by updating the version to Unicode 14.0.0: mattn/go-runewidth#56.

Though, it seems that the PR you mentioned has been there forever, and the guys behind go-runewidth for whatever reason aren't willing to merge it yet.

Hopefully, they take some action to upgrade their Unicode source soon, for other repos such as this to get resolved.

Thank you brother Joe, you've done an amazing job!

mhdna avatar Aug 07 '23 16:08 mhdna

It's really interesting the way you were able to narrow down this issue piece by piece until you found the real reason behind it.

tbh. it's amazing.

Thank you brother Joe, you've done an amazing job!

ditto!

noornee avatar Aug 07 '23 18:08 noornee

go-runewidth has been updated to support Unicode 15.1.0, which is the latest version. Closing this now.

joelim-work avatar Jul 23 '24 03:07 joelim-work