[path-] fix undercounted progress for multibyte chars

Open midichef opened this issue 4 months ago • 4 comments

For text files encoded with more than one byte per character, FileProgress undercounts loading progress.

To demonstrate, you can use a UTF-32 file, where every character takes 4 bytes:

seq 1000001 | iconv -t UTF-32 >! progress.utf32.tsv
vd --encoding=utf-32 progress.utf32.tsv

The progress only goes up to 25%, not 100%.

That's because read() progress counts characters, but the goal is measured in bytes. The file is about 7 million characters long, which comes to about 28 million bytes in UTF-32, so even at the end of the file the ratio is 7 million / 28 million = 25%.
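The mismatch can be reproduced outside of visidata. The following is only a sketch using the standard library, not visidata's code: it builds the same content as `seq 1000001` in memory, reads it back through a text wrapper, and compares the character count with the byte size. The variable names are made up for illustration.

```python
import io

# Same content as `seq 1000001`: one number per line.
text = "".join(f"{i}\n" for i in range(1, 1000002))
data = text.encode("utf-32")  # 4 bytes per character, plus a 4-byte BOM

total_bytes = len(data)       # what a byte-based progress goal would use
chars_read = 0                # what a character-counting reader accumulates

with io.TextIOWrapper(io.BytesIO(data), encoding="utf-32") as fp:
    for line in fp:
        chars_read += len(line)

ratio = chars_read / total_bytes
print(f"{chars_read} chars / {total_bytes} bytes = {ratio:.0%}")
```

Even after the whole file has been read, the ratio stays near one quarter, because each character counted stands for four bytes of the goal.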

This PR changes FileProgress to track progress as bytes.
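One general way to count bytes instead of characters is to count at the binary layer, underneath the text decoder. The sketch below illustrates that idea with the standard library only; it is not the change made in this PR, and `ByteCountingRaw` is a hypothetical name, not a visidata class.

```python
import io

class ByteCountingRaw(io.RawIOBase):
    """Hypothetical wrapper that counts bytes pulled from a binary stream."""

    def __init__(self, raw):
        super().__init__()
        self.raw = raw
        self.bytes_read = 0

    def readable(self):
        return True

    def readinto(self, b):
        n = self.raw.readinto(b)
        if n:
            self.bytes_read += n   # progress is tracked in bytes, not characters
        return n

text = "".join(f"{i}\n" for i in range(1, 1001))
data = text.encode("utf-32")

counter = ByteCountingRaw(io.BytesIO(data))
fp = io.TextIOWrapper(io.BufferedReader(counter), encoding="utf-32")
for line in fp:
    pass  # a real loader would parse each line here

progress = counter.bytes_read / len(data)
print(f"{progress:.0%}")
```

Because the buffered reader reads ahead, the byte count can run slightly ahead of the lines actually consumed, but it reaches the full file size at the end regardless of the encoding.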

midichef, Feb 19 '24 07:02