visidata
[path-] fix undercounted progress for multibyte chars
For text files encoded with more than one byte per character, FileProgress
undercounts loading progress.
To demonstrate, you can use a UTF-32 file, where every character takes 4 bytes:
seq 1000001 | iconv -t UTF-32 >! progress.utf32.tsv
vd --encoding=utf-32 progress.utf32.tsv
The progress only goes up to 25%, not 100%.
That's because read() progress is counting characters, but the goal is measured in bytes. The file is around 7 million characters long, but when encoded in UTF-32 it is about 28 million bytes, so even at the end, 7 million / 28 million comes out to 25%.
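The mismatch can be reproduced in a few lines of Python without VisiData. This is only an illustrative sketch: it uses a small in-memory stand-in for the `seq` output (1 to 1000 rather than 1000001), and the variable names are made up for the example.

```python
# Small stand-in for the output of `seq`: one number per line.
text = "".join(f"{i}\n" for i in range(1, 1001))

# Python's "utf-32" codec writes a 4-byte byte-order mark, then 4 bytes per character.
encoded = text.encode("utf-32")

chars = len(text)           # what a character-based counter accumulates
total_bytes = len(encoded)  # what the file size (the progress goal) reports

# Dividing a character count by a byte total can never get much past 25% here.
ratio = chars / total_bytes
print(f"{chars} chars / {total_bytes} bytes = {ratio:.1%}")
```

Even after every character has been read, the ratio stays just under one quarter, which matches the 25% ceiling seen in the progress display.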
This PR changes FileProgress to track progress as bytes.
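One general way to count bytes rather than characters is to put the counter underneath the text decoder. The sketch below is not the code in this PR and does not use FileProgress; `ByteCountingReader` is a hypothetical class built only on the standard `io` module to show the idea.

```python
import io


class ByteCountingReader(io.RawIOBase):
    """Hypothetical wrapper that counts bytes pulled from a binary stream."""

    def __init__(self, raw):
        self._raw = raw
        self.bytes_read = 0

    def readable(self):
        return True

    def readinto(self, b):
        # Read at most len(b) bytes from the underlying stream and count them.
        data = self._raw.read(len(b))
        n = len(data)
        b[:n] = data
        self.bytes_read += n
        return n


# In-memory stand-in for a UTF-32 file on disk.
payload = "".join(f"{i}\n" for i in range(1, 1001)).encode("utf-32")
total = len(payload)

counter = ByteCountingReader(io.BytesIO(payload))
text = io.TextIOWrapper(io.BufferedReader(counter), encoding="utf-32")

lines = sum(1 for _ in text)           # consume the decoded text line by line
progress = counter.bytes_read / total  # bytes against bytes, so this ends at 1.0
print(f"{lines} lines, progress {progress:.0%}")
```

Because the buffered reader pulls data in chunks, the byte count can run slightly ahead of what the consumer has seen mid-read, but it ends exactly at the file size, so progress finishes at 100%.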