Veracrypt GUI on Linux sets wrong encoding to be used for filenames
Expected behavior
I have files and directories on an NTFS-based VeraCrypt container whose names contain Hungarian characters: é, ü, á, Ã, Å, ... I expect these names to be printed accurately, as they are when I mount an NTFS filesystem directly.
Observed behavior
Hungarian accented characters are garbled. Some programs show a question mark in their place, and ls prints 'Eg'$'\351''szs'$'\351''g'$'\374''gy' instead of "Egészségügy".
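To make the ls output above easier to read, here is a minimal Python sketch. It assumes the octal escapes \351 and \374 are the raw bytes of the filename as the kernel hands them out. The helper name decode_name is made up for illustration.

```python
# The name exactly as ls printed it: \351 and \374 are octal escapes
# for the single bytes 0xE9 and 0xFC.
raw_name = b"Eg\351szs\351g\374gy"


def decode_name(raw: bytes) -> str:
    """Decode a filename as UTF-8, falling back to ISO 8859-1.

    Hypothetical helper: it only illustrates why the name looks garbled
    on a system that expects UTF-8 filenames.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # 0xE9 followed by "s" is not a valid UTF-8 sequence, so this
        # branch is taken for the name above.
        return raw.decode("iso8859-1")


decoded = decode_name(raw_name)
print(decoded)
```

The bytes are not valid UTF-8 but decode cleanly as ISO 8859-1, which matches the iocharset=iso8859-1 option shown in the mount output further down.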
Steps to reproduce
- Open Veracrypt GUI
- Pick a slot
- Select a VC container file
- Click mount
- Type password and accept (so no custom mount options are set)
- Observe path names in mounted directory
Your Environment
VeraCrypt version: 1.26.20
Operating system and version: openSUSE Leap 15.6, Linux kernel 6.4.0-150600.23.38-default (64-bit)
System type: 64 bit Linux
The VC container was created on a Windows 10 system.
Additional information
Mounting an entire NTFS-based VC partition has no such issues, and has the following mount options:
/dev/mapper/veracrypt1 on /mnt/vera/a type fuseblk (rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other,blksize=4096)
Mounting the VC container results in these mount options:
/dev/mapper/veracrypt2 on /media/veracrypt2 type vfat (rw,relatime,uid=1000,gid=100,fmask=0077,dmask=0077,codepage=437,iocharset=iso8859-1,shortname=mixed,errors=remount-ro)
There are many differences (why no nosuid, nodev?); most importantly, a codepage and an iocharset option are set.
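The comparison of the two mount lines can be sketched in Python. This is only an illustration: it assumes the "device on mountpoint type fstype (options)" layout printed by mount, and the function names parse_mount_line and fat_without_utf8 are hypothetical.

```python
import re

# Layout assumed from the mount output quoted above:
#   <device> on <mountpoint> type <fstype> (<opt1>,<opt2>=<value>,...)
MOUNT_LINE = re.compile(
    r"^(?P<device>\S+) on (?P<mountpoint>.+) type (?P<fstype>\S+) "
    r"\((?P<options>[^)]*)\)$"
)


def parse_mount_line(line: str) -> dict:
    """Split one line of mount output into device, mountpoint, fstype, options."""
    match = MOUNT_LINE.match(line.strip())
    if match is None:
        raise ValueError("unrecognised mount line: %r" % line)
    options = {}
    for item in match.group("options").split(","):
        key, _, value = item.partition("=")
        options[key] = value  # flags without a value map to ""
    return {
        "device": match.group("device"),
        "mountpoint": match.group("mountpoint"),
        "fstype": match.group("fstype"),
        "options": options,
    }


def fat_without_utf8(info: dict) -> bool:
    """True for a FAT mount that converts filenames with a non-UTF-8 charset."""
    if info["fstype"] not in ("vfat", "msdos"):
        return False
    options = info["options"]
    return options.get("iocharset") != "utf8" and "utf8" not in options
```

Run against the two lines above, the fuseblk mount is not flagged, while the vfat mount with iocharset=iso8859-1 is.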
I just realized that this VC container might hold a FAT-something filesystem.
In any case, the point of this issue is to make VeraCrypt able to detect the correct mount options automatically (or prompt the user with a few simple questions), as happens on Windows.
I fear that this only works out of the box on Windows because the system holds that information in some form there.
Turns out the correct mount option in my case is iocharset=utf8.
I understand if we cannot change the default, but I think it would be worth considering
- adding a warning when the filesystem is a kind of FAT
- adding some kind of setting for the iocharset to be used
Especially considering that this is not something easy to find on your own, I think.
I tried codepage 852 and 1250 (even though that's for short names only) and iocharset iso8859-2 (the Latin-2 character set, which covers Hungarian) before I found out that this parameter can simply be set to utf8.
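For anyone who mounts from a script instead of the GUI, here is a rough Python sketch of passing the option along. It assumes the Linux command-line build of VeraCrypt with its --text and --fs-options flags. The container path, the mount point, and the helper name build_mount_command are all hypothetical.

```python
from typing import List


def build_mount_command(container: str, mountpoint: str,
                        iocharset: str = "utf8") -> List[str]:
    """Build a veracrypt command line that passes iocharset to mount.

    Assumption: --fs-options hands its argument to mount's -o option,
    as described in VeraCrypt's command-line help on Linux.
    """
    return [
        "veracrypt",
        "--text",  # text mode, no GUI
        "--fs-options=iocharset=" + iocharset,
        container,
        mountpoint,
    ]


# Hypothetical paths, for illustration only. The command is built but not run.
command = build_mount_command("/home/user/secret.hc", "/media/veracrypt2")
print(" ".join(command))
```

Passing the resulting list to subprocess.run would start the mount. VeraCrypt then still asks for the password as usual.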
I believe the default options for mount iocharset are compiled with the kernel, so this probably works for most people out of the box (at least on both my Arch Linux and Ubuntu VMs it is set to iocharset=utf8 by default). So I think this would only be a problem for a user who creates the files on one system and then moves them to another with a different default mount encoding. Forcing utf8 for everyone risks breaking things for existing users whose systems mount with a different encoding by default and who have no problems with how it currently works.
There are some ways to detect the encoding, but relying on certain programs existing on the system is not a great solution, so the most sensible path from my view would be to add instructions to the troubleshooting part of the documentation on what to do if the filenames look scrambled.
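As one possible troubleshooting aid that needs no external programs, the check below walks a mounted directory and reports names that are not valid UTF-8. It is a heuristic sketch only, assuming a system whose locale expects UTF-8 filenames. The function name find_non_utf8_names is made up.

```python
import os
from typing import List


def find_non_utf8_names(root: str) -> List[bytes]:
    """Return paths under root whose file or directory name is not valid UTF-8.

    Walking with a bytes path makes os.walk return the raw bytes of
    each name, so nothing is decoded behind our back.
    """
    suspicious = []
    for dirpath, dirnames, filenames in os.walk(os.fsencode(root)):
        for name in dirnames + filenames:
            try:
                name.decode("utf-8")
            except UnicodeDecodeError:
                suspicious.append(os.path.join(dirpath, name))
    return suspicious


# Example (hypothetical mount point):
#   for path in find_non_utf8_names("/media/veracrypt2"):
#       print(path)
```

A non-empty result on a FAT volume suggests the mount is converting names with a charset other than utf8, and that remounting with iocharset=utf8 is worth trying.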
> I believe the default options for mount iocharset are compiled with the kernel, so this probably works for most people out of the box (at least on both my Arch Linux and Ubuntu VMs it is set to iocharset=utf8 by default).
man mount says here that the default is iso8859-1
Mount options for fat [...] iocharset=value Character set to use for converting between 8 bit characters and 16 bit Unicode characters. The default is iso8859-1. Long filenames are stored on disk in Unicode format.
But it's good to be aware that this varies between systems.
> so most sensible path from my view would be to add instructions to the troubleshooting part of the documentation on what to do if the filenames look scrambled up.
That sounds good to me. And at least now there is an issue documenting this in case someone else runs into it too :)