Traceback error on unicode character in file name
tl;dr
When any file name in the entire archive has a unicode character, even with export LANG=en_US.UTF-8:
- When mounting the entire repository, the archive cannot be viewed ("Input/output error")
- When attempting to mount an archive, a traceback error happens
Have you checked borgbackup docs, FAQ, and open Github issues?
Yes
Is this a BUG / ISSUE report or a QUESTION?
BUG
System information. For client/server mode post info for both machines.
Your borg version (borg -V).
borg 1.2.2
SHA256 of the executable:
29f68bd4f8b524f0c2c530d5679ea1a7fcce6bb6ffe16dbe7d07b19dbebf794a
Operating system (distribution) and version.
Linux Ubuntu 22.04 LTS
Hardware / network configuration, and filesystems used.
Hardware: Dell OptiPlex
Filesystems: ext4
Network: None (backup done locally on the machine itself to an external USB drive)
How much data is handled by borg?
Tested separately (independent repositories):
- Small test area (less than 1 MB)
- A large backup of 66 GB
Full borg commandline that led to the problem (leave away excludes and passwords)
borg mount -o noatime /media/paddy/mp1bu/borg/glinda::09-02T18-21 /media/paddy/diff
Describe the problem you're observing.
Can you reproduce the problem? If so, describe how. If not, describe troubleshooting steps you took before opening the issue.
Yes, reliably reproducible on any repository, including a brand new one. Full details below, including symptom, traceback error, and what causes it.
(I posted this on Reddit before I discovered the cause. I have repeated the information below.)
Basic information about the repository
$ borg info /media/paddy/mp1bu/borg/glinda
Repository ID: [retracted]
Location: /media/paddy/mp1bu/borg/glinda
Encrypted: No
Cache: /home/paddy/.cache/borg/[retracted]
Security dir: /home/paddy/.config/borg/security/[retracted]
------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
All archives:               66.79 GB             63.30 GB             61.35 GB
                       Unique chunks         Total chunks
Chunk index:                   84729               128757
List of archives (just the one so far)
$ borg list /media/paddy/mp1bu/borg/glinda
09-02T18-21 Fri, 2022-09-02 18:21:33 [retracted]
Listing the archive contents works correctly:
$ borg list /media/paddy/mp1bu/borg/glinda::09-02T18-21
[lots of files that all look correct]
First symptom
If I mount my entire repository, at first it seems to work, but then…
$ borg mount -o noatime /media/paddy/mp1bu/borg/glinda /media/paddy/diff
$ cd /media/paddy/diff
$ ls -l
total 0
drwxr-xr-x 1 paddy paddy 0 Sep 2 18:21 09-02T18-21
$ cd 09-02T18-21
$ ls -l
ls: cannot open directory '.': Input/output error
As you can see, I can't view the mounted archive directory.
I can umount OK:
$ cd
$ borg umount /media/paddy/diff
Traceback error
If I try mounting just the archive instead of the entire repository, I get a traceback error.
$ borg mount -o noatime /media/paddy/mp1bu/borg/glinda::09-02T18-21 /media/paddy/diff
Mounting filesystem
Local Exception
Traceback (most recent call last):
File "borg/archiver.py", line 5159, in main
File "borg/archiver.py", line 5090, in run
File "borg/archiver.py", line 1349, in do_mount
File "borg/archiver.py", line 183, in wrapper
File "borg/archiver.py", line 1359, in _do_mount
File "borg/fuse.py", line 545, in mount
File "borg/fuse.py", line 278, in _create_filesystem
File "borg/fuse.py", line 355, in _process_archive
File "os.py", line 812, in fsencode
UnicodeEncodeError: 'ascii' codec can't encode character '\u2026' in position 81: ordinal not in range(128)
Platform: Linux glinda 5.15.0-47-generic #51-Ubuntu SMP Thu Aug 11 07:51:15 UTC 2022 x86_64
Linux: Unknown Linux
Borg: 1.2.2 Python: CPython 3.9.13 msgpack: 1.0.4 fuse: llfuse 1.4.2 [pyfuse3,llfuse]
PID: 55441 CWD: /home/paddy
sys.argv: ['borg', 'mount', '-o', 'noatime', '--verbose', '/media/paddy/mp1bu/borg/glinda::09-02T18-21', '/media/paddy/diff']
SSH_ORIGINAL_COMMAND: None
The cause
After a process of elimination, I found that the error happens whenever a file name (not the file contents) contains a unicode character.
It can even be as simple as an accented character such as é.
I have tested this on a brand new repository with just one file in the archive. It works when the file name doesn't have a unicode character, and crashes when the file name has a unicode character.
This is mentioned in the FAQ, but the proposed solution doesn't work.
- My default language is LANG=en_GB.UTF-8
- I deleted the repository, set export LANG=en_US.UTF-8 as per the FAQ, and recreated the repository from scratch. It made no difference.
If I exclude all files with a unicode character, BorgBackup works correctly. Unfortunately, this isn't a suitable workaround for me, as I am backing up large numbers of files with such unicode characters, many of which I'm not at liberty to rename.
File "os.py", line 812, in fsencode
UnicodeEncodeError: 'ascii' codec can't encode character '\u2026' in position 81: ordinal not in range(128)
Notable:
- that is os.fsencode, a python standard library function
- it uses the ascii encoder, not the utf-8 encoder as one would expect with LANG=en_GB.UTF-8.
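To illustrate the point about the encoder: the exception in the traceback can be reproduced without borg by encoding a non-ASCII name with the ascii codec. This is only a minimal sketch. The file name is hypothetical, and the helper try_encode is made up for the illustration. It passes the surrogateescape error handler, which is what os.fsencode uses on POSIX systems.

```python
# Illustration only: encode a non-ASCII file name the way os.fsencode would
# if the filesystem encoding were 'ascii' versus 'utf-8'.
name = "caf\u00e9.txt"  # hypothetical file name containing é (U+00E9)


def try_encode(text, encoding):
    """Encode with the surrogateescape handler; return bytes or the error text."""
    try:
        return text.encode(encoding, "surrogateescape")
    except UnicodeEncodeError as exc:
        return f"{type(exc).__name__}: {exc}"


ascii_result = try_encode(name, "ascii")
utf8_result = try_encode(name, "utf-8")

print(ascii_result)  # a UnicodeEncodeError message like the one in the traceback
print(utf8_result)   # the UTF-8 encoded bytes of the name
```

So the open question is why the borg process ends up with ascii as its filesystem encoding, not anything about the file names themselves.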
Had a look at the code lines as seen in the traceback, didn't see anything that's obviously incorrect there.
Is the locale you set actually available on your system?
Try:
dpkg-reconfigure -plow locales
# select all locales you need and at least one UTF-8 locale you intend to use with borg.
dpkg-reconfigure -plow locales
These two are already marked:
- en_GB.UTF-8 UTF-8
- en_US.UTF-8 UTF-8
Would it help if I tried a different combination? If so, which ones?
No, guess these are fine. Just make sure that:
- there is no typo or so in LANG=
- that setting is also active (and exported) in the borg environment (might be different user/shell/whatever).
https://docs.python.org/3/library/os.html#python-utf-8-mode
There are some further things to try that likely solve your problem - although it would be interesting why it does not work as you have it now. Maybe check the current values of the other env vars mentioned in these docs.
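A small script along these lines could collect those values in one go. It is a sketch, not an official borg tool. The function name encoding_report and the list of variables are my own choice, based on the variables named in the Python docs linked above plus the usual locale variables. Note that it reports on the python3 interpreter that runs it, which may behave differently from the Python bundled inside the standalone borg binary.

```python
import locale
import os
import sys

# Environment variables that can influence which text encoding Python picks.
# The selection is an assumption for this sketch, not an exhaustive list.
VARS = ("LC_ALL", "LC_CTYPE", "LANG", "PYTHONUTF8",
        "PYTHONCOERCECLOCALE", "PYTHONIOENCODING")


def encoding_report(env=os.environ):
    """Return the relevant variables plus the encodings this interpreter uses."""
    report = {name: env.get(name, "<unset>") for name in VARS}
    report["sys.getfilesystemencoding()"] = sys.getfilesystemencoding()
    report["locale.getpreferredencoding(False)"] = locale.getpreferredencoding(False)
    return report


for key, value in encoding_report().items():
    print(f"{key:38} {value}")
```

Run it in the same shell (same user, same session) that is used to start borg, so that it sees the same environment.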
- there is no typo or so in LANG=
- that setting is also active (and exported) in the borg environment (might be different user/shell/whatever).
I have checked and double-checked, and done it multiple times. I used copy-and-paste (specifically from the FAQ), making sure that I included export, because your documentation makes that clear.
What's the LC_CTYPE in the borg env?
https://docs.python.org/3/library/os.html#python-utf-8-mode
There are some further things to try that likely solve your problem - although it would be interesting why it does not work as you have it now. Maybe check the current values of the other env vars mentioned in these docs.
Unfortunately, that link goes way above my head. I can program in Bash, and that's it; I wouldn't know where to start with Python.
I'm using the standalone binary downloaded from your website.
Can you try this:
$ python3
>>> import sys
>>> sys.getfilesystemencoding()
'utf-8'
What's the LC_CTYPE in the borg env?
The command echo $LC_CTYPE returns nothing; the environment variable is unset.
What should I set it to?
$ python3
>>> import sys
>>> sys.getfilesystemencoding()
'utf-8'
I get the same as you:
$ python3
Python 3.10.4 (main, Jun 29 2022, 12:14:53) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.getfilesystemencoding()
'utf-8'
>>>
On my mac, I have this:
$ echo $LANG
$ echo $LC_CTYPE
en_DE.UTF-8
Oh, that's interesting.
But I'd assume that if sys.getfilesystemencoding() returns utf-8, it should not use the ascii encoder.
Hmm, do we have some strange effect related to the pyinstaller-made binary?
On my mac, I have this:
$ echo $LANG
$ echo $LC_CTYPE
en_DE.UTF-8
Mine is the opposite way around!
$ echo $LANG
en_GB.UTF-8
$ echo $LC_TYPE
Shall I export LC_TYPE=$LANG?
You can try, but iirc the fallback of LC_CTYPE might be the value in LANG anyway.
I'll set up a test, and get back to you soon.
Please add the sha256 hash of the binary you use in the toplevel post after the version number.
export LC_TYPE=$LANG made no difference, as you expected.
Please add the sha256 hash of the binary you use in the toplevel post after the version number.
Done
Please also try if using borg mount --foreground ... makes a difference (if you use that, the borg process will not fork and run in the background, but instead keep running in the foreground [and blocking that terminal, so you'll need to switch to another one to continue]).
borg mount --foreground ...
Unfortunately, it still crashed with the same Traceback error.
$ borg mount -o noatime --foreground /media/paddy/mp1general/borgtest::ascii /media/paddy/diff
Local Exception
Traceback (most recent call last):
File "borg/archiver.py", line 5159, in main
File "borg/archiver.py", line 5090, in run
File "borg/archiver.py", line 1349, in do_mount
File "borg/archiver.py", line 183, in wrapper
File "borg/archiver.py", line 1359, in _do_mount
File "borg/fuse.py", line 545, in mount
File "borg/fuse.py", line 278, in _create_filesystem
File "borg/fuse.py", line 355, in _process_archive
File "os.py", line 812, in fsencode
UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 27: ordinal not in range(128)
Platform: Linux glinda 5.15.0-47-generic #51-Ubuntu SMP Thu Aug 11 07:51:15 UTC 2022 x86_64
Linux: Unknown Linux
Borg: 1.2.2 Python: CPython 3.9.13 msgpack: 1.0.4 fuse: llfuse 1.4.2 [pyfuse3,llfuse]
PID: 117027 CWD: /home/paddy
sys.argv: ['borg', 'mount', '-o', 'noatime', '--foreground', '/media/paddy/mp1general/borgtest::ascii', '/media/paddy/diff']
SSH_ORIGINAL_COMMAND: None
I switched to my ubuntu 20.04 machine and did some experiments:
$ echo $LANG
en_US.UTF-8
$ echo $LC_CTYPE
$ mkdir test
$ cd test
$ mkdir input mnt
$ touch input/123
$ touch input/äöü # non-ascii chars
$ wget https://github.com/borgbackup/borg/releases/download/1.2.2/borg-linux64
$ sha256sum borg-linux64 # same as in top post
$ chmod +x borg-linux64
$ ./borg-linux64 init -e none repo
$ ./borg-linux64 create repo::arch input
$ ./borg-linux64 mount repo::arch mnt
$ ls mnt/input
123 äöü
So, works for me.
OK, I'll create a VM with fresh installations of Ubuntu 20.04 and another with Ubuntu 22.04 to see how they work.
Then we can see if it's specific to Ubuntu 22.04 or just to my setup (which is a fresh setup, installed just 6 days ago).
I don't have time left today, so I'll get back to you once I've done this.
Shot in the dark: Does forcing the Python I/O encoding via
export PYTHONIOENCODING="utf8"
before running borg work? Or similarly trying to run it as env PYTHONIOENCODING=utf8 /path/to/borg instead of plain /path/to/borg.
export PYTHONIOENCODING="utf8"
Or similarly trying to run it as env PYTHONIOENCODING=utf8 /path/to/borg
I did try both, but neither made a difference. The error message was the same. I set PYTHONIOENCODING to an invalid value (to see what would happen), and Python complained about that, so we know that it is being looked at.
I made time to test this in a VM, and although I haven't by any means solved the problem, we can certainly narrow it down.
My VM installations of Ubuntu 20.04 and Ubuntu 22.04 both work. But my main machine with Ubuntu 22.04 — a fresh installation just 6 days old — doesn't.
Nevertheless, I spotted something.
Here's my output from both VM versions when I list the file in the terminal:
-rw-rw-r-- 1 paddy paddy 5 Sep 3 18:04 äöü
But, here's the output from my main machine when I list the file in the terminal:
-rw------- 1 paddy paddy 10 Sep 3 11:15 ''$'\303\244\303\266\303\274'
If you happen to know what this means, please let me know, otherwise I'll attend to it tomorrow.
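For what it's worth, the escaped name appears to be the same file name written out byte by byte. The octal escapes are the UTF-8 bytes of äöü, which a quick check confirms. This sketch only decodes the bytes shown above. My assumption, which the listing itself does not prove, is that ls prints them escaped because the locale active in that terminal does not treat those bytes as printable characters.

```python
# The bytes that ls printed as octal escapes on the main machine.
raw = b"\303\244\303\266\303\274"

# Interpreting them as UTF-8 yields the original three characters.
decoded = raw.decode("utf-8")

print([hex(b) for b in raw])
print(decoded)
```

If that assumption holds, the difference between the VMs and the main machine would lie in the locale settings of the shell rather than in the file name stored on disk.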
@paddylandau try running locale-gen en_GB.UTF-8 en_US.UTF-8 and then reproduce the problem
also check LC_CTYPE in your ~/.profile
and also check /etc/ssh/sshd_config for any LANG LC_* settings
I set LC_CTYPE in my profile and ran locale-gen en_GB.UTF-8 en_US.UTF-8.
I don't have a file /etc/ssh/sshd_config, but I'm not using SSH anyway; this is local on my machine with the Borg backup directly onto a USB hard drive.
On top of all that, I also tried this solution.
I rebooted, but sadly none of this helped.
I shall ask on the Ubuntu Forums for help. I'll update this post with the link, and post back here should I find the answer.
Thank you for all of your time on this matter. I do appreciate it.
Well, I finally found the problem — and it's nothing to do with BorgBackup!
I had LC_ALL=C. This messed up everything!
I've unset LC_ALL, and everything works correctly now.
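For anyone checking their own setup, here is a minimal sketch of why LC_ALL wins. It follows the usual POSIX precedence for the character-type category: LC_ALL first, then LC_CTYPE, then LANG, with empty values treated as unset. The function name effective_ctype is made up for this illustration, and it only inspects a dictionary of variables. It does not query the C library.

```python
def effective_ctype(env):
    """Return (variable, value) that decides LC_CTYPE under POSIX precedence."""
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name, "")
        if value:  # unset or empty variables are skipped
            return name, value
    return None, "C"  # nothing set: the default "C" (POSIX) locale applies


# My situation: LANG was correct, but LC_ALL overrode it.
print(effective_ctype({"LANG": "en_GB.UTF-8", "LC_ALL": "C"}))  # ('LC_ALL', 'C')
```

To check a live shell, pass dict(os.environ) after import os.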
Thank you again for all the time and effort that you have put into this. Sorry to have wasted your time, but you definitely did help push me in the right direction to find the solution.
I hope that this helps someone else.
Maybe we could add this to the docs. ^^^
Can someone make a pull request against master branch?