pyfilesystem2
Support encoding option for ftpfs
I am fetching data from a Windows FTP server whose directory names contain some special characters, and ftpfs fails with:
Traceback (most recent call last):
  File "/home/frafra/.cache/pypoetry/virtualenvs/pyfilesystem-sync-qQEmY_5I-py3.8/lib/python3.8/site-packages/fs/errors.py", line 125, in new_func
    return func(*args, **kwargs)
  File "/home/frafra/.cache/pypoetry/virtualenvs/pyfilesystem-sync-qQEmY_5I-py3.8/lib/python3.8/site-packages/fs/opener/ftpfs.py", line 56, in open_fs
    return ftp_fs.opendir(dir_path, factory=ClosingSubFS)
  File "/home/frafra/.cache/pypoetry/virtualenvs/pyfilesystem-sync-qQEmY_5I-py3.8/lib/python3.8/site-packages/fs/base.py", line 1247, in opendir
    if not self.getinfo(path).is_dir:
  File "/home/frafra/.cache/pypoetry/virtualenvs/pyfilesystem-sync-qQEmY_5I-py3.8/lib/python3.8/site-packages/fs/ftpfs.py", line 682, in getinfo
    directory = self._read_dir(dir_name)
  File "/home/frafra/.cache/pypoetry/virtualenvs/pyfilesystem-sync-qQEmY_5I-py3.8/lib/python3.8/site-packages/fs/ftpfs.py", line 559, in _read_dir
    self.ftp.retrlines(
  File "/usr/lib64/python3.8/ftplib.py", line 461, in retrlines
    line = fp.readline(self.maxline + 1)
  File "/usr/lib64/python3.8/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe4 in position 49: invalid continuation byte
ipdb session:
ipdb> data
b'10-01-2021 11:00PM <DIR> Bilder V\xe4stra G\xf6taland\r\n10-06-2021 10:03AM <DIR> SeNorge\r\n'
ipdb> data.decode('utf8')
*** UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe4 in position 49: invalid continuation byte
ipdb> data.decode('windows-1252')
'10-01-2021 11:00PM <DIR> Bilder Västra Götaland\r\n10-06-2021 10:03AM <DIR> SeNorge\r\n'
Python's built-in ftplib can use a different encoding (the "encoding" keyword argument was added to ftplib.FTP in Python 3.9): https://docs.python.org/3/library/ftplib.html#ftplib.FTP
class ftplib.FTP(host='', user='', passwd='', acct='', timeout=None, source_address=None, *, encoding='utf-8')
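The traceback above comes from Python 3.8, where ftplib.FTP has no encoding keyword but does have an encoding attribute (defaulting to latin-1) that connect(), outgoing commands and retrlines() all read. A minimal sketch, assuming the client is configured before it connects; make_ftp is a hypothetical helper, not part of ftplib or pyfilesystem2:

```python
import ftplib


def make_ftp(encoding="utf-8"):
    """Hypothetical helper: create an unconnected FTP client that will use
    `encoding` for the control connection and for retrlines() listings."""
    ftp = ftplib.FTP()  # no host given, so nothing is connected yet
    # Setting the attribute (instead of the encoding= keyword, which is 3.9+)
    # also works on Python 3.8, as long as it happens before connect().
    ftp.encoding = encoding
    return ftp
```

After ftp.connect(host) and ftp.login(user, passwd), a call such as ftp.retrlines("LIST", print) would then decode the directory listing as windows-1252 when make_ftp("windows-1252") was used.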
ftpfs does not accept an "encoding" parameter:
https://github.com/PyFilesystem/pyfilesystem2/blob/baa05606487d7aad2b7be5dd42a33276d463e4d1/fs/opener/ftpfs.py#L44-L52
https://github.com/PyFilesystem/pyfilesystem2/blob/baa05606487d7aad2b7be5dd42a33276d463e4d1/fs/ftpfs.py#L399-L409
I propose accepting encoding as an optional parameter, which would then be passed to the FTP constructor.
It would then be possible to connect to resources like: ftp://user:password@ftpserver/path?encoding=windows-1252
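A minimal sketch of how an opener could pull the proposed encoding query parameter out of such a URL, using only urllib.parse. parse_ftp_url and the layout of the returned dict are hypothetical; pyfilesystem2's opener has its own URL parser, and this is not its API:

```python
from urllib.parse import parse_qs, unquote, urlsplit


def parse_ftp_url(url):
    """Hypothetical helper: split a URL such as
    ftp://user:password@ftpserver/path?encoding=windows-1252
    into the pieces an FTP opener would need."""
    parts = urlsplit(url)
    if parts.scheme != "ftp":
        raise ValueError("not an ftp:// URL: %r" % url)
    params = parse_qs(parts.query)
    # None means "not given", so the filesystem can keep its own detection
    encoding = params.get("encoding", [None])[0]
    return {
        "host": parts.hostname,
        "port": parts.port or 21,
        # Credentials may be percent-encoded in the URL
        "user": unquote(parts.username) if parts.username else "anonymous",
        "passwd": unquote(parts.password) if parts.password else "",
        "path": parts.path or "/",
        "encoding": encoding,
    }
```

The resulting encoding value would then be forwarded to the FTP client, for example through the encoding keyword of ftplib.FTP on Python 3.9+.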
ftpfs does reference encodings, but it seems to handle only utf-8 and latin-1: https://github.com/PyFilesystem/pyfilesystem2/blob/baa05606487d7aad2b7be5dd42a33276d463e4d1/fs/ftpfs.py#L492-L501
Probably, ftpfs should not override an encoding provided by the user.
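One way to express that precedence is a small decision function: an explicit user choice wins, and only otherwise does the utf-8 / latin-1 auto-detection apply. This is a sketch under assumptions: choose_encoding is hypothetical, and it assumes detection keys off a "UTF8" entry in the server's FEAT keywords, which may not match ftpfs exactly:

```python
import codecs


def choose_encoding(user_encoding, features):
    """Hypothetical helper: pick the encoding an FTP filesystem should use.

    user_encoding -- value given by the user (e.g. from ?encoding=...), or None
    features      -- collection of FEAT keywords advertised by the server
    """
    if user_encoding is not None:
        # Validate early: raises LookupError for an unknown codec name
        codecs.lookup(user_encoding)
        # An explicit choice always wins over auto-detection
        return user_encoding
    # Assumed fallback, mirroring the existing utf-8 / latin-1 choice
    return "utf-8" if "UTF8" in features else "latin-1"
```

Validating the name up front means a typo in the URL fails when the filesystem is opened rather than on the first directory listing.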
If I set this variable to windows-1252, the software works:
https://github.com/PyFilesystem/pyfilesystem2/blob/baa05606487d7aad2b7be5dd42a33276d463e4d1/fs/ftpfs.py#L501
Same problem here. I'm working with a Linux FTP server, but my root directory still contains folders with special characters, so I can't even run listdir() or do anything else with them until this is fixed. Thanks :)
@Timtam do you know which encoding your FTP server uses?
Nope, unfortunately not. The interface of my Synology NAS says "UTF-8 automatic" (whatever that means). I also debugged a FileZilla connection and found that FileZilla sends "OPTS UTF8 ON" when it first connects, but I couldn't find out what the default encoding is.
An alternative solution could be to send OPTS UTF8 ON to the server, if pyfilesystem2 cannot cope with different encodings.
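A minimal sketch of that alternative with plain ftplib, assuming an already connected client: send the command, switch to UTF-8 on a 2xx reply, and otherwise keep a fallback. negotiate_encoding is a hypothetical helper. Note that on a connected ftplib.FTP, changing the encoding attribute affects commands sent afterwards and later retrlines() calls, but replies on the control connection are still read with the encoding chosen at connect() time; whether the server then really sends UTF-8 names is up to the server:

```python
from ftplib import error_perm


def negotiate_encoding(ftp, fallback="latin-1"):
    """Hypothetical helper: ask the server to switch to UTF-8 with
    OPTS UTF8 ON and configure `ftp` accordingly.

    Returns the encoding that was set on the client.
    """
    try:
        reply = ftp.sendcmd("OPTS UTF8 ON")
    except error_perm:
        # 5xx reply: the server does not support (or refused) the command
        reply = ""
    encoding = "utf-8" if reply.startswith("2") else fallback
    # Used for later outgoing commands and retrlines() data connections
    ftp.encoding = encoding
    return encoding
```

Because the helper only calls sendcmd() and sets encoding, it can be exercised with a stub object instead of a live server.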