pyftpdlib icon indicating copy to clipboard operation
pyftpdlib copied to clipboard

Transparent filename encoding transform?

Open giampaolo opened this issue 10 years ago • 10 comments

From [email protected] on May 01, 2013 05:05:59

Hi there.

I've been working with your library for quite a while, and it was just so 
simple yet worked like a charm.

I'm here to suggest some enhancements. There is a common problem to deal with 
FTP servers that is about encoding. While the client file name encoding is 
different from server file name encoding, we would have a lot of problems 
dealing with that. This is quite common here for Windows' default Chinese file 
name encoding is GBK yet Linux's is UTF-8. So I wish to add some transparent 
encoding transform function in my ftp server. And I think that it would need 
some implementation in the library source...

Also, if this function can simply be implemented by subclassing FTPHandler or 
something, please let me know.

Cheers.

Original issue: http://code.google.com/p/pyftpdlib/issues/detail?id=257

giampaolo avatar May 28 '14 16:05 giampaolo

From [email protected] on April 30, 2013 23:08:23

Hi there.

I solved this problem by inspecting the code and subclass FTPHandler myself.

I'll paste the code for anyone who needs it.

#code starts
from asynchat import async_chat

class EncodedProducer:
    def __init__(self, producer):
        self.producer = producer
    def more(self):
        return self.producer.more().decode("utf8").encode(encoding)

class EncodedHandler(FTPHandler):

    def push(self, s):
        async_chat.push(self, s.encode(encoding))

    def push_dtp_data(self, data, isproducer=False, file=None, cmd=None):
        if file==None:
            if isproducer:
                data=EncodedProducer(data)
            else:
                data=data.decode("utf8").encode(encoding)

        FTPHandler.push_dtp_data(self, data, isproducer, file, cmd)

    def decode(self, bytes):
        return bytes.decode(encoding, self.unicode_errors)
#code ends

encoding stands for the target encoding you wish to transform to.

Using EncodedHandler instead of FTPHandler would help solve this problem

giampaolo avatar May 28 '14 16:05 giampaolo

From [email protected] on May 01, 2013 04:54:03

The problem here is that (AFAIK) the FTP protocol has absolutely *no* means of 
specifying or querying which character-encoding is in use - you just have to 
'hope' that the client and server are using the same encoding :-( RFC2640 ( 
https://tools.ietf.org/html/rfc2640 ) specifies that the character encoding 
SHOULD be UTF-8, and pyftpdlib is now Unicode / RFC2640 compliant. 
https://code.google.com/p/pyftpdlib/issues/list?can=1&q=unicode So I guess your 
code serves as an example of how you could _force_ a different encoding if you 
can't use UTF-8, but IMHO it shouldn't be built into the library... of course 
Giampaolo may disagree ;-)

giampaolo avatar May 28 '14 16:05 giampaolo

From g.rodola on May 01, 2013 16:31:41

Yes Andrew is right. I didn't make the server encoding configurable exactly for 
this reason: as per RFC guideline client and server have no way to agree on a 
specific encoding, therefore I thought it was better to just stick with UTF-8 
as dictated by RFC and be done with it.

If on one hand this is "the right thing to do", on the other hand perhaps there 
are cases where changing the default server encoding in order to support 
misbehaving clients might be desirable (note: at the cost of 'breaking' 
compliant ones). If this is the case I'd like to hear more about the scenario 
the OP is facing (in detail the FTP client used and what happens by using UTF-8).

That said, the code shown above changes the encoding of the control connection 
(and that might be "right") but also applies an encoding for the data exchanged 
through the data connection, and that is something which should be done only 
for the listing commands (LIST, MLSD, etc), not when transmitting files. 
What you want to do instead is override AbstractedFS's format_list() and 
format_mlsx() methods and leave FTPHandler.push_dtp_data alone:

class CustomFS(AbstractedFS):

    def format_list(self, *args, **kwargs):
        generator = AbstractedFS.format_mlst(self, *args, **kwargs)
        for item in generator:
             yield item.decode("utf8").encode(YOUR_ENCODING)

     # same for format_mlsx()


If we decide to make server encoding configurable we can avoid to go through 
all these troubles, but I'd like to hear OP's scenario first in order to figure 
out if it's actually worth the effort.

giampaolo avatar May 28 '14 16:05 giampaolo

From [email protected] on May 01, 2013 17:29:16

Just a quick note - Giampaolo's defintely right that you don't want to mess 
about with the 'encoding' for the actual file data (would give corrupted 
files), but presumably _if_ the encoding of filenames for the LIST and MLSD 
commands is being altered, then the encoding of the filenames for the 
STOR/RETR/DELE/etc. commands would need to be altered too?

giampaolo avatar May 28 '14 16:05 giampaolo

From g.rodola on May 01, 2013 17:38:30

Yes, but apparently he did that already by overriding FTPHandler.decode().

giampaolo avatar May 28 '14 16:05 giampaolo

From [email protected] on May 01, 2013 18:05:30

Hi there.

Thanks for all your responses.

Well, I understood that these "functions" would not be included in the library 
source cause it is not really a function that should be considered...

But let's talk about the codes I pasted, hmmmmm.... I know it may sound like a 
dirty hack, but shouldn't push_dtp_data always be called with a non-None file 
argument if it is transmitting files? So would it be nice to distinguish list 
commands from file data by checking the file argument in push_dtp_data? Does 
this method have any kind of limitations ?

Cheers.

giampaolo avatar May 28 '14 16:05 giampaolo

From g.rodola on May 01, 2013 18:09:43

push_dtp_data() is called with a "producer" argument also for listing commands, 
not only for files-related ones. That aside, a 'cmd' argument is also passed, 
so you might want to inspect that.

giampaolo avatar May 28 '14 16:05 giampaolo

giampaolo commented on 29 May 2014 From [email protected] on April 30, 2013 23:08:23

from asynchat import async_chat

class EncodedProducer:
    def __init__(self, producer):
        self.producer = producer
    def more(self):
        return self.producer.more().decode("utf8").encode(encoding)

class EncodedHandler(FTPHandler):

    def push(self, s):
        async_chat.push(self, s.encode(encoding))

    def push_dtp_data(self, data, isproducer=False, file=None, cmd=None):
        if file==None:
            if isproducer:
                data=EncodedProducer(data)
            else:
                data=data.decode("utf8").encode(encoding)

        FTPHandler.push_dtp_data(self, data, isproducer, file, cmd)

    def decode(self, bytes):
        return bytes.decode(encoding, self.unicode_errors)

I got this error:

handler = EncodedHandler()
TypeError: __init__() missing 2 required positional arguments: 'conn' and 'server'

tomsux avatar Apr 20 '17 15:04 tomsux

What does this have to do with the original issue?

giampaolo avatar Apr 21 '17 13:04 giampaolo

The top

From [email protected] on May 01, 2013 05:05:59

Hi there.

I've been working with your library for quite a while, and it was just so 
simple yet worked like a charm.

I'm here to suggest some enhancements. There is a common problem to deal with 
FTP servers that is about encoding. While the client file name encoding is 
different from server file name encoding, we would have a lot of problems 
dealing with that. This is quite common here for Windows' default Chinese file 
name encoding is GBK yet Linux's is UTF-8. So I wish to add some transparent 
encoding transform function in my ftp server. And I think that it would need 
some implementation in the library source...

Also, if this function can simply be implemented by subclassing FTPHandler or 
something, please let me know.

Cheers.

Original issue: http://code.google.com/p/pyftpdlib/issues/detail?id=257 This code is work for me.When using this lib to write a simple ftp server on macOS,if you are using Chinese filename, should use EncodedHandler in your code, and set encoding="GB18030".

alextooter avatar Apr 27 '19 08:04 alextooter