pyfilesystem2
pyfilesystem2 copied to clipboard
ftpfs seek issue
I have recently been using the ftp file system against a container running the vsftpd server and have come upon an issue that prevents the following snippet of code working:
with open_fs("ftp://anonymous:anonymous@localhost:8021") as fs:
with fs.openbin("test.bin", "w") as f:
f.seek(1024)
f.write(b"hello")
f.seek(2048)
f.write(b"there")
The problem occurs with the second call to seek where, internally, the ftpfs code tries to send a QUIT to the ftp server after the write of b"hello" has started but before it has completed (i.e. the data connection to the server is closed) - vsftpd wishes to have the data transfer complete before it processes any new commands, so the QUIT hangs waiting for its response until a server timeout causes an exception to be thrown.
It may be that some ftp servers allow executing one command before another completes, but in this case vsftpd doesn't. I suspect the ftp file system code can be made more tolerant of this behaviour.
Looking at seek in the FTPFile class, the code performed in the block with self._lock held does three things:
- Determines the new position to use within the file (possibly based on file size)
- Closes and re-opens the ftp control connection
- Closes any outstanding data connections
The problem initially described is in the ordering of the last two items, in that some ftp servers (e.g. vsftpd) will not handle the 'QUIT' sent in (2) until any outstanding transfer is completed via (3).
A further potential problem (unverified) is that until the data transfer has completed in (3), the file size that may be needed for (1) might not be accurate.
As a possible fix, I have been using a patched version of seek where the locked block looks like so:
with self._lock:
if self._read_conn:
self._read_conn.close()
self._read_conn = None
if self._write_conn:
self._write_conn.close()
self._write_conn = None
self.ftp.voidresp() # Ensure last write completed
self.ftp.quit()
self.ftp = self._open_ftp()
if _whence == Seek.set:
...
self.pos = max(0, new_pos)
Ordering the processing in this manner seems to behave better in the use cases I have here. Note also I've added a call to voidresp, as is done in the close method, to eat any response coming back from the write - however this may not be necessary as a QUIT is being sent immediately thereafter with the control connection then being closed.
the ftpfs code tries to send a QUIT to the ftp server after the write of b"hello" has started but before it has completed (i.e. the data connection to the server is closed) - vsftpd wishes to have the data transfer complete before it processes any new commands
I dunno much about FTP (and it's been a while since I looked at the low-level details of PyFilesystem), but would a "better fix" be for the write method to not return until the data transfer has completed? Seems like that'd be a more robust way of preventing the FTP-server receiving commands "out of sequence"? Apologies if that's a naive suggestion!
Also: why does the seek method close-and-reopen the FTP connection @willmcgugan ?
Also2: Would it make sense for seek to close the read_conn and write_conn before calling ftp.quit? :shrug: ...oh! I see that's exactly what @al8ba is suggesting above :laughing:
would a "better fix" be for the write method to not return until the data transfer has completed?
The openbin is opening a connection to an ftp server, which can then be used for multiple writes to stream data to - we've not finished doing writes until there is some other activity, e.g. seeking to a different location, just before which we want to say the transfer is complete. This is why an individual write does not wait until everything is done ... because it doesn't know that there isn't another write to come. Also why the seek does this stuff.
Also: why does the seek method close-and-reopen the FTP connection @willmcgugan ?
FTP doesn't support true seek. All you can do is close the connections and resume at a given offset.
All you can do is close the connections and resume at a given offset.
Is that a feature that all FTP servers support?
Is that a feature that all FTP servers support?
It probably isn't TBH. But I think its almost as universal as you'll get with FTP. Resuming transfers being such a common requirement...
Is that a feature that all FTP servers support?
If an ftp server doesn't like it, they return an error code, which can be dealt with. I know of one ftp server that seems not to like seek/write outside the existing bounds of a file - it threw back a permissions error, which after some verification resulted in me moving on to use a different ftp server.
know of one ftp server that seems not to like seek/write outside the existing bounds of a file - it threw back a permissions error
You could argue that that's reasonable behaviour, given how old and how basic the FTP specs are :wink: