filesystem_spec
filesystem_spec copied to clipboard
Support stateful transactions by allowing to pass keyword arguments
A file system's corresponding transaction class has been a class attribute since https://github.com/fsspec/filesystem_spec/pull/1424.
However, the fsspec.transaction.Transaction interface is currently stateless. For someone trying to roll a transaction class that takes some state in the constructor, this results in incompatibilities that can break some of fsspec's most important features, e.g. like so:
from fsspec import filesystem
from fsspec.utils import get_protocol
class MyFS:
protocol = "myfs"
...
def transaction(foo: str = "hello") -> MyTransaction:
return MyTransaction(foo)
def transact(uri, **kwargs):
protocol = get_protocol(uri)
fs = filesystem(protocol, **kwargs)
with fs.transaction as tx: # <- bang! if protocol == "myfs", tx will be MyFS's `transaction` class API.
...
Question is, would you support moving from the current "transactions as properties" approach to a more functional approach allowing for state passing to the respective transaction class?
Thanks for thinking about this. I don't immediately know how, but I feel what you want should be doable without breaking the current API, perhaps with some context or other FS instance-level state
Thanks for the tip. We ended up defining a __call__() operator on our custom transaction class (which we abuse as sort of a second __init__() method), so that with fs.transaction(...) as tx actually resolves to Transaction.__call__().__enter__(), but fs.transaction is still a property.
This is obviously a bit hacky, but without opening up e.g. a context argument slot in Transaction.__init__(), I don't see how you could go about making this cleaner. (You could of course always set the required transaction state after entering the context, but that's extra work and does not feel natural to me.)
I have thought of a very nice use case for this. When writing zarr stores, there is a mixture of ".z" metadata small text metadata files and larger binary files being written. So, it would be nice to specilise the transaction to only capture paths which meet some pattern. Even better would be to modify/wrap the ls/dircache behaviour so that while the transaction is live, these files can be found, and without asking the remote store.
Sounds great. Our use case is essentially a write to a git-ish branch, which needs metadata like repository, branch name, and some flags. It's a mixed bag between positionals and keyword arguments.
If you want to take a look, our implementation can be found here. We're also interested in listing files in transaction, even though we might not be able to use them, since we do not use the AbstractBufferedFile anymore and our file-like objects seem to be written eagerly on close().
Of course, I'm happy to support on any part, or specifically the ls() part if you like.
we might not be able to use them
Actually, transactions could use a general workup. At the moment, they only refer to files written using open(), which means that you might get transactional writing with pipe_file if it depends on open() (e.g., memory), but probably not. Instead, we should queue pipe and put calls and send them async if possible on completion.
Your transaction does some or all of this already, since you have a definite temporary place in the remote to write to.
You may be interested in https://github.com/fsspec/filesystem_spec/pull/1531 , a different take on write transactions.