filesystem_spec
fuse: 'memory' is (mis)interpreted as 'file://', not 'memory://'
In the example: python3 -m fsspec.fuse memory /usr/share /tmp/mem
I'd expect to have a mount of a memory file system to /tmp/mem (or an error because /usr/share does not exist in the newly created memory fs).
What happens is that url_to_fs does not understand 'memory' to mean 'memory://', so you end up with a local filesystem and not a memory one, which is confusing (i.e. what you get is as if you'd run python3 -m fsspec.fuse file:// /usr/share /tmp/mem).
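To make the fallback concrete, here is a minimal sketch of how a bare name can end up treated as a local path. It is a simplified stand-in, not fsspec's actual code: the function name guess_protocol and the default value "file" are assumptions made for illustration.

```python
def guess_protocol(url, default="file"):
    """Return the scheme in front of '://', or `default` when there is none.

    Hypothetical helper for illustration only; not part of fsspec.
    """
    if "://" in url:
        protocol, _ = url.split("://", 1)
        if protocol:
            return protocol
    # No scheme found: the string is silently treated as a local path.
    return default


print(guess_protocol("memory"))     # bare name, falls back to the default
print(guess_protocol("memory://"))  # explicit scheme is picked up
```

With this kind of rule, 'memory' is indistinguishable from a relative path named "memory", which is why no error is raised.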
This works fine btw: python3 -m fsspec.fuse memory:// / /tmp/mem
I'd suggest either 'memory' be interpreted as 'memory://' or an exception be raised rather than quietly creating a local file system. I'd go with the exception, along the lines of " 'memory' is not a valid filesystem url. Did you mean 'memory://'? "
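A minimal sketch of the exception option, assuming the check is simply "does the argument contain '://'". The helper name require_url is hypothetical and does not exist in fsspec; a real check would probably also need to consider chained URLs.

```python
def require_url(arg):
    """Hypothetical validator: accept 'scheme://...' and reject bare names."""
    if "://" not in arg:
        raise ValueError(
            f"{arg!r} is not a valid filesystem url. Did you mean '{arg}://'?"
        )
    return arg


require_url("memory://")  # accepted and returned unchanged
try:
    require_url("memory")
except ValueError as exc:
    print(exc)
```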
So the issue is requiring "memory://" instead of "memory"? The doc of the CLI does suggest that "memory" alone should work.
This is all defined in fsspec.fuse.main, if you would like to edit. I believe it might be around the choice to call url_to_fs instead of the simpler filesystem function.
Well, I'd suggest the first step is establishing the design intent.
Is the intent to require a URL? And in that case should the documentation be updated to explicitly use "memory://" (and mount from a path that exists) in addition to a code change to validate that a URL has indeed been passed?
btw I did step through the code with pdb. I think I was looking at (and referred to) the call to url_to_fs you mention.
"the first step is establishing the design intent"
Indeed. What do you think? :) I don't think FUSE is used much, because the backend library has some flakiness of its own, and there are other, non-python, ways to mount in-memory or some remote filesystems like s3/ssh/ftp...
Since you smiled :-) ...
Given what the comment in fsspec.fuse.main() says now ("Mount filesystem from chained URL to MOUNT_POINT ...") I suggest requiring a URL and dropping the second argument.
So the examples:
    python3 -m fsspec.fuse memory /usr/share /tmp/mem
    python3 -m fsspec.fuse local /tmp/source /tmp/local \
        -l /tmp/fsspecfuse.log
... become:
    python3 -m fsspec.fuse memory:// /tmp/mem
    python3 -m fsspec.fuse file:///tmp/source /tmp/local \
        -l /tmp/fsspecfuse.log
... assuming that in all cases the source path can be expressed in a URL (which I'd expect to be true).
Perhaps an additional argument which allows non-built-in fsspec implementations to be specified for import would be handy too: e.g. python3 -m fsspec.fuse my_scheme://xxx /tmp/xxx --implementation_module my.module.here (but with a better argument name).
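As a rough sketch of that idea, the option could be parsed with argparse and each named module imported before the URL is resolved. Everything here is an assumption for illustration: the flag name comes from the example above, the helper names are made up, and the sketch assumes the imported module registers its own scheme as a side effect of being imported.

```python
import argparse
import importlib


def build_parser():
    """Hypothetical CLI: URL, mount point, and optional modules to import."""
    parser = argparse.ArgumentParser(prog="fsspec.fuse")
    parser.add_argument("url")
    parser.add_argument("mount_point")
    parser.add_argument(
        "--implementation_module",
        action="append",
        default=[],
        help="module to import before resolving the URL (may be repeated)",
    )
    return parser


def load_implementations(module_names):
    """Import each module; registration is assumed to happen on import."""
    return [importlib.import_module(name) for name in module_names]


# "json" is only a stand-in for a real implementation module.
args = build_parser().parse_args(
    ["my_scheme://xxx", "/tmp/xxx", "--implementation_module", "json"]
)
modules = load_implementations(args.implementation_module)
```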
FUSE is useful for experimentation and testing, and as training wheels which can get a project going pending a non-python way being developed (should such an alternative way be deemed to be necessary in the end).
I think I agree with all of your points. However, we might want to also keep the current behaviour for back-compatibility in the case of three arguments (and also allow "memory" for this case?)
On the one hand, requiring a URL means the second argument can be eliminated and the implementation matches the current headline ("Mount ... from chained URL").
On the other hand, requiring a URL means that backwards compatibility is lost for those who use the non-URL, three-argument form (scheme, source path, target path).
These seem to be mutually incompatible.
A way to provide both options might be to add explicit from_url and from_scheme options and deprecate the current form, e.g. something like:
    python -m fsspec.fuse from_url memory:// /tmp/target
    python -m fsspec.fuse from_scheme memory / /tmp/target
Using just fsspec.fuse arg1 arg2 arg3 would emit a deprecation warning but continue with the current behavior for now. This option would allow both ways to be used and make it explicit which was intended by the caller.
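A minimal sketch of how the three forms could be told apart, assuming positional arguments only. The function name interpret and the returned tuple layout are hypothetical; this is not how fsspec.fuse parses its arguments today.

```python
import warnings


def interpret(argv):
    """Return (scheme_or_url, source_path, mount_point) for each CLI form.

    Hypothetical dispatcher for illustration only.
    """
    if len(argv) == 3 and argv[0] == "from_url":
        _, url, mount_point = argv
        return url, None, mount_point
    if len(argv) == 4 and argv[0] == "from_scheme":
        _, scheme, source_path, mount_point = argv
        return scheme, source_path, mount_point
    if len(argv) == 3:
        # Legacy form: keep working for now, but tell the caller it is going away.
        warnings.warn(
            "the three-argument form is deprecated; use from_url or from_scheme",
            DeprecationWarning,
            stacklevel=2,
        )
        return tuple(argv)
    raise SystemExit("usage: from_url URL TARGET | from_scheme SCHEME PATH TARGET")
```

For example, interpret(["from_url", "memory://", "/tmp/target"]) yields the URL with no source path, while interpret(["memory", "/", "/tmp/target"]) still works but emits the warning.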
But from_url / from_scheme does look clunky. The fsspec.fuse {url} {target} {other args} form is cleaner, so perhaps biting the bullet and requiring a URL is the way to go? Those who currently use the scheme-name-only approach and get bitten by the lack of backward compatibility when they pick up a newer version of fsspec could get a nice error message: "URL required, did you mean {scheme}://?".
I'd lean towards that last option ... biting the bullet and requiring the URL (with a nice error message). ... but I really don't think this is my call.
Probably the implementation should match what we claim it ought to be, and we can detect the "wrong number of arguments" as you say. I think that's a worthwhile improvement.