pyfilesystem2
stream.name is string in Python stdlib, bytes in pyfilesystem2
Consider the two different types generated in the following transcript:
>>> fs.__version__
'2.4.11'
>>> type(fs.open_fs(".").open("cumulusci.yml").name)
<class 'bytes'>
>>> type(open("cumulusci.yml").name)
<class 'str'>
This is inconvenient when porting code from the Python stdlib, or when trying to make code that works whether the file comes from stdlib or PyFilesystem2.
I think that, fundamentally, OS-native paths are byte strings? In the PyFilesystem code, I think it's this line that explicitly converts a PyFilesystem path (str) into a native path (bytes).
The stdlib also behaves the same if given a bytes filepath:
>>> type(open(b"cumulusci.yml").name)
<class 'bytes'>
so maybe that's the way to keep things "consistent"? :shrug:
https://docs.python.org/3/library/functions.html#open
https://docs.python.org/3/glossary.html#term-path-like-object
AFAIR the stream name is literally the argument given to open, which is what @lurch showed. In fs, all the paths are encoded with the OS encoding to let you manipulate paths with unicode content without having to manage the encoding yourself. That's why in the end you get a bytes path.
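To illustrate the point about the stream name mirroring the argument given to open, here is a minimal stdlib-only sketch. The helper name `open_names` and the temporary file are made up for the example and are not part of PyFilesystem2. It assumes, as described above, that fs hands an OS-encoded (bytes) path to open; `os.fsencode` stands in for that encoding step here.

```python
import os
import tempfile


def open_names(path):
    """Open the same file via a str path and a bytes path; return both .name values."""
    with open(path) as str_stream:
        str_name = str_stream.name
    # os.fsencode applies the filesystem encoding, standing in for what fs
    # is assumed to do before calling open().
    with open(os.fsencode(path)) as bytes_stream:
        bytes_name = bytes_stream.name
    return str_name, bytes_name


with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.txt")
    with open(demo_path, "w") as handle:
        handle.write("hello\n")
    str_name, bytes_name = open_names(demo_path)
    # The type of .name follows the type of the path passed to open().
    print(type(str_name), type(bytes_name))
```

So the bytes name is not something the stream invents; it is whatever type of path was passed in.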
I had code using fs that worked with an older install under python3.6.
This code is now broken because a function from another library (mappy) to which I give the name of a file opened via fs crashes with: TypeError: descriptor 'encode' requires a 'str' object but received a 'bytes'
Test case based on the example above:
#!/usr/bin/env python3
import fs
import mappy
print(f"fs.__version: {fs.__version__}")
print(f"type(fs.open_fs('.').open('test.fa').name): {type(fs.open_fs('.').open('test.fa').name)}")
print(f"type(open('test.fa').name): {type(open('test.fa').name)}")
print(f"mappy.__version: {mappy.__version__}")
for _ in mappy.fastx_read(fs.open_fs(".").open("test.fa").name):
    pass
Tests with both python installs:
$ python3.6 encode_mappy.py
fs.__version: 2.0.20
type(fs.open_fs('.').open('test.fa').name): <class 'str'>
type(open('test.fa').name): <class 'str'>
mappy.__version: 2.17
$ python3.8 encode_mappy.py
fs.__version: 2.4.11
type(fs.open_fs('.').open('test.fa').name): <class 'bytes'>
type(open('test.fa').name): <class 'str'>
mappy.__version: 2.17
Traceback (most recent call last):
  File "encode_mappy.py", line 10, in <module>
    for _ in mappy.fastx_read(fs.open_fs(".").open("test.fa").name):
  File "python/mappy.pyx", line 236, in fastx_read
TypeError: descriptor 'encode' requires a 'str' object but received a 'bytes'
And after upgrading fs on the 3.6 install:
$ python3.6 encode_mappy.py
fs.__version: 2.4.11
type(fs.open_fs('.').open('test.fa').name): <class 'bytes'>
type(open('test.fa').name): <class 'str'>
mappy.__version: 2.17
Traceback (most recent call last):
  File "encode_mappy.py", line 10, in <module>
    for _ in mappy.fastx_read(fs.open_fs(".").open("test.fa").name):
  File "python/mappy.pyx", line 236, in fastx_read
TypeError: descriptor 'encode' requires a 'str' object but received a 'bytes'
So the issue is definitely linked to an update in fs.
Wrapping the file name with str solves the issue for me
Well, actually there's another issue: using str(fs.open_fs(".").open("test.fa").name) makes mappy.fastx_read seemingly unable to read the content of the file. The actual reason is that calling str on a bytes object returns its repr, including the b'...' wrapper, so the resulting path is invalid.
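A quick stdlib-only sketch of why the str-wrapped path is invalid. The path `/tmp/test.fa` is just a placeholder, not the real location of the file:

```python
# Placeholder bytes path, standing in for the .name returned by fs.
native_path = b"/tmp/test.fa"

# str() on bytes gives the repr, so the b'...' wrapper ends up in the path.
wrapped = str(native_path)
# decode() converts the bytes to the text they represent.
decoded = native_path.decode("utf-8")

print(wrapped)  # b'/tmp/test.fa'
print(decoded)  # /tmp/test.fa
```

The first string names a file literally called b'/tmp/test.fa', which does not exist.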
New test case:
#!/usr/bin/env python3
import fs
import mappy
print(f"fs.__version: {fs.__version__}")
print(f"type(fs.open_fs('.').open('test.fa').name): {type(fs.open_fs('.').open('test.fa').name)}")
print(f"type(open('test.fa').name): {type(open('test.fa').name)}")
print(f"mappy.__version: {mappy.__version__}")
filename = str(fs.open_fs(".").open("test.fa").name)
print(f"content of {filename}:")
for (name, seq, _) in mappy.fastx_read(filename):
    print(name, seq)
print("content of test.fa:")
for (name, seq, _) in mappy.fastx_read("test.fa"):
    print(name, seq)
Test with latest version of fs:
$ python3.6 encode_mappy.py
fs.__version: 2.4.11
type(fs.open_fs('.').open('test.fa').name): <class 'bytes'>
type(open('test.fa').name): <class 'str'>
mappy.__version: 2.17
content of b'/home/bli/Documents/Informatique/benchmarks/test.fa':
content of test.fa:
atgc
atgc
atgc
After reverting to the older version (python3.6 -m pip install fs==2.0.20):
$ python3.6 encode_mappy.py
fs.__version: 2.0.20
type(fs.open_fs('.').open('test.fa').name): <class 'str'>
type(open('test.fa').name): <class 'str'>
mappy.__version: 2.17
content of /home/bli/Documents/Informatique/benchmarks/test.fa:
atgc
atgc
atgc
content of test.fa:
atgc
atgc
atgc
The actual solution is not to wrap with str but to use the decode method:
filename = fs.open_fs(".").open("test.fa").name.decode("utf-8")
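For code that has to accept streams from either the stdlib or fs, one possible variation is to go through `os.fsdecode`, which uses the filesystem encoding instead of a hard-coded "utf-8" and passes str through unchanged. This is only a sketch: `stream_name_as_str` is a hypothetical helper, not something fs provides, and it does not handle streams opened from a file descriptor, where `.name` is an int.

```python
import os
import tempfile


def stream_name_as_str(stream):
    """Return stream.name as str, whether it is stored as str or bytes."""
    # os.fsdecode decodes bytes with the filesystem encoding and
    # returns str input unchanged.
    return os.fsdecode(stream.name)


with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "test.fa")
    with open(demo_path, "w") as handle:
        handle.write(">seq\natgc\n")
    # One stream opened with a str path, one with a bytes path.
    with open(demo_path) as from_str, open(os.fsencode(demo_path)) as from_bytes:
        name_from_str = stream_name_as_str(from_str)
        name_from_bytes = stream_name_as_str(from_bytes)
    print(name_from_str == name_from_bytes)
```

Both calls return the same str path, so the result can be passed to a library that insists on str.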