stream.name is string in Python stdlib, bytes in pyfilesystem2

Open prescod opened this issue 5 years ago • 4 comments

Consider the two different types generated in the following transcript:

>>> fs.__version__
'2.4.11'
>>> type(fs.open_fs(".").open("cumulusci.yml").name)
<class 'bytes'>
>>> type(open("cumulusci.yml").name)
<class 'str'>

This is inconvenient when porting code from the Python stdlib, or when trying to write code that works whether the file comes from the stdlib or PyFilesystem2.

prescod avatar Jun 05 '20 20:06 prescod

I think that fundamentally OS-native paths are byte-strings? In the PyFilesystem code, I think it's this line that explicitly converts a pyfs-path (str) into a native-path (bytes).

The stdlib also behaves the same if given a bytes filepath:

>>> type(open(b"cumulusci.yml").name)
<class 'bytes'>

so maybe that's the way to keep things "consistent"? :shrug:

https://docs.python.org/3/library/functions.html#open https://docs.python.org/3/glossary.html#term-path-like-object

lurch avatar Jun 05 '20 23:06 lurch

AFAIR the stream name is literally the argument given to open, which is what @lurch showed. In fs, all the paths are encoded with the OS encoding to let you manipulate paths with unicode content without having to manage the encoding yourself. That's why in the end you get a bytes path.
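A minimal sketch of that behaviour using only the stdlib, assuming a throwaway file (example.yml is a made-up name) in a temporary directory: the name attribute mirrors the type of the argument passed to open, and os.fsencode / os.fsdecode convert between the two forms with the filesystem encoding.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    # Hypothetical file name, created only for this demonstration.
    str_path = os.path.join(tmpdir, "example.yml")
    with open(str_path, "w") as handle:
        handle.write("key: value\n")

    # Encode the str path with the filesystem encoding.
    bytes_path = os.fsencode(str_path)

    # The built-in open() stores its path argument unchanged in .name.
    with open(str_path) as from_str, open(bytes_path) as from_bytes:
        str_name_type = type(from_str.name)
        bytes_name_type = type(from_bytes.name)

    # os.fsdecode reverses os.fsencode.
    round_trip = os.fsdecode(bytes_path)

print(str_name_type, bytes_name_type)
```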

althonos avatar Jun 06 '20 10:06 althonos

I had code using fs that worked with an older install under python3.6.

This code is now broken: when I pass the name of a file opened via fs to a function from another library (mappy), it crashes with: TypeError: descriptor 'encode' requires a 'str' object but received a 'bytes'

Test case based on the example above:

#!/usr/bin/env python3
import fs
import mappy

print(f"fs.__version: {fs.__version__}")
print(f"type(fs.open_fs('.').open('test.fa').name): {type(fs.open_fs('.').open('test.fa').name)}")
print(f"type(open('test.fa').name): {type(open('test.fa').name)}")

print(f"mappy.__version: {mappy.__version__}")
for _ in mappy.fastx_read(fs.open_fs(".").open("test.fa").name):
    pass

Tests with both python installs:

$ python3.6 encode_mappy.py
fs.__version: 2.0.20
type(fs.open_fs('.').open('test.fa').name): <class 'str'>
type(open('test.fa').name): <class 'str'>
mappy.__version: 2.17
$ python3.8 encode_mappy.py
fs.__version: 2.4.11
type(fs.open_fs('.').open('test.fa').name): <class 'bytes'>
type(open('test.fa').name): <class 'str'>
mappy.__version: 2.17
Traceback (most recent call last):
  File "encode_mappy.py", line 10, in <module>
    for _ in mappy.fastx_read(fs.open_fs(".").open("test.fa").name):
  File "python/mappy.pyx", line 236, in fastx_read
TypeError: descriptor 'encode' requires a 'str' object but received a 'bytes'

And after upgrading fs on the 3.6 install:

$ python3.6 encode_mappy.py
fs.__version: 2.4.11
type(fs.open_fs('.').open('test.fa').name): <class 'bytes'>
type(open('test.fa').name): <class 'str'>
mappy.__version: 2.17
Traceback (most recent call last):
  File "encode_mappy.py", line 10, in <module>
    for _ in mappy.fastx_read(fs.open_fs(".").open("test.fa").name):
  File "python/mappy.pyx", line 236, in fastx_read
TypeError: descriptor 'encode' requires a 'str' object but received a 'bytes'

So the issue is definitely linked to an update in fs.

Wrapping the file name with str solves the issue for me.

blaiseli avatar Sep 07 '20 09:09 blaiseli

> Wrapping the file name with str solves the issue for me

Well, actually there's another issue: using str(fs.open_fs(".").open("test.fa").name) makes mappy.fastx_read seemingly unable to read the content of the file. The actual reason is that the resulting path is invalid: calling str on a bytes object returns its repr (b'...'), not the decoded path.
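Why the str wrapper fails can be checked without fs or mappy. The sketch below assumes a temporary file standing in for test.fa and compares the text produced by str with the text produced by decode:

```python
import os
import tempfile

# A temporary file stands in for test.fa (assumption for illustration).
with tempfile.NamedTemporaryFile(suffix=".fa") as tmp:
    native_path = os.fsencode(tmp.name)      # bytes, like the name reported for fs 2.4.11

    wrapped = str(native_path)               # repr text, starts with b'
    decoded = native_path.decode("utf-8")    # the real path as str

    # Only the decoded form points at an existing file.
    wrapped_exists = os.path.exists(wrapped)
    decoded_exists = os.path.exists(decoded)

print(wrapped_exists, decoded_exists)
```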

New test case:

#!/usr/bin/env python3
import fs
import mappy

print(f"fs.__version: {fs.__version__}")
print(f"type(fs.open_fs('.').open('test.fa').name): {type(fs.open_fs('.').open('test.fa').name)}")
print(f"type(open('test.fa').name): {type(open('test.fa').name)}")

print(f"mappy.__version: {mappy.__version__}")
filename = str(fs.open_fs(".").open("test.fa").name)
print(f"content of {filename}:")
for (name, seq, _) in mappy.fastx_read(filename):
    print(name, seq)
print("content of test.fa:")
for (name, seq, _) in mappy.fastx_read("test.fa"):
    print(name, seq)

Test with latest version of fs:

$ python3.6 encode_mappy.py
fs.__version: 2.4.11
type(fs.open_fs('.').open('test.fa').name): <class 'bytes'>
type(open('test.fa').name): <class 'str'>
mappy.__version: 2.17
content of b'/home/bli/Documents/Informatique/benchmarks/test.fa':
content of test.fa:
 atgc
 atgc
 atgc

After reverting to the older version (python3.6 -m pip install fs==2.0.20):

$ python3.6 encode_mappy.py
fs.__version: 2.0.20
type(fs.open_fs('.').open('test.fa').name): <class 'str'>
type(open('test.fa').name): <class 'str'>
mappy.__version: 2.17
content of /home/bli/Documents/Informatique/benchmarks/test.fa:
 atgc
 atgc
 atgc
content of test.fa:
 atgc
 atgc
 atgc

The actual solution is not to wrap with str but to use the decode method:

filename = fs.open_fs(".").open("test.fa").name.decode("utf-8")
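A slightly more general variant, sketched under the assumption that the name is always a str or bytes path: os.fsdecode returns a str unchanged and decodes bytes with the filesystem encoding, so the same code works for both stdlib and fs streams without hardcoding utf-8. The helper name_as_str is hypothetical, not part of fs or mappy, and the bytes-path open below only simulates the fs 2.4.11 behaviour reported above.

```python
import os
import tempfile


def name_as_str(stream):
    """Return stream.name as str, whether it is stored as str or bytes.

    Hypothetical helper. It raises TypeError if the name is not a path,
    for example when the stream was opened from an integer file descriptor.
    """
    return os.fsdecode(stream.name)


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "test.fa")
    with open(path, "w") as handle:
        handle.write(">seq\natgc\n")

    # One stream with a str name, one with a bytes name.
    with open(path) as str_stream, open(os.fsencode(path)) as bytes_stream:
        names = (name_as_str(str_stream), name_as_str(bytes_stream))

print(names[0] == names[1])
```

The resulting str could then be passed to a function that expects a text path, such as mappy.fastx_read in the test case above.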

blaiseli avatar Sep 07 '20 10:09 blaiseli