filesystem_spec
fs.open: newline=None translates to os.linesep on write
Since I introduced read_text/write_text with newline= parameters, I have been trying to implement async versions of them.
Since async filesystems don't have an async version of open, I was thinking of building on cat_file/pipe_file instead, for simplicity.
For the record, the implementation would look something like this:
    import locale
    import os

    async def _read_text(self, path, encoding=None, errors=None, newline=None):
        encoding = encoding or locale.getpreferredencoding(False)
        errors = errors or "strict"
        assert newline in (None, "", "\n", "\r", "\r\n")
        contents = await self._cat_file(path)
        # After decoding, `text` is a str, so the replacements below use str arguments.
        text = contents.decode(encoding, errors)
        if newline is None:
            # Universal newlines: fold "\r\n" and lone "\r" into "\n".
            # needs an optimization when there may be no `\r` after 1st replacement
            text = text.replace("\r\n", "\n").replace("\r", "\n")
        return text

    async def _write_text(self, path, value: str, encoding=None, errors=None, newline=None):
        encoding = encoding or locale.getpreferredencoding(False)
        errors = errors or "strict"
        assert newline in (None, "", "\n", "\r", "\r\n")
        if newline is None:
            # Same as open(): fall back to the platform line separator.
            newline = os.linesep
        if newline not in ("", "\n") and "\n" in value:
            value = value.replace("\n", newline)
        contents = value.encode(encoding, errors)
        await self._pipe_file(path, contents)
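The comment in `_read_text` notes that the chained `replace` calls could be optimized. One possible shape, as a minimal sketch only: skip the work entirely when the text contains no `\r`, and otherwise fold both `\r\n` and a lone `\r` in a single regex pass. The name `normalize_newlines` is hypothetical and not part of fsspec.

```python
import re

# Matches "\r\n" or a lone "\r" so both can be folded in one pass.
_CR_NEWLINE = re.compile(r"\r\n?")


def normalize_newlines(text: str) -> str:
    # Hypothetical helper: universal-newlines translation for decoded text.
    if "\r" not in text:
        # Fast path: nothing to translate, return the string untouched.
        return text
    return _CR_NEWLINE.sub("\n", text)


print(repr(normalize_newlines("a\r\nb\rc\n")))  # 'a\nb\nc\n'
```

Whether this is actually faster than two `str.replace` calls would need measuring; the fast path is the part most likely to matter for files that already use `\n`.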
While implementing this, I was going through the docs and noticed this:
When writing output to the stream, if newline is None, any '\n' characters written are translated to the system default line separator, os.linesep.
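The documented behavior can be checked with the standard library alone. The sketch below wraps an in-memory buffer in `io.TextIOWrapper` and returns the bytes that end up being written; `written_bytes` is a hypothetical helper name used only for this illustration.

```python
import io
import os


def written_bytes(text, newline):
    # Write `text` through a text layer and return the raw bytes produced.
    raw = io.BytesIO()
    wrapper = io.TextIOWrapper(raw, encoding="utf-8", newline=newline)
    wrapper.write(text)
    wrapper.flush()
    data = raw.getvalue()
    wrapper.detach()  # keep the wrapper from closing `raw` on cleanup
    return data


# newline=None: the result depends on the platform running this code.
default = written_bytes("a\nb\n", None)
# newline="": "\n" is written as-is on every platform.
untranslated = written_bytes("a\nb\n", "")
```

On Windows `default` would contain `\r\n`, while on Linux and macOS it would be identical to `untranslated`.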
I always knew that newline=None converts to os.linesep on write, but this made me wonder whether that makes sense in the context of fsspec, where the bytes written to a (possibly remote) file would then depend on the OS of the client doing the writing.
Universal newlines on read make sense to me, but I am not sure about write. Of course, this can be worked around by passing newline=''. It seems like a minor issue; I just wanted to see what others think, especially in the context of fs.open/fs.read_text/fs.write_text.
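To make the concern concrete, here is a small sketch of the write-side rule with the line separator passed in explicitly, so both platforms can be simulated on one machine. `encode_text` and its `linesep` parameter are hypothetical and exist only for this illustration; `linesep` stands in for `os.linesep`.

```python
def encode_text(value, newline=None, linesep="\n", encoding="utf-8"):
    # Mirrors the write-side translation, with an injectable separator.
    if newline not in (None, "", "\n", "\r", "\r\n"):
        raise ValueError(f"illegal newline value: {newline!r}")
    if newline is None:
        newline = linesep  # what os.linesep would supply
    if newline not in ("", "\n"):
        value = value.replace("\n", newline)
    return value.encode(encoding)


text = "a\nb\n"
posix_default = encode_text(text, newline=None, linesep="\n")
windows_default = encode_text(text, newline=None, linesep="\r\n")
posix_raw = encode_text(text, newline="", linesep="\n")
windows_raw = encode_text(text, newline="", linesep="\r\n")

print(posix_default == windows_default)  # False: output depends on the client OS
print(posix_raw == windows_raw)          # True: newline="" is platform-independent
```

With newline=None the same call produces different bytes depending on where it runs; with newline='' the payload is the same everywhere.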
I am open to suggestions and comments, but I don't have any opinion myself, since I essentially never run on Windows except to test against path-style, line-ending, and encoding-related bugs.