datatree icon indicating copy to clipboard operation
datatree copied to clipboard

Backslash doesn't split to child trees

Open Illviljan opened this issue 2 years ago • 8 comments

# Forward slash works:
vertebrates = DataTree.from_dict(
    name="Vertebrae",
    d={
        "/Sharks": None,
        "/Bony Skeleton/Ray-finned Fish": None,
        "/Bony Skeleton/Four Limbs": None,
    },
)
vertebrates
Out[67]: 
DataTree('Vertebrae', parent=None)
├── DataTree('Sharks')
└── DataTree('Bony Skeleton')
    ├── DataTree('Ray-finned Fish')
    └── DataTree('Four Limbs')

# Backslash doesn't work:
vertebrates = DataTree.from_dict(
    name="Vertebrae",
    d={
        "\Sharks": None,
        "\Bony Skeleton\Ray-finned Fish": None,
        "\Bony Skeleton\Four Limbs": None,
    },
)
vertebrates
Out[69]: 
DataTree('Vertebrae', parent=None)
├── DataTree('\Sharks')
├── DataTree('\Bony Skeleton\Ray-finned Fish')
└── DataTree('\Bony Skeleton\Four Limbs')

I'm guessing this is a Windows vs. Linux thing? For example, there's PurePosixPath usage here: https://github.com/xarray-contrib/datatree/blob/a206e725d6207d0ddf0675a9bca33ee18679ff29/datatree/treenode.py#L30

https://docs.python.org/3/library/pathlib.html#pathlib.PurePosixPath https://docs.python.org/3/library/pathlib.html#pathlib.PureWindowsPath

I'm guessing adding a Windows CI might find some more issues.

Illviljan avatar Jul 31 '23 18:07 Illviljan

Good catch. Yes I expect this is because I used PurePosixPath.

Q: Do you think backward slashes should be coerced into forward ones? Or an error raised? Presumably we do not want two valid separator characters?

TomNicholas avatar Jul 31 '23 18:07 TomNicholas

Windows uses backslash by default and coerces forward slashes to backslashes, I think we should handle both.

I'm hoping pathlib has a good trick for this but I haven't looked in to this enough. If not, maybe the separator can be os-dependent?

Edit: But url:s uses forwardslashes (github.com/xarray-contrib/datatree/issues/250) so we should probably just be able to split on both.

Illviljan avatar Jul 31 '23 18:07 Illviljan

If we're lucky then changing PurePosixPath to PurePath will mostly "just work".

TomNicholas avatar Jul 31 '23 19:07 TomNicholas

But url:s uses forwardslashes (github.com/https://github.com/xarray-contrib/datatree/issues/250) so we should probably just be able to split on both.

Why do you say this? You're talking about a case where a datatree is created from a url, but on a windows machine?

TomNicholas avatar Jul 31 '23 19:07 TomNicholas

Yes, I just noticed the urls on the browser is forward slashes on my windows computer. And I was thinking someone could experience that case as well.

Illviljan avatar Jul 31 '23 19:07 Illviljan

Interesting. Is it possible to create a datatree from a url right now though? Where slashes in the url become levels in the tree?

TomNicholas avatar Jul 31 '23 19:07 TomNicholas

I don't know, I haven't tried it yet. The url-checks in xr.open_mfdataset came to mind though: https://github.com/pydata/xarray/blob/afab41d30b2692c350b51a1f69dd94129bb846cb/xarray/backends/common.py#L60-L65

Illviljan avatar Jul 31 '23 19:07 Illviljan

Thinking about this again, I don't think we should support multiple separators in the node paths. The node paths have nothing to do with the filesystem, they don't represent files on disk.

We might however want to help the user when opening from or saving to multiple files, in which case on Windows we should convert backslashes in filepaths into forward slashes in node paths, and back again when saving. Until we actually have an open_mfdatatree or save_mfdatatree function we don't need to worry about this though.

TomNicholas avatar Oct 23 '23 23:10 TomNicholas

I don't think we should support backslashes in datatree paths either, group paths should have nothing to do with filesystem paths (just like urls this should always be the same across operating systems).

Closing, but if you want to continue discussing please open a new issue on the xarray repository.

keewis avatar Aug 13 '24 16:08 keewis