mesmerize-core icon indicating copy to clipboard operation
mesmerize-core copied to clipboard

Use Path.resolve() consistently to prevent UNC/drive letter ambiguity

Open ethanbb opened this issue 1 year ago • 3 comments

This fixes an issue I had on Windows where pathlib reported that one or more output paths in CNMF did not have a common root with the output dir. This is because the file paths were normalized with Path.resolve(), whereas the directory path was not. On Windows, resolve() converts any remote mount paths from drive letter to UNC format, and in order to compute the relative path, both paths must be in the same format.

This change calls resolve() on the directory path before computing the other paths relative to it. (Another solution could be to just create these relative paths directly and concatenate them to the dir path when saving data.) I also changed the paths.split function, which also calls relative_to, to always resolve both paths before trying to compute the relative path.

ethanbb avatar Jun 21 '24 16:06 ethanbb

@kushalkolar Here's an example of the problem, using some of the existing code from cnmf.py (sorry for the delay). The resolve method converts drive paths to UNC paths if the drive is mapped to a network share on the current system. Then, if you call relative_to on such a path with a drive letter path as the argument, you get an error. Calling resolve on the common root path (output_dir) and defining all the other paths relative to this one should prevent this issue.

I'm not sure it can be tested automatically; would definitely require some changes to the Windows test runner to set up this situation.

In [1]: from pathlib import Path

In [2]: drive_path = Path('Z:\\foo\\bar')

In [3]: output_dir = drive_path.parent.joinpath('baz')

In [4]: output_dir
Out[4]: WindowsPath('Z:/foo/baz')

In [5]: output_path = output_dir.joinpath('baz.hdf5').resolve()

In [6]: output_path
Out[6]: WindowsPath('//proektdata/LabData/foo/baz/baz.hdf5')

In [7]: output_path.relative_to(output_dir.parent)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[7], line 1
----> 1 output_path.relative_to(output_dir.parent)

File ~\AppData\Local\miniforge3\envs\mesviz\lib\pathlib.py:818, in PurePath.relative_to(self, *other)
    816 if (root or drv) if n == 0 else cf(abs_parts[:n]) != cf(to_abs_parts):
    817     formatted = self._format_parsed_parts(to_drv, to_root, to_parts)
--> 818     raise ValueError("{!r} is not in the subpath of {!r}"
    819             " OR one path is relative and the other is absolute."
    820                      .format(str(self), str(formatted)))
    821 return self._from_parsed_parts('', root if n == 1 else '',
    822                                abs_parts[n:])

ValueError: '\\\\proektdata\\LabData\\foo\\baz\\baz.hdf5' is not in the subpath of 'Z:\\foo' OR one path is relative and the other is absolute.

ethanbb avatar Jul 19 '24 19:07 ethanbb

I think let's wait until #300 so that CI works and then rebase to make sure this works on all platform before merging

kushalkolar avatar Aug 31 '24 11:08 kushalkolar

300 is merged, did you mean a different one?

ethanbb avatar Sep 01 '24 04:09 ethanbb

I ran into this same problem. If that would work, one can use a synced network directory to work directory which would ease syncronisation, backup and multisite access to the process. On windows, junctions could be used to get rid of spaces in the path. Any chance to have this completed?

katonage avatar Feb 25 '25 07:02 katonage

Absolutely let me take a look later today! Fell through the cracks

ethanbb avatar Feb 25 '25 13:02 ethanbb

Thank you!

katonage avatar Feb 26 '25 08:02 katonage

@ethanbb I think you have maintainer permissions here, merge whenever you think it's ready

kushalkolar avatar Feb 26 '25 19:02 kushalkolar

@kushalkolar OK, I'm working on getting at least one of the test runs to succeed. Also, it seems like I still need an approving review from a reviewer with write access.

ethanbb avatar Feb 26 '25 20:02 ethanbb

I'm not reproducing this tolerance error on my Windows machine... Max relative difference: 105.32658421 that seems really high, no? I see the rtol is already 100 though. @kushalkolar do you know about this?

ethanbb avatar Mar 01 '25 00:03 ethanbb

Anyway, the tests are all passing on my local setup with this issue, and the failure in the Windows runner here isn't related to the issue, so I'll go ahead and merge.

ethanbb avatar Mar 01 '25 06:03 ethanbb

I use windows and my data root is a junction to a folder having spaces in its name.

using proekt/rel-path-fix, f4e5b2f24bd77d90c3ade756bf7d6991ab7c7059 it works fine. using origin/master cf6b0168f53efe84649313fb43dc842c7d66bdb2 i get an error at df.caiman.add_item( algo='mcorr', input_movie_path=movie_path, ...

i get error: ... c:\Users\gerge\anaconda3\envs\caiman\lib\site-packages\mesmerize_core\caiman_extensions\common.py:252, in CaimanDataFrameExtensions.reload_from_disk(self) 237 def reload_from_disk(self) -> pd.DataFrame: 238 """ 239 Returns the DataFrame on disk. 240 ... 59 "hyphens ( - ), underscores ( _ ) or periods ( . )" 60 ) 61 return path

ValueError: Paths must only contain alphanumeric characters, hyphens ( - ), underscores ( _ ) or periods ( . )

my movie_path is: "C:\Users\gerge\Projects\ChemoGen\ChemoGen02\Region_1\PREPARED\IMAGE_DATA_UG.h5"

im not sure the underlying CaImAn repo version has anything to do with this.

katonage avatar Mar 02 '25 09:03 katonage

@katonage ah ok, I did make some changes to call resolve earlier on inputted paths, thinking it would be better to use the canonical form wherever possible. I do see the potential for that to cause a problem though if a junction or other alias was used intentionally to bypass errors with unsupported types of paths in the called procedures.

@kushalkolar is there a good reason mesmerize-core validates to disallow paths with spaces? If not, the overly strict validation is the real bug. If so, I can see if it would work to keep aliases whenever possible and only resolve them when a relative path needs to be computed.

ethanbb avatar Mar 02 '25 15:03 ethanbb

The reason is it creates a run file script which is what runs in the subprocess, and dealing with spaces there can be hard

On Sun, Mar 2, 2025, 10:22 AM Ethan Blackwood @.***> wrote:

@katonage https://github.com/katonage ah ok, I did make some changes to call resolve earlier on inputted paths, thinking it would be better to use the canonical form wherever possible. I do see the potential for that to cause a problem though if a junction or other alias was used intentionally to bypass errors with unsupported types of paths in the called procedures.

@kushalkolar https://github.com/kushalkolar is there a good reason mesmerize-core validates to disallow paths with spaces? If not, the overly strict validation is the real bug. If so, I can see if it would work to keep aliases whenever possible and only resolve them when a relative path needs to be computed.

— Reply to this email directly, view it on GitHub https://github.com/nel-lab/mesmerize-core/pull/304#issuecomment-2692779610, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACHXXRCQNJNB26EWLAHCINT2SMOZ5AVCNFSM6AAAAABXZ4GE3WVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDMOJSG43TSNRRGA . You are receiving this because you were mentioned.Message ID: @.***> [image: ethanbb]ethanbb left a comment (nel-lab/mesmerize-core#304) https://github.com/nel-lab/mesmerize-core/pull/304#issuecomment-2692779610

@katonage https://github.com/katonage ah ok, I did make some changes to call resolve earlier on inputted paths, thinking it would be better to use the canonical form wherever possible. I do see the potential for that to cause a problem though if a junction or other alias was used intentionally to bypass errors with unsupported types of paths in the called procedures.

@kushalkolar https://github.com/kushalkolar is there a good reason mesmerize-core validates to disallow paths with spaces? If not, the overly strict validation is the real bug. If so, I can see if it would work to keep aliases whenever possible and only resolve them when a relative path needs to be computed.

— Reply to this email directly, view it on GitHub https://github.com/nel-lab/mesmerize-core/pull/304#issuecomment-2692779610, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACHXXRCQNJNB26EWLAHCINT2SMOZ5AVCNFSM6AAAAABXZ4GE3WVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDMOJSG43TSNRRGA . You are receiving this because you were mentioned.Message ID: @.***>

kushalkolar avatar Mar 02 '25 16:03 kushalkolar

OK thanks. When I get a chance I'll add a test for this and try to make the subprocess backend work with paths with spaces.

ethanbb avatar Mar 02 '25 16:03 ethanbb

Thank you very much for your efforts!

katonage avatar Mar 03 '25 09:03 katonage

@katonage Can you check that it works now? You should also be able to use it without the junction, if you want to.

Edit: forgot to say I added a dependency, you will have to install mslex with conda or pip.

ethanbb avatar Mar 04 '25 01:03 ethanbb

It works for me now with both junctions and paths with spaces. Thank you!

katonage avatar Mar 04 '25 10:03 katonage