msys2-runtime icon indicating copy to clipboard operation
msys2-runtime copied to clipboard

Regression from pathconv changes

Open naveen521kk opened this issue 3 months ago • 22 comments

Previously msys2-runtime converted the path separator from : to ; when passing to a Windows program. Now, for some reason, this doesn't work:

$ A=.:../test python -c 'import os; print(os.environ["A"])'
.:../test

Though if it's the PATH variable it seems to work.

$ PATH=".:../test" /mingw64/bin/python -c 'import os; print(os.environ["PATH"])'
.;C:\Users\test

Output from an old version of runtime for reference:

$ A=.:../test python -c 'import os; print(os.environ["A"])'
.;..\test

This is probably why rebuilding texlive errors out, see https://github.com/msys2/MINGW-packages/pull/19921#issuecomment-1923454422.

naveen521kk avatar Mar 08 '24 16:03 naveen521kk

Seems like this is caused by https://github.com/msys2/msys2-runtime/commit/9944652c41fb96f555d221865f4ed5d0c8b33514 Any path with :. or :.. in it is ignored.

For example, this works:

$ A="/this:/there" python -c 'import os; print(os.environ["A"])'
C:\msys64\this;C:\msys64\there
$ A="./this:/there" python -c 'import os; print(os.environ["A"])'
.\this;C:\msys64\there

But this doesn't:

A="/this:./there" python -c 'import os; print(os.environ["A"])'
/this:./there
A="./this:./there" python -c 'import os; print(os.environ["A"])'
./this:./there

Voxeles avatar Mar 24 '24 22:03 Voxeles

@dscho It seems as your commit https://github.com/msys2/msys2-runtime/commit/9944652c41fb96f555d221865f4ed5d0c8b33514 broke the pathconv rules as described here. Unfortunately, the rules are not complete w.r.t. relative paths. The official documentation says:

$ python3 -c "import sys; print(sys.argv)" --dir=/foo
['-c', '--dir=C:/msys64/foo']
$ python3 -c "import sys; print(sys.argv)" --dir=/foo:/bla
['-c', '--dir=C:\\msys64\\foo;C:\\msys64\\bla']

Relative paths have not been mentioned so far, but the expected behavior should be something like this:

$ python3 -c "import sys; print(sys.argv)" --dir=/foo:./bla
['-c', '--dir=C:\\msys64\\foo;.\\bla']

With the current pathconv rules, we get a different - flawed - picture

$ python3 -c "import sys; print(sys.argv)" --dir=/foo:./bla
['-c', '--dir=/foo:./bla']

Note, that we get something completely different, if we do the following:

$ python3 -c "import sys; print(sys.argv)" --dir=./foo:/bla
['-c', '--dir=.\\foo;C:\\msys64\\bla']

The change seems to be incomplete and hence break some packages, such as mentioned here https://github.com/msys2/MINGW-packages/pull/19921

Any thought how we could fix this again?

reneparis avatar Apr 15 '24 15:04 reneparis

Just wanted to add that MSYS2 already provides a mechanism to exclude specific command line arguments (and environment variables) from the path conversion, as is also described in the MSYS2 documentation under the same link that @reneparis posted (see MSYS2_ARG_CONV_EXCLand MSYS2_ENV_CONV_EXCL). Maybe using this mechanism would be a more consistent approach than trying to intercept the conversion in msys2_path_conv.cc (https://github.com/msys2/msys2-runtime/commit/9944652c41fb96f555d221865f4ed5d0c8b33514), where no contextual information about the strings to be converted is available.

ReinhardBiegelIntech avatar Apr 15 '24 15:04 ReinhardBiegelIntech

Any thought how we could fix this again?

Not completely.

As stated in the commit message of https://github.com/msys2/msys2-runtime/commit/9944652c41fb96f555d221865f4ed5d0c8b33514, the idea is to not confuse Git's <rev>:./<filename> for a path list (and incidentally, it would also help things like scp some.host:./file ./

Now, in your examples, the path lists all start with either / or ./. Would this always be the case? If yes, I think we can reliably discern between the cases I do not want to be converted and the cases you want to be converted.

dscho avatar Apr 15 '24 15:04 dscho

Now, in your examples, the path lists all start with either / or ./. Would this always be the case? If yes, I think we can reliably discern between the cases I do not want to be converted and the cases you want to be converted.

The way I'd do that is by initializing something like bool is_potential_path_list = *it == '/' || (*it == '.' && *it == '/'); before that while() loop, and then prefix the if() condition testing for Git's <rev>:./name syntax with !is_potential_path_list.

@reneparis wanna give it a whirl? It might be a couple of days until I get back to this...

dscho avatar Apr 15 '24 15:04 dscho

Sorry, but this does not work, 'cause the example I gave was arbitrary. The broken package uses the following variable WEBINPUTS=.:../../../texk/web2c (see https://github.com/msys2/MINGW-packages/pull/19921#issuecomment-2042220562). I've looked into the build scripts, but it's not easy to change the assignment. On top, this would only hotfix a broken behavior.

The conversion skipping mechanism is not specific enough and in my opinion introduces a lot of unexpected behavior. The two requirements, git path resolution and arbitrary path conversion, do not go well together unless the converter knows that it is in the context of git or scp. Just operating on the characters is not enough. Can we leverage this somewhere?

reneparis avatar Apr 16 '24 05:04 reneparis

The broken package uses the following variable WEBINPUTS=.:../../../texk/web2c (see msys2/MINGW-packages#19921 (comment)).

Well, the prefix .: would be easy enough to catch: it is neither a valid scp target nor a <rev>:<path> notation.

dscho avatar Apr 16 '24 07:04 dscho

The two requirements, git path resolution and arbitrary path conversion, do not go well together unless the converter knows that it is in the context of git or scp. Just operating on the characters is not enough. Can we leverage this somewhere?

At the moment, I don't think that the path conversion is privy of the information that it operates on a given command, in particular because it also operates on environment variables. It is an interesting idea, though.

dscho avatar Apr 16 '24 07:04 dscho

@dscho I'd be interested in the original intention of the change in https://github.com/msys2/msys2-runtime/commit/9944652c41fb96f555d221865f4ed5d0c8b33514. Was it to satisfy the test suite? Or are there real world examples (e.g. of broken packages in MSYS2)?

Also, the commit message states

... but path lists are expected to contain only absolute paths ...

which is directly related to our problem here. What is this statement based on? I find it to be quite common to use relative paths in path lists, especially in (C/C++) build systems.

At the moment, I don't think that the path conversion is privy of the information that it operates on a given command...

The exact point where this information is already available is when calling the command. Thus, as stated previously, I'd give MSYS2_ARG_CONV_EXCL a try. Did you look into that?

Maybe we can work out another solution for that together? In my opinion the current situation is not acceptable as the change breaks more things than it fixes and thus should be reverted (if we cannot come up with a better solution rather soon).

btw, even though this is IMO the wrong approach (as is just catches a special case): the code snippet from above has a bug in the expression and should maybe be: bool is_potential_path_list = *it == '/' || (*it == '.' && *(it+1) == '/'); (note the +1). But it would still be missing ../foo, or just let's assume /foo/HEAD:../baar for example.

ReinhardBiegelIntech avatar Apr 17 '24 06:04 ReinhardBiegelIntech

@lazka - what is your opinion on that?

reneparis avatar Apr 17 '24 07:04 reneparis

I'd be interested in the original intention of the change in 9944652. Was it to satisfy the test suite?

While it does satisfy Git's test suite, the underlying reason for that fix is that it does address a class of real-world problems, as users such as myself use Git to inspect <rev>:<path> contents quite often, and similarly clone via SSH from <host>:<path>, where <path> may very well be a relative path (which is relative the current directory in the former case, and relative to the home directory on the SSH host in the latter).

Also, the commit message states

... but path lists are expected to contain only absolute paths ...

which is directly related to our problem here. What is this statement based on?

Back when the change was originally made (in 2015), the package builds I sampled all used absolute paths in order to avoid problems when, say, setting the PERLLIB in the project root and then using it in a subdirectory. Lots of things have changed since then, but all the packages I regularly build for Git for Windows (see https://github.com/git-for-windows/git-for-windows-automation/actions/workflows/build-and-deploy.yml for many of them) follow that same pattern.

At the moment, I don't think that the path conversion is privy of the information that it operates on a given command...

The exact point where this information is already available is when calling the command. Thus, as stated previously, I'd give MSYS2_ARG_CONV_EXCL a try. Did you look into that?

Oh, let me clarify a misunderstanding: I addressed the problem mentioned in the referenced commit not for me, but for all of Git for Windows' users. Judging by the download count (which I have to use in lieu of more reliable data that I however cannot collect because that would cause High Drama In The Open Source World), the number of those users is in the millions.

So: no, I cannot simply use MSYS2_ARG_CONV_EXCL (especially because it does not scale well at all).

Maybe we can work out another solution for that together? In my opinion the current situation is not acceptable as the change breaks more things than it fixes and thus should be reverted (if we cannot come up with a better solution rather soon).

Well, in the spirited words of The Dude:

Image

😁

Don't get me wrong (and especially please understand that I say this in good humor, not to offend you or something): I understand where you are coming from, I also understand, though, that we're dealing with an ill-posed problem here. Basically only a human can tell with certainty whether a given command-line option or environment variable is supposed to contain a path list that needs to be translated from Unix-style to Windows-style, or whether it contains something different that distinctly does not refer to a Windows path. And to be truthful: I would expect most humans to fail on that task, too, if provided sufficiently hairy corner cases (think of a script that is called with an argument that eventually is appended to PERLLIB. In a heredoc that is passed to wsl sh. How many users do you know who can reliably reason about that?).

btw, even though this is IMO the wrong approach

Again, please appreciate that at this moment, this is just an opinion, and I cannot really tell how much experience you really have in the space. As I said, I am building MSYS2 packages on a regular basis since 2015, so I would contend that I have a little bit of experience here that should count for something.

(as is just catches a special case):

Hah! The entire path conversion is a special case. It is a gift that keeps on giving!

the code snippet from above has a bug in the expression and should maybe be: bool is_potential_path_list = *it == '/' || (*it == '.' && *(it+1) == '/'); (note the +1).

I concur. And I never claimed that it was correct. I said "something like", and was hoping that someone would be enticed to give it a whirl and open a PR so that we can do more than just talk. Obviously that plan failed rather miserably, joining so many of my past schemes 😉

But it would still be missing ../foo, or just let's assume /foo/HEAD:../baar for example.

It would be missing the former, although that is not even a path list and would therefore not run into the code introduced by the commit under discussion.

It would not be missing the latter, though, as we're literally looking at a slash at the beginning of the path list, not after any colon.

In the short run, I still think that we can probably figure out reliable patterns to catch most of the common scenarios.

We will never catch all of the scenarios, of course, which is the reason why we need "escape hatches". In scenarios where the automatic path conversion is overzealous, those escape hatches would be:

  • MSYS_NO_PATHCONV as a big hammer
  • MSYS2_ARG_CONV_EXCL and MSYS2_ENV_CONV_EXCL

In scenarios where the automatic path conversion fails to kick in, but would be desirable:

  • Use "$(cygpath -wp "$arg")" instead of "$arg" for path lists
  • Use "$(cygpath -w "$arg")" for paths.

I don't think that we will ever be able to completely prevent the need for those escape hatches.

Of course, I could imagine that eventually we'd introduce support for configuring regular expressions via environment variables to override the MSYS2 runtime's default classification. But then...:

Image

dscho avatar Apr 17 '24 07:04 dscho

So here is a PR with the quick fix; Once the PR build finishes, I hope that y'all can give it a whirl to see whether the MSYS2 runtime built that way addresses your concerns?

dscho avatar Apr 17 '24 08:04 dscho

So here is a PR with the quick fix; Once the PR build finishes, I hope that y'all can give it a whirl to see whether the MSYS2 runtime built that way addresses your concerns?

Unfortunately, the problem still existed. Simply revert 0042-fixup-Add-functionality-for-converting-UNIX-paths-in.patch fix this issue

nyfair avatar Apr 17 '24 18:04 nyfair

Simply revert 0042-fixup-Add-functionality-for-converting-UNIX-paths-in.patch fix this issue

This is not a great suggestion, given that that patch was introduced to fix problems in the first place. The suggestion is to re-introduce those problems. You're just asking your needs to be favored over others, without weighing the overall benefits, which a large project like MSYS2 cannot afford.

Also, please do not mingle unrelated tickets. In this here ticket, we are not discussing what happens to brackets in the auto-magic (or not so magic) path conversion, but what happens to path lists used in package builds.

But maybe I should have been more specific, and not said "y'all" when I really meant users who are affected by the problem described in this ticket.

dscho avatar Apr 17 '24 20:04 dscho

This is not a great suggestion, given that that patch was introduced to fix problems in the first place. The suggestion is to re-introduce those problems. You're just asking your needs to be favored over others, without weighing the overall benefits, which a large project like MSYS2 cannot afford.

Also, please do not mingle unrelated tickets. In this here ticket, we are not discussing what happens to brackets in the auto-magic (or not so magic) path conversion, but what happens to path lists used in package builds.

But maybe I should have been more specific, and not said "y'all" when I really meant users who are affected by the problem described in this ticket.

To be more specific, revert that patch not only fix issue 187, it also fix this problem.

$ A=.:../test python -c 'import os; print(os.environ["A"])'
.;..\test

$ pacman -Q | grep runtime
msys2-runtime 3.4.10-8

All problems related to pathconv are comes from msys2-runtime 3.4.10-4. We may have a more consistent approach to fix the problem of Git for Windows based on msys2-runtime 3.4.10-2

nyfair avatar Apr 18 '24 04:04 nyfair

To be more specific, revert that patch not only fix issue 187, it also fix this problem.

And cause a regression. Great suggestion!

We may have a more consistent approach to fix the problem of Git for Windows based on msys2-runtime 3.4.10-2

Git for Windows is just a placeholder for all programs that expect <something>:<something-else> inputs. I already pointed to scp, and I am sure you, even, could think of plenty others if you would just try.

This continued suggestion to regress on a fix just to fix another regression, in effect trying to get us into a vicious cycle of regression hell, is really not helpful, I find.

dscho avatar Apr 18 '24 07:04 dscho

Let's get back again to the "we have an issue, let's fix it together" mode. The damage is done, let's focus on how to fix it. But I vote against the "patch until no one screams anymore marathon".

git-for-windows uses a brilliant test suite, making sure that it's requirements are covered. That's how it should be done and I completely understand @dscho that he tried to make sure that the requirements are satisfied. Unfortunately msys2-runtime lacks such a suite and so "our" requirements are now violated.

If we want to get both together, we need a

  1. a list of path conversions so that the MSYS community is happy and
  2. a test stage for msys2-runtime (not available right now, or is there already something I'm not aware of?)

In an ideal world, I'd also love to see the git checks run automatically on every change in msys2, so we don't need to duplicate that stuff and run out-of-sync over time.

reneparis avatar Apr 18 '24 08:04 reneparis

If you have more tests in mind for your point 1, you could probably add them to the tests here: https://github.com/msys2/msys2-tests/tree/main/runtime

mmuetzel avatar Apr 18 '24 08:04 mmuetzel

Unfortunately msys2-runtime lacks such a suite and so "our" requirements are now violated.

@reneparis There is actually a test suite, and I worked a bit to get it working in https://github.com/msys2/msys2-runtime/issues/135.

And I agree with @mmuetzel that the msys2-tests is the right place to document expectations, specifically https://github.com/msys2/msys2-tests/blob/39eb3db6801e090a5f59b1b6199ddb443c06be86/runtime/path_convert/src/main.cpp#L36-L239.

Hooking up the msys2-tests so that they are run as part of the CI builds in msys2/msys2-runtime is something I've had on my wish list for a long time, and if other projects would not get into the way, I would already have done it. That would actually be a great way for contributors who are as sick of talking without acting as I am to help out.

dscho avatar Apr 18 '24 08:04 dscho

If you have more tests in mind for your point 1, you could probably add them to the tests here: https://github.com/msys2/msys2-tests/tree/main/runtime

That was the piece of the puzzle I was missing.

reneparis avatar Apr 18 '24 08:04 reneparis

Seems I missed my opportunity to reply to @dscho's thorough answer from above. I wanted to point out various reasons why I disagree, but as things seems to go into a positive direction again, I don't want to add further noise 😄

Looking forward finding a good solution together 👍

ReinhardBiegelIntech avatar Apr 18 '24 13:04 ReinhardBiegelIntech

Seems I missed my opportunity to reply to @dscho's thorough answer from above. I wanted to point out various reasons why I disagree, but as things seems to go into a positive direction again, I don't want to add further noise 😄

@ReinhardBiegelIntech I fear that you just did add further noise and did contribute in the direction you claimed not to want, though, merely by giving in to the urge of add this comment without any other contribution in furtherance of this issue.

Having said that, I'd love to see some constructive contribution from your side. For example, you could contribute a PR to run the path_conv part of msys2-tests as part of CI/PR builds in msys2/msys2-runtime. I would consider that an actually helpful contribution to this thread.

dscho avatar Apr 18 '24 15:04 dscho

Having said that, I'd love to see some constructive contribution from your side. For example, you could contribute a PR to run the path_conv part of msys2-tests as part of CI/PR builds in msys2/msys2-runtime. I would consider that an actually helpful contribution to this thread.

Hey @dscho

@ReinhardBiegelIntech and I "gave it a whirl" three weeks ago and I'd really like get some progress here (or actually our customer). I don't know how the MSYS team is organized internally and who actually feels responsible for this (@lazka, @mmuetzel, you?). But without anyone giving us a hint if the things we're whirling up is right, participating feels pretty frustrating.

Is it possible that you or anyone from the team can take a look at it in the near future or at least give us a time horizon?

Thanks @reneparis

  • https://github.com/msys2/msys2-tests/pull/67
  • https://github.com/msys2/msys2-runtime/pull/213

reneparis avatar May 23 '24 09:05 reneparis

I'm sorry to be so scarce on time, the security work leading up to Git v2.45.1 and follow-up work to address fall-out is demanding most of my attention.

dscho avatar May 23 '24 09:05 dscho