Unexpected change in path conversion behaviour after #143
After #143, my xmllint command has broken:
(msys2-runtime 3.4.6-2)
$ xmllint --xpath 'string(settings/servers/server[id = "gitlab"]/configuration/httpHeaders/property/name)' ~/.m2/settings.xml
XPath error : Invalid expression
string(settings/servers/server[id = "gitlab"]C:/msys64_b/configuration/httpHeaders/property/name)
^
XPath evaluation failure
because the space in the argument somehow means that the path conversion behaviour is triggered for the next part of the argument, despite the whole thing being quoted.
I've managed to reduce it to a minimal example below:

It's not quite clear from #143 whether this was intentional or not, certainly the behaviour is a little strange. If so, what's the correct work around? I suppose I could set MSYS2_ARG_CONV_EXCL=* for just that command, but that feels hacky at best. Is there something more permanent I can set in my environment that will work for "all" commands?
xmllint --xpath 'string(settings/servers/server[id = "gitlab"]/configuration/httpHeaders/property/name)' ~/.m2/settings.xml
Hmm. That's an interesting case.
While I still think that the space should not exclude command-line parameters from the automatic path conversion, I see a lot of tell-tales in that --xpath parameter that should tell MSYS2 that this is not a path:
- double quote characters are not allowed in Win32 pathnames
- the equal sign followed by a white-space clearly indicates that this is not a path
- the equal sign inside parentheses should indicate that this is not a path
- a trailing parenthesis strongly suggests that this is not a path
Having said that, I notice that the path conversion kicks in after the closing bracket, not for the entire path... There must be something in MSYS2's path conversion logic that considers a slash after a closing bracket as starting a "POSIX" path. I guess the culprit to be located here.
I guess the culprit to be located here.
That turns out to be the case:
- First, the function is entered with the path
string(settings/servers/server[id = "gitlab"]/configuration/httpHeaders/property/name) - The code flow procedes until the
=character is hit and then calls the function recursively, this time with the string"gitlab"]/configuration/httpHeaders/property/name). - The non-alphanumeric charaters are then swallowed, including the first double-quote character.
- The code flow procedes until the second
"character is hit and calls the function again, this time with]/configuration/httpHeaders/property/name). - Again, non-alphanumerical characters are swallowed, this time only the closing bracket.
- Now, the remainder looks totally like an absolute Unix path, starting with a forward slash (even if the unmatched closing parenthesis at the end is "funny").
So where does that leave us? I think what we want is to detect a = character followed by white-space in that non-alphanumeric swallowing loop, i.e. something like this (the diff was made in msys2-tests, not in msys2-runtime):
diff --git a/runtime/path_convert/src/main.cpp b/runtime/path_convert/src/main.cpp
index 87900b2..6b2ceb1 100644
--- a/runtime/path_convert/src/main.cpp
+++ b/runtime/path_convert/src/main.cpp
@@ -34,7 +34,8 @@ typedef struct test_data_t {
#endif
static const test_data datas[] = {
- {"/usr/lib:/var:", MSYSROOT "\\usr\\lib;" MSYSROOT "\\var;", false}
+ {"string(settings/servers/server[id = \"gitlab\"]/configuration/httpHeaders/property/name)", "string(settings/servers/server[id = \"gitlab\"]/configuration/httpHeaders/property/name)", false}
+ ,{"/usr/lib:/var:", MSYSROOT "\\usr\\lib;" MSYSROOT "\\var;", false}
,{"-LIBPATH:../lib", "-LIBPATH:../lib", false}
,{"-LIBPATH:../lib:/var", "-LIBPATH:..\\lib;" MSYSROOT "\\var", false}
,{"//Collection:http://tfsserver", "//Collection:http://tfsserver", false}
diff --git a/runtime/path_convert/src/path_conv.cpp b/runtime/path_convert/src/path_conv.cpp
index f072533..0aa151b 100644
--- a/runtime/path_convert/src/path_conv.cpp
+++ b/runtime/path_convert/src/path_conv.cpp
@@ -195,6 +195,10 @@ path_type find_path_start_and_type(const char** src, int recurse, const char* en
if (*it == '\0' || it == end) return NONE;
while (!isalnum(*it) && *it != '/' && *it != '\\' && *it != ':' && *it != '-' && *it != '.') {
+ if (*it == '=' && (it + 1 == end || isspace(it[1]))) {
+ *src = end;
+ return NONE;
+ }
recurse = true;
it = ++*src;
if (it == end || *it == '\0') return NONE;
@@ -287,6 +291,10 @@ path_type find_path_start_and_type(const char** src, int recurse, const char* en
starts_with_minus = false;
if ((ch == '=') || (ch == ':' && starts_with_minus) || ((ch == '\'' || ch == '"') && result == NONE)) {
*src = it2 + 1;
+ if (*src == end || isspace(it2[1])) {
+ *src = end;
+ return NONE;
+ }
return find_path_start_and_type(src, true, end);
}
What do you think, @lazka?
Cygwin is less intrusive in the path translation:
$ xmllint --xpath 'string(settings/servers/server[id = "gitlab"]/configuration/httpHeaders/property/name)' src/wine/dlls/msxml3/tests/xmlview.xml
$ uname -a
CYGWIN_NT-10.0-19044 DESKTOP-O7JE7JE 3.4.6-1.x86_64 2023-02-14 13:23 UTC x86_64 Cygwin
Of course I tried to reproduce the error (by inserting the C:) to be sure xmllint works at all:
$ xmllint --xpath 'string(settings/servers/server[id = "gitlab"]C:/configuration/httpHeaders/property/name)' src/wine/dlls/msxml3/tests/xmlview.xml
XPath error : Invalid expression
string(settings/servers/server[id = "gitlab"]C:/configuration/httpHeaders/property/name)
^
XPath evaluation failure
(I'm a bit busy right now, I'll have a look next week. Thanks for the detailed investigation)