AHK-v2-script-converter icon indicating copy to clipboard operation
AHK-v2-script-converter copied to clipboard

RegExMatch output variable causes incorrect transformations

Open Lexikos opened this issue 1 year ago • 8 comments

Input:

        if RegExMatch(uri, "^\[url=")
            RegExMatch(uri, "\G[^\]]*", uri, 6)
        else
        {
            MsgBox 1,, URI appears invalid:`n%uri%
            IfMsgBox Cancel
                return
        }

Incorrect output:

        if RegExMatch(uri, "^\[url=")
            RegExMatch(uri[0], "\G[^\]]*", &uri, 6)
        else
        {
            msgResult := MsgBox("uri[0] appears invalid:`n" uri[0], "", 1)
            if (msgResult = "Cancel")
                return
        }

There are multiple issues:

  • The call which replaces the string in uri with a RegExMatchInfo incorrectly has its input changed from uri to uri[0]. On input, the variable still contains a string.
  • uri still contains a string in the else branch, so should not have been transformed.
  • The text "URI" inside the quoted string should not have been transformed even if it was in the if branch.

I think a simpler, less error-prone approach would be for uri := uri && uri[0] to be inserted after the RegExMatch call. Appending && uri := uri[0] to the call would also work if the return value is not being stored.

Lexikos avatar Jul 22 '23 04:07 Lexikos

Properties with the same name are also erroneously transformed.

Input:

RegExMatch(haystack, regex, output)
x := task.output

Incorrect output:

RegExMatch(haystack, regex, &output)
x := task.output[0]

Lexikos avatar Jul 22 '23 04:07 Lexikos

Unrelated variables and parameter names elsewhere in the script (perhaps hundreds of lines away?) are also transformed.

Input:

a(s, r) {
    RegExMatch(s, r, m)
    return m
}
b(m) {
    return m
}

Incorrect output:

a(s, r) {
    RegExMatch(s, r, &m)
    return m[0]
}
b(m[0]) {
    return m[0]
}

Lexikos avatar Jul 22 '23 04:07 Lexikos

For the maintainers, I will attempt to cleanup and clarify parts about _RegExMatch and ConvertPseudoArray. I'll write back about things that might require more changes.

safetycar avatar Jul 22 '23 09:07 safetycar

For the maintainers, I will attempt to cleanup and clarify parts about _RegExMatch and ConvertPseudoArray. I'll write back about things that might require more changes.

Thanks for your note, its helpful so people don't do duplicate work. If you work in a branch on your own repo, you can just tag this issue and then we should see a notification here.

Remember to add the associated test cases. You could just use the examples provided by Lexikos

mmikeww avatar Jul 22 '23 15:07 mmikeww

The current way of updating the names is a simple replacement on a full line. I see it difficult to improve much without some bigger changes.

The changes made are:

  • Putting the reason for name replacements in a less ambiguous way.
  • A small change to not replace properties.
  • Restrict recognition of named captures to mitigate intrusive replacements. The replacement will only be done with names that can be gathered in the function conversion (this leaves out contents that could be inside variables).

But sadly:

  • There are still replacements inside quoted text.
  • There is still no limit on the lifetime of the name conversion.
  • Attempting a simpler replacement for uri && uri[0] I noticed strange repetitions in other cases, things similar to uri && uri[0] && uri[0]. To add it as a line definitions it needs to be done more carefully. Attempting to add the definition inside the same line I needed to add wrapping parenthesis but this was causing subLoopFunctions code to get stuck.

To fix those it seems better to try to find a better approach than the current full line replacement, instead of creating more workarounds.

safetycar avatar Jul 23 '23 09:07 safetycar

partially completed in 69269b3edfdd482e1fe96547f694c25893d9344f

mmikeww avatar Jul 24 '23 16:07 mmikeww

Unrelated variables and parameter names elsewhere in the script (perhaps hundreds of lines away?) are also transformed.

Input:

a(s, r) {
    RegExMatch(s, r, m)
    return m
}
b(m) {
    return m
}

Incorrect output:

a(s, r) {
    RegExMatch(s, r, &m)
    return m[0]
}
b(m[0]) {
    return m[0]
}

This will be corrected within Masking pull request coming soon (already tested and corrected).

andymbody avatar Jun 22 '24 02:06 andymbody

I think PR #208 will have some affect on this issue. It is still not the full fix, but progress. More work will be needed to address the global scope of the issue outlined in this thread. Possibly a rewrite of how RegexMatch (and other structures are) is handled in general, rather than multiple band-aids (as @safetycar mentioned). The upcoming PR that will introduce class/function/string masking (and separate handling) may come in handy for this as well. It can be extended to include other structures such as IF, Regex, etc, which may provide alternate fronts of attack.

See commit #213 for partial solution to this thread

andymbody avatar Jun 25 '24 02:06 andymbody