lci icon indicating copy to clipboard operation
lci copied to clipboard

String and colon

Open Lucki opened this issue 10 years ago • 17 comments

I have a string, which sometimes comes with a leading :. So I want to check for this, but right now it's not working. Here's my testing code. It doesn't matter if I get the String with GIMMEH or a file.

HAI 1.3
    CAN HAS STRING?

    I HAS A string ITZ "::hello"

    I HAS A char ITZ I IZ STRING'Z AT YR string AN YR 0 MKAY
    VISIBLE char AN " - " AN "::"

    BOTH SAEM char AN "::", O RLY?
    YA RLY
        VISIBLE "success"
    NO WAI
        VISIBLE "fail"
    OIC

    VISIBLE char AN " - " AN ":(3A)"
    BOTH SAEM char AN ":(3A)", O RLY?
    YA RLY
        VISIBLE "success"
    NO WAI
        VISIBLE "fail"
    OIC

    BTW gives error: VISIBLE ":[char]" AN " - " AN "COLON"
    BOTH SAEM ":[char]" AN "COLON", O RLY?
    YA RLY
        VISIBLE "success"
    NO WAI
        VISIBLE "fail"
    OIC

    I HAS A char2 ITZ I IZ STRING'Z AT YR string AN YR 1 MKAY
    VISIBLE char2 AN " - " AN "h"
    BOTH SAEM char2 AN "h", O RLY?
    YA RLY
        VISIBLE "success"
    NO WAI
        VISIBLE "fail"
    OIC

    I HAS A char3 ITZ "::"
    VISIBLE char3 AN " - " AN "::"
    BOTH SAEM char3 AN "::", O RLY?
    YA RLY
        VISIBLE "success"
    NO WAI
        VISIBLE "fail"
    OIC
KTHXBYE

Any idea?

Lucki avatar Feb 17 '15 22:02 Lucki

Thanks for mentioning this. I think the problem for the first and second comparison is that the second string is not getting interpolated before being compared. I added some debug output to the string comparison function to look at exactly what was getting compared, and this is what I found:

: - :
comparing ':' and '::'
fail
: - :
comparing ':' and ':(3A)'
fail
comparing ':[char]' and 'COLON'
fail
h - h
comparing 'h' and 'h'
success
: - :
comparing '::' and '::'
success

(For the third case, :[char] is attempting to use "char" as a unicode normative name and that form is used to go from normative name --> character but there is not a way right now to go the other way -- from character --> normative name.)

The specification does not state whether string values being compared should be interpolated or not. Right now, strcmp is used to compare two uninterpolated strings.

This actually poses an interesting issue because if interpolation is never performed on string comparison (as is currently the case), then it's possible to compare an interpolated string ':' to an uninterpolated string '::' and get incorrect results. However, if interpolation is always performed on string comparison, then an interpolated string (':') would cause an error because a single colon is invalid syntax.

To solve this, the first thing that comes to my mind is a bit hacky. It would be to modify the ValueObject structure to include another field that denotes whether a string contained in the ValueObject has been interpolated or not. Then, make the string interpolation code ignore strings that have already been interpolated and always interpolate strings when performing a comparison.

I'll need to think this through before its implemented, but thanks for raising the issue.

justinmeza avatar Feb 18 '15 15:02 justinmeza

Take your time. I don't work on something important, just noticed this strange behavior.

Lucki avatar Feb 18 '15 22:02 Lucki

I just noticed that this happens with every escape-char. Smileys like :( or :[ let the program crash because the closing bracket isn't available. If it is available, the program trys to get a hex out of the string.

How about converting the : to a :: in every string we get from a FILO? With this we can be sure, the string is as intended and we can compare it inside the code.

Lucki avatar Feb 19 '15 13:02 Lucki

That could be one potential solution. Basically what you're suggesting is internally representing escape characters as how the user would type them instead of their ASCII code.

One issue could be that a single character (like :) can expand to multiple representations under the current spec, as you've pointed out in your code (::, :(3A), :[COLON]) so we would need to specify this behavior. It seems reasonable to expand such characters to their simple form ::, :), and so on.

Assuming there aren't any odd corner cases, this approach seems relatively clean and will probaly be the first thing I try. Thanks for the suggestion!

justinmeza avatar Feb 19 '15 16:02 justinmeza

How's it going? Is there some progress?

Lucki avatar Apr 04 '15 12:04 Lucki

Thanks for reminding me of this, I have some more time now so I will see if I can whip up a diff for this.

justinmeza avatar Apr 12 '15 23:04 justinmeza

Anything new on this?

Lucki avatar Jul 17 '15 14:07 Lucki

So! After looking at the code, the solution I ended up going with is a bit simpler than what we discussed. I think at some point, strings were interpolated whenever they were evaluated but I think I did some performance testing and found that a lot of time was spent interpolating strings that were just getting passed around (and not being compared). I put a note about this in the code a long time ago: https://github.com/justinmeza/lci/blob/master/interpreter.c#L1871-L1875 .

Anyway, the solution I went for is to check in interpretEqualityOpExprNode() to see if the values being compared are strings, and if so, to interpret them using castStringExplicit() before comparing them.

Note that BOTH SAEM ":[char]" AN "COLON", O RLY? will cause an error because the :[NAME] expansion only goes one way (according to the LOLCODE spec, this could always be changed): from a Unicode normative name to a character, but not from a character to a Unicode normative name.

Anyway, the output after the change (and commenting out the :[char] test) is:

: - :
success
: - :
success
h - h
success
: - :
success

I'm going to make a unit test for this and then commit the diff.

justinmeza avatar Jul 19 '15 17:07 justinmeza

I'm really sorry to get back on this, but I want to use this, where I'll get a string, which is :[ ] x [ ] y - this will still crash my program. Even after the latest commit.

Lucki avatar Jul 20 '15 21:07 Lucki

No worries! I completely forgot you mentioned that as well. I will check it out.

justinmeza avatar Jul 20 '15 21:07 justinmeza

Hey @Lucki, is there an example where using the colon escape :: does not work for what you want to do?

For example, to output the string you mention, I would do VISIBLE "::[ ] x [ ] y".

justinmeza avatar Jul 26 '15 19:07 justinmeza

I'll get it from a string outside the program. I'll get : in a string, which isn't represented as a :: inside my program.

Here a quick example:

HAI 1.4

    CAN HAS STRING?

    I HAS A string
    GIMMEH string
    VISIBLE string

KTHXBYE

Giving the string :[ ] x [ ] y will crash the program.

Lucki avatar Jul 26 '15 20:07 Lucki

Gotcha. Check out the latest commit on the future branch and let me know if there are any corner cases that it does not handle.

justinmeza avatar Jul 27 '15 04:07 justinmeza

Is there a difference if I get this string over a socket? It works now with GIMMEH, but GET still crashes with this string.

Lucki avatar Jul 27 '15 13:07 Lucki

OK, I added some code that sanitizes the inputs from outside sources (currently STDIO'Z LUK and SOCKS'Z GET). This should properly escape the colon character on input!

justinmeza avatar Jul 30 '15 04:07 justinmeza

Thank you, seems to work so far :+1:

Lucki avatar Aug 03 '15 14:08 Lucki

I'm really sad because I've found another use case with this problem:

HAI 1.4
    CAN HAS STRING?

    I HAS A string ITZ SMOOSH "::" AN "[" MKAY
    VISIBLE string
KTHXBYE

Expected closing square bracket after :[

Lucki avatar Sep 26 '15 12:09 Lucki