Molten icon indicating copy to clipboard operation
Molten copied to clipboard

Better testing of strings

Open LeopoldArkham opened this issue 6 years ago • 1 comments

tests/strings.toml should have more cases added:

  • Non ASCII characters
  • Trying harder to break multiline basic strings
  • Making sure escaped characters work correctly

Basically it should test everything the TOML standard says they're for

LeopoldArkham avatar Oct 19 '17 14:10 LeopoldArkham

Here are some specific issues I've noticed by working with the code and reading the spec. Even if people are aware of them, I think it's worth listing them here. (It might be worth making some new issue posts about them, but I decided to put them here because they have important information for making the new tests.)

  1. Escaped characters are currently unimplemented. There's a TODO comment in Parser::parse_string about that.

  2. Literal strings currently accept any and all UTF-8 content. That's almost correct, but I just noticed this sentence in the spec:

    Control characters other than tab are not permitted in a literal string.

    With that in mind, we should be checking for control characters and returning an error if one is encountered--as long as it's not '\t' or, for a multiline literal string, '\r' or '\n'. The extra test cases should, of course, cover those possibilities. Elsewhere, the spec lists the control characters as U+0000 to U+001F and U+007F (i.e. only the ones defined by ASCII).

    On a related note, invalid Unicode code points shouldn't be accepted either. I'm pretty sure Rust handles this for us, but we should try an undefined code point (albeit with what the correct UTF-8 encoding for it would be if it existed), just to make sure. (A fuzzer would probably be really good for finding bugs like this.) An invalid code point is anything from U+D800 to U+DFFF or from U+110000 up.

  3. I'm pretty sure there's no handling yet for ending lines in basic strings with backslashes. As noted in the spec, if the last thing before a line's trailing whitespace is a backslash, the whole segment from the backslash to the newline is trimmed and the string continues. The spec doesn't say this implicitly, but I think it's only supposed to apply to a lone backslash--so the escaped form, \\, shouldn't cause that behavior. Otherwise, we might see this:

    string = "foo\\
    tbar" # Evaluates as "foo\tbar"
    

    On the other hand, Github's markdown seems okay with it. Also, I don't think it applies to literal strings, since it's a form of escaping, but that might be wrong too. Maybe some of the pre-made test suites referenced in #19 can help us decide these two points.

jeremydavis519 avatar Nov 25 '17 08:11 jeremydavis519