geom wkt_decode: readWhitespace does not skip all white space.

readWhitespace assumes all text is ASCII. The function should decode runes correctly and then check to see if they are spaces.

Can not test runes this way:

https://github.com/go-spatial/geom/blob/master/encoding/wkt/wkt_decode.go#L55-L57

You need to read all the bytes in the run before testing it.

Here is a test case with the failures:


func TestReadWhitespace(t *testing.T) {
	type tcase struct {
		in    string
		match bool
	}
	fn := func(tc tcase) func(*testing.T) {
		decoder := NewDecoder(strings.NewReader(tc.in + "."))

		return func(t *testing.T) {
			match, err := decoder.readWhitespace()
			if err != nil {
				t.Errorf("error, expected nil got: %v", err)
				return
			}
			if match != tc.match {
				t.Errorf("match, expected %t, got %t", tc.match, match)
			}
		}
	}

	tests := map[string]tcase{
		"regular0":     tcase{in: " ", match: true},
		"tab":          tcase{in: "\t", match: true},
		"newline":      tcase{in: "\n", match: true},
		"return":       tcase{in: "\r", match: true},
		"regular":      tcase{in: "\u0020", match: true},
		"nobreak":      tcase{in: "\u00A0", match: true},
		"ogham mark":   tcase{in: "\u1680", match: true},
		"en quad":      tcase{in: "\u2000", match: true},
		"em quad":      tcase{in: "\u2001", match: true},
		"en":           tcase{in: "\u2002", match: true},
		"em":           tcase{in: "\u2003", match: true},
		"three-per-em": tcase{in: "\u2004", match: true},
		"four-per-em":  tcase{in: "\u2005", match: true},
		"six-per-em":   tcase{in: "\u2006", match: true},
		"figure":       tcase{in: "\u2007", match: true},
		"punctuation":  tcase{in: "\u2007", match: true},
		"thin":         tcase{in: "\u2009", match: true},
		"hair":         tcase{in: "\u200A", match: true},
		"nnbsp":        tcase{in: "\u202F", match: true},
		"mmsp":         tcase{in: "\u205F", match: true},
		"ideographic":  tcase{in: "\u3000", match: true},
	}
	for key, test := range tests {
		t.Run(key, fn(test))
	}

}

Nov 01 '19 23:11 gdey

Do you have a WKT string test case for this?

As far as I've read, the spec does not specify any unicode characters, which is why the Decoder can/does read bytes and not runes.

I could add a check to ensure the characters are within a valid ascii range in readByte

Nov 02 '19 21:11 ear7h

We are reading in UTF8 text, the spec does not say that it must be ASCII; so we should not assume ASCII only spaces. The other characters are unlikely to have an issue with UTF8 as they are equivalent, so I did not call it out -- though the errors can be funky.

Nov 04 '19 17:11 gdey

geom geom copied to clipboard

wkt_decode: readWhitespace does not skip all white space.

geom
geom copied to clipboard