WhirlyGlobe
Invalid UTF-8 in tile data causes crash
Upon rendering "Mount Nebo" in the neighborhood of St. Louis on the MapTiler topo map:
JNI DETECTED ERROR IN APPLICATION: input is not valid Modified UTF-8: illegal start byte 0xb2
java_vm_ext.cc:578] string: 'Mount Nebo
java_vm_ext.cc:578] 250 m
java_vm_ext.cc:578] �'
java_vm_ext.cc:578] input: '0x4d 0x6f 0x75 0x6e 0x74 0x20 0x4e 0x65 0x62 0x6f 0x20 0x0a 0x32 0x35 0x30 0x20 0x6d 0x0a <0xb2>'
java_vm_ext.cc:578] in call to NewStringUTF
java_vm_ext.cc:578] from boolean com.mousebird.maply.MapboxVectorTileParser.parseData(byte[], com.mousebird.maply.VectorTileData, com.mousebird.maply.LoaderReturn)
In UTF-8, a byte in the range 0x80 to 0xbf is a continuation byte and is only valid after a lead byte (0xc2 to 0xf4). Here 0xb2 directly follows 0x0a, an ASCII byte, so it cannot start a character and the string is invalid.
I found a reference to a known issue with 4-byte codepoints, but that was fixed in API 23. This crash was on API 30, targeting 28.
We should be validating UTF-8 on input. Until then, it would probably be enough to pad the memory allocations for strings with a few extra zero bytes, so that if the JVM's UTF-8 processor walks off the end of a string, it is guaranteed to find a terminating zero before reaching invalid memory.
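For reference, a minimal sketch of what validating on input could look like on the native side, before a string is handed to NewStringUTF. The function name `isValidUtf8` is hypothetical and not part of WhirlyGlobe. It checks standard UTF-8, which is not quite the same as the Modified UTF-8 that JNI documents (that encodes U+0000 as 0xc0 0x80 and supplementary characters as surrogate pairs), so treat it as a starting point rather than a drop-in fix.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: returns true if `s` is well-formed standard UTF-8.
// Rejects stray continuation bytes, truncated sequences, overlong encodings,
// encoded surrogates, and code points above U+10FFFF.
bool isValidUtf8(const std::string &s)
{
    const auto *p = reinterpret_cast<const unsigned char *>(s.data());
    const std::size_t n = s.size();
    std::size_t i = 0;

    while (i < n) {
        const unsigned char b = p[i];
        if (b < 0x80) {          // ASCII, single byte
            ++i;
            continue;
        }

        std::size_t extra = 0;   // number of continuation bytes expected
        std::uint32_t cp = 0;    // decoded code point
        std::uint32_t min = 0;   // smallest code point allowed for this length
        if ((b & 0xE0) == 0xC0)      { extra = 1; cp = b & 0x1F; min = 0x80; }
        else if ((b & 0xF0) == 0xE0) { extra = 2; cp = b & 0x0F; min = 0x800; }
        else if ((b & 0xF8) == 0xF0) { extra = 3; cp = b & 0x07; min = 0x10000; }
        else return false;       // stray continuation byte (e.g. 0xb2) or 0xf8..0xff

        if (n - i <= extra) {
            return false;        // sequence runs past the end of the string
        }
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = p[i + k];
            if ((c & 0xC0) != 0x80) {
                return false;    // expected a continuation byte
            }
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            return false;        // overlong, out of range, or surrogate
        }
        i += extra + 1;
    }
    return true;
}
```

With the bytes from the log above, the check fails on the trailing 0xb2, so a caller could drop or replace the value instead of passing it to NewStringUTF. What to do with a rejected string (skip the attribute, substitute U+FFFD, log and continue) is a separate design decision.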
Also observed near Kansas City:
java_vm_ext.cc:578] string: 'Skunk Hill
java_vm_ext.cc:578] 412 m
java_vm_ext.cc:578] �'
java_vm_ext.cc:578] input: '0x53 0x6b 0x75 0x6e 0x6b 0x20 0x48 0x69 0x6c 0x6c 0x20 0x0a 0x34 0x31 0x32 0x20 0x6d 0x0a <0xb2>'
That's a wild one. Good catch.
See also #1262