YAML parsing fails when certain Unicode characters exist in the source file.
Example:
root:
  - value1: 🛠
    value2: check
Exception:
com.fasterxml.jackson.databind.JsonMappingException: Invalid surrogate pair, starts with invalid high surrogate (0xDEE0), not in valid range [0xD800, 0xDBFF] (through reference chain: java.util.ArrayList[6]->org.openrewrite.yaml.tree.Yaml$Documents["documents"]->java.util.ArrayList[0]->org.openrewrite.yaml.tree.Yaml$Document["block"]->org.openrewrite.yaml.tree.Yaml$Mapping["entries"]->java.util.ArrayList[0]->org.openrewrite.yaml.tree.Yaml$Mapping$Entry["value"]->org.openrewrite.yaml.tree.Yaml$Mapping["entries"]->java.util.ArrayList[1]->org.openrewrite.yaml.tree.Yaml$Mapping$Entry["value"]->org.openrewrite.yaml.tree.Yaml$Sequence["entries"]->java.util.ArrayList[0]->org.openrewrite.yaml.tree.Yaml$Sequence$Entry["block"]->org.openrewrite.yaml.tree.Yaml$Mapping["entries"]->java.util.ArrayList[1]->org.openrewrite.yaml.tree.Yaml$Mapping$Entry["prefix"])
> Invalid surrogate pair, starts with invalid high surrogate (0xDEE0), not in valid range [0xD800, 0xDBFF] (through reference chain: java.util.ArrayList[6]->org.openrewrite.yaml.tree.Yaml$Documents["documents"]->java.util.ArrayList[0]->org.openrewrite.yaml.tree.Yaml$Document["block"]->org.openrewrite.yaml.tree.Yaml$Mapping["entries"]->java.util.ArrayList[0]->org.openrewrite.yaml.tree.Yaml$Mapping$Entry["value"]->org.openrewrite.yaml.tree.Yaml$Mapping["entries"]->java.util.ArrayList[1]->org.openrewrite.yaml.tree.Yaml$Mapping$Entry["value"]->org.openrewrite.yaml.tree.Yaml$Sequence["entries"]->java.util.ArrayList[0]->org.openrewrite.yaml.tree.Yaml$Sequence$Entry["block"]->org.openrewrite.yaml.tree.Yaml$Mapping["entries"]->java.util.ArrayList[1]->org.openrewrite.yaml.tree.Yaml$Mapping$Entry["prefix"])
> Invalid surrogate pair, starts with invalid high surrogate (0xDEE0), not in valid range [0xD800, 0xDBFF]
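The 0xDEE0 named in the message lies in the low-surrogate range [0xDC00, 0xDFFF], and it is the second UTF-16 code unit of 🛠 (U+1F6E0, encoded as 0xD83D 0xDEE0). One plausible reading is that the pair was split somewhere before serialization, so the low half arrived on its own. The sketch below only illustrates that condition; `SurrogateCheck` and `firstUnpairedSurrogate` are hypothetical names, not OpenRewrite or Jackson code.

```java
public class SurrogateCheck {
    // U+1F6E0 (the character from the example), written as its UTF-16 surrogate pair.
    static final String WRENCH = "\uD83D\uDEE0";

    // Hypothetical helper: index of the first unpaired surrogate, or -1 if every
    // surrogate in the text is part of a valid high/low pair.
    static int firstUnpairedSurrogate(CharSequence text) {
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isHighSurrogate(c)) {
                if (i + 1 < text.length() && Character.isLowSurrogate(text.charAt(i + 1))) {
                    i++; // skip the low half of a valid pair
                } else {
                    return i; // high surrogate with no low surrogate after it
                }
            } else if (Character.isLowSurrogate(c)) {
                return i; // low surrogate that did not follow a high surrogate
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String intact = "value1: " + WRENCH;
        // Cutting the string between the two code units leaves only 0xDEE0.
        String split = intact.substring(intact.length() - 1);

        System.out.println("intact: " + firstUnpairedSurrogate(intact));
        System.out.println("split:  " + firstUnpairedSurrogate(split));
        System.out.println("lone code unit: 0x" + Integer.toHexString(split.charAt(0)));
    }
}
```

A check like this could be run on whitespace or prefix strings before they are serialized, to locate where a pair gets separated.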
Based on the stack trace, the encoding may be unsupported. Character encodings for 🛠:
- UTF-8 Encoding: 0xF0 0x9F 0x9B 0xA0
- UTF-16 Encoding: 0xD83D 0xDEE0
- UTF-32 Encoding: 0x0001F6E0
- The parsing issue prevents ingesting micronaut projects.
- WINDOWS-1252 and ISO-8859-1 are not supported in YAML:
  - The InputStreamReader does not pass in a StandardCharsets value and always defaults to UTF-8.
  - The String returned by the ByteArrayInputStream is always set to UTF-8.
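To illustrate why a fixed UTF-8 decode matters for ISO-8859-1 or WINDOWS-1252 sources, here is a minimal sketch that decodes the same bytes with an explicit charset and with UTF-8. `CharsetDecodeDemo` and `decode` are hypothetical names for illustration; this is not the parser's actual code path.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetDecodeDemo {
    // Hypothetical helper: decodes raw bytes through an InputStreamReader
    // that is given the charset explicitly.
    static String decode(byte[] bytes, Charset charset) {
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "value1: é" stored as ISO-8859-1, where é is the single byte 0xE9.
        String original = "value1: \u00E9";
        byte[] latin1 = original.getBytes(StandardCharsets.ISO_8859_1);

        // Decoding with the charset the file was written in round-trips.
        System.out.println(original.equals(decode(latin1, StandardCharsets.ISO_8859_1)));
        // Decoding the same bytes as UTF-8 does not: 0xE9 alone is not valid UTF-8.
        System.out.println(original.equals(decode(latin1, StandardCharsets.UTF_8)));
    }
}
```

Under this assumption, supporting other encodings would mean threading the source file's charset through to wherever the reader is constructed, rather than fixing it at UTF-8.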
YAML files with Unicode characters are now skipped as of https://github.com/openrewrite/rewrite/pull/3427
Exploring options for a fix in: https://github.com/openrewrite/rewrite/pull/3421