fs2
decodeCWithCharset error handling inconsistency
utf8.decodeC is a specialized path alongside the private generic decoder used for all other charsets. decodeCWithCharset dispatches to it only when the charset is UTF-8:
def decodeCWithCharset[F[_]: RaiseThrowable](charset: Charset): Pipe[F, Chunk[Byte], String] =
  if (charset.name() == StandardCharsets.UTF_8.name())
    utf8.decodeC // UTF-8 takes the specialized decoder
  else
    decodeCWithGenericCharset(charset) // every other charset goes through the generic decoder
On malformed input, the UTF-8 decoder succeeds by substituting the Unicode replacement character (U+FFFD), while the generic decoder raises an error, even when it is given UTF-8 itself:
scala> Stream(0xc0, 0xaf).map(_.toByte).covary[Fallible].chunks.through(text.utf8.decodeC).compile.string
val res14: scala.util.Either[Throwable,String] = Right(��)
scala> Stream(0xc0, 0xaf).map(_.toByte).covary[Fallible].chunks.through(text.decodeCWithGenericCharset(UTF_8)).compile.string
val res15: scala.util.Either[Throwable,String] = Left(java.nio.charset.MalformedInputException: Input length = 1)
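The same split can be reproduced with plain JDK APIs. This is a minimal sketch under an assumption we have not verified against the fs2 source: that the UTF-8 path ends up in String's constructor, which always replaces malformed input, while the generic path uses a decoder from Charset.newDecoder(), whose default action is CodingErrorAction.REPORT. DecodeModes, replacing and reporting are hypothetical names chosen for illustration.

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets
import scala.util.Try

// Hypothetical illustration, not fs2 code: the two JDK decoding behaviors side by side.
object DecodeModes {
  // The bytes from the REPL session; 0xC0 is never a valid UTF-8 lead byte.
  val bad: Array[Byte] = Array(0xc0, 0xaf).map(_.toByte)

  // Replacing behavior: String's constructor substitutes U+FFFD for each malformed sequence.
  def replacing(bytes: Array[Byte]): String =
    new String(bytes, StandardCharsets.UTF_8)

  // Reporting behavior: a fresh CharsetDecoder defaults to CodingErrorAction.REPORT,
  // so decode throws a MalformedInputException, captured here as a Left.
  def reporting(bytes: Array[Byte]): Either[Throwable, String] =
    Try(StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(bytes)).toString).toEither
}
```

For the bytes above, replacing yields two U+FFFD characters and reporting yields a Left holding a MalformedInputException, matching the two REPL results.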
There's no public UTF-8 decoder that raises errors, nor is there a generic decoder that uses replacement characters. We might want to make the behavior configurable, but it should at least be consistent.
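One way to make the behavior configurable would be to pass a single CodingErrorAction to the CharsetDecoder for every charset. This is a minimal sketch, not an fs2 API: ConfigurableDecode.decodeAll is a hypothetical helper, and it decodes one complete byte array, so it ignores the multi-byte sequences split across chunk boundaries that a real Pipe[F, Chunk[Byte], String] would have to carry over between chunks.

```scala
import java.nio.ByteBuffer
import java.nio.charset.{Charset, CodingErrorAction}
import scala.util.Try

// Hypothetical helper, not part of fs2: one error policy applied uniformly to any charset.
object ConfigurableDecode {
  def decodeAll(bytes: Array[Byte], charset: Charset, onError: CodingErrorAction): Either[Throwable, String] =
    Try {
      val decoder = charset
        .newDecoder()
        .onMalformedInput(onError)      // invalid byte sequences
        .onUnmappableCharacter(onError) // valid input with no Unicode mapping
      decoder.decode(ByteBuffer.wrap(bytes)).toString
    }.toEither
}
```

With CodingErrorAction.REPLACE, UTF-8 decoding of 0xC0 0xAF gives two U+FFFD characters, the same as utf8.decodeC today; with REPORT it fails like the generic decoder. Either default could then be applied consistently across charsets.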
Should we release a 3.1.5 with #2671 and then address this later?
Yes, I think so. It only affects people using aging charsets on an aging JDK with a relatively recent FS2 feature. But for those few, it's a fatal error.