fs2

decodeCWithCharset error handling inconsistency

Status: open. rossabaker opened this issue 4 years ago (2 comments).

utf8.decodeC is a specialization of the generic private decoder used for other charsets:

  def decodeCWithCharset[F[_]: RaiseThrowable](charset: Charset): Pipe[F, Chunk[Byte], String] =
    if (charset.name() == StandardCharsets.UTF_8.name())
      utf8.decodeC
    else
      decodeCWithGenericCharset(charset)

On malformed input, the UTF-8 decoder succeeds and emits the Unicode replacement character (U+FFFD), while the generic decoder used for the other charsets raises an error:

scala> Stream(0xc0, 0xaf).map(_.toByte).covary[Fallible].chunks.through(text.utf8.decodeC).compile.string
val res14: scala.util.Either[Throwable,String] = Right(��)

scala> Stream(0xc0, 0xaf).map(_.toByte).covary[Fallible].chunks.through(text.decodeCWithGenericCharset(UTF_8)).compile.string
val res15: scala.util.Either[Throwable,String] = Left(java.nio.charset.MalformedInputException: Input length = 1)
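For comparison, the JDK's own CharsetDecoder exposes this choice directly through java.nio.charset.CodingErrorAction. The minimal sketch below uses only the JDK (no FS2), and the object and value names are illustrative. It decodes the same two bytes both ways: REPLACE yields two U+FFFD characters, matching the utf8.decodeC result above, and REPORT throws a MalformedInputException with an input length of 1, matching the generic decoder's error.

```scala
import java.nio.ByteBuffer
import java.nio.charset.{CodingErrorAction, MalformedInputException, StandardCharsets}

// Illustrative only: plain JDK decoding, independent of FS2.
object Utf8ErrorActions {
  // Decode `bytes` as UTF-8 in one shot, applying `action` to bad input.
  def decodeUtf8(bytes: Array[Byte], action: CodingErrorAction): String =
    StandardCharsets.UTF_8
      .newDecoder()
      .onMalformedInput(action)
      .onUnmappableCharacter(action)
      .decode(ByteBuffer.wrap(bytes))
      .toString

  // The same bytes as in the REPL session; 0xC0 is never a valid UTF-8 lead byte.
  val bad: Array[Byte] = Array(0xc0, 0xaf).map(_.toByte)

  // REPLACE substitutes U+FFFD for each malformed sequence (here, each byte).
  val replaced: String = decodeUtf8(bad, CodingErrorAction.REPLACE)

  // REPORT makes decode throw instead; capture the exception as a Left.
  val reported: Either[MalformedInputException, String] =
    try Right(decodeUtf8(bad, CodingErrorAction.REPORT))
    catch { case e: MalformedInputException => Left(e) }
}
```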

There's no visible UTF-8 decoder that raises errors, nor is there a generic decoder that uses replacement characters. We might want to make the behavior configurable, but it should at least be consistent.
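One way to get a configurable and consistent behavior is to pass the error policy in explicitly and apply it the same way to every charset, UTF-8 included. The following is a minimal sketch under that assumption, not FS2's implementation: decodeChunks is a hypothetical name, it works on plain byte arrays rather than a Pipe[F, Chunk[Byte], String], and it returns an Either instead of raising through RaiseThrowable. It also carries incomplete multi-byte sequences across chunk boundaries, which any chunked decoder has to handle.

```scala
import java.nio.{ByteBuffer, CharBuffer}
import java.nio.charset.{CharacterCodingException, Charset, CoderResult, CodingErrorAction, StandardCharsets}

// A sketch, not FS2's implementation: one explicit error policy for every charset.
object ConfigurableDecode {
  // Turn an error CoderResult into the exception it describes.
  private def asException(r: CoderResult): CharacterCodingException =
    try { r.throwException(); new CharacterCodingException }
    catch { case e: CharacterCodingException => e }

  // Hypothetical helper. onError = REPLACE emits U+FFFD; REPORT returns a Left.
  def decodeChunks(
      chunks: Seq[Array[Byte]],
      charset: Charset,
      onError: CodingErrorAction
  ): Either[CharacterCodingException, String] = {
    val decoder = charset
      .newDecoder()
      .onMalformedInput(onError)
      .onUnmappableCharacter(onError)
    val out = new java.lang.StringBuilder
    val cb = CharBuffer.allocate(64)

    // Decode as much of `in` as possible, draining `cb` into `out` as it fills.
    def run(in: ByteBuffer, endOfInput: Boolean): Option[CharacterCodingException] = {
      var err: Option[CharacterCodingException] = None
      var more = true
      while (more) {
        val result = decoder.decode(in, cb, endOfInput)
        cb.flip(); out.append(cb.toString); cb.clear()
        if (result.isError) { err = Some(asException(result)); more = false }
        else if (result.isUnderflow) more = false // needs more input, or is done
        // otherwise overflow: loop again with the emptied buffer
      }
      err
    }

    var carry = Array.emptyByteArray // incomplete multi-byte sequence from the last chunk
    var failure: Option[CharacterCodingException] = None
    val it = chunks.iterator
    while (failure.isEmpty && it.hasNext) {
      val in = ByteBuffer.wrap(carry ++ it.next())
      failure = run(in, endOfInput = false)
      carry = new Array[Byte](in.remaining())
      in.get(carry)
    }
    // At end of input, leftover bytes are malformed and handled per onError.
    if (failure.isEmpty) failure = run(ByteBuffer.wrap(carry), endOfInput = true)
    if (failure.isEmpty) { decoder.flush(cb); cb.flip(); out.append(cb.toString) }
    failure.toLeft(out.toString)
  }
}
```

With REPLACE, malformed input becomes U+FFFD whatever the charset; with REPORT, the caller gets a Left and decides how to raise it, which in an FS2 pipe would presumably go through RaiseThrowable.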

rossabaker, Oct 07 '21 19:10

Should we release a 3.1.5 with #2671 and then address this later?

mpilquist, Oct 07 '21 23:10

Yes, I think so. It only affects people using aging charsets on an aging JDK with a relatively recent FS2 feature. But for those few, it's a fatal error.

rossabaker, Oct 08 '21 00:10