jsoniter-scala icon indicating copy to clipboard operation
jsoniter-scala copied to clipboard

Consider separation of decoding and encoding traits

Open plokhotnyuk opened this issue 7 years ago • 26 comments

Sometime only serialization or parsing is required.

As w/a use ??? operator for non required encoding or decoding methods in custom codecs.

plokhotnyuk avatar May 04 '18 09:05 plokhotnyuk

@plokhotnyuk I'm running into this issue too. But it's bigger than just using ??? for implementing codecs. E.g. I have certain fields I don't want to emit into the JSON. So the codec configs probably needs to be separate too.

nilskp avatar Sep 13 '18 20:09 nilskp

Do you want parse them but their values should not be serialized back to JSON?

plokhotnyuk avatar Sep 13 '18 20:09 plokhotnyuk

Sometimes that's the case, yes. Other times the opposite.

nilskp avatar Sep 14 '18 13:09 nilskp

Why not just use different case classes or codecs for that?

Main idea behind generation of paired encoder and decoder is that with them you can make sure that sequential serialization and parsing do not loose any information.

If your case do not share this value than creation of a custom codec or an alternative macro which can generate such cases can be options...

Also, I'm thinking about how make the make macro to be open for extensions....

plokhotnyuk avatar Sep 14 '18 13:09 plokhotnyuk

Why not just use different case classes or codecs for that?

Sure, more code can solve almost anything. But it's boilerplate copying of near identical structures, just really bothersome. Right now I have around 50 sealed case classes I would have to duplicate, for just one response.

Main idea behind generation of paired encoder and decoder is that with them you can make sure that sequential serialization and parsing do not loose any information.

Agreed, this is essential for two-way codecs.

nilskp avatar Sep 14 '18 14:09 nilskp

Right now, I need to output an ADT content as just a struct, e.g.

{
  "foo": 5,
  "bar": "BAR"
}

However, that doesn't seem possible, because the Codec is two-way, thus must include some type information. I'm left with no choice other than reflectively convert to a Map[String, Any] and manually supply the needed codecs.

nilskp avatar Sep 27 '18 14:09 nilskp

@nilskp please describe your initial problem: show an ADT that you have and provide an example of JSON representation that need to be archived for it.

plokhotnyuk avatar Sep 27 '18 14:09 plokhotnyuk

E.g.

sealed abstract class Foo
case class Bar(a: Int, b: Float) extends Foo
case class Baz(x: String) extends Foo

So, given a Seq[Foo], I'd like to end up with this (one-way) grouped representation:

{
  "bars": [
    {"a": 5, "b": 54.23},
    {"a": 98, "b": 7.3},
  ],
  "bazs": [
    {"x":" Hello"},
    {"x": "World"}
  ]
}

nilskp avatar Sep 27 '18 14:09 nilskp

If you need the most efficient implementation then generate sources for the leaf case classes and manually create a custom codec for sequence like this:

import com.github.plokhotnyuk.jsoniter_scala.core._
import com.github.plokhotnyuk.jsoniter_scala.macros._

sealed abstract class Foo
case class Bar(a: Int, b: Float) extends Foo
case class Baz(x: String) extends Foo

abstract class JsonValueEncoder[A] extends JsonValueCodec[A] {
  def nullValue: A = null.asInstanceOf[A]

  def decodeValue(in: JsonReader, default: A): A = ???
}

object Examples {
  def main(args: Array[String]): Unit = {
    implicit val barCodec: JsonValueCodec[Bar] = JsonCodecMaker.make(CodecMakerConfig())
    implicit val bazCodec: JsonValueCodec[Baz] = JsonCodecMaker.make(CodecMakerConfig())
    implicit val codec: JsonValueCodec[Seq[Foo]] = new JsonValueEncoder[Seq[Foo]] {
      def encodeValue(xs: Seq[Foo], out: JsonWriter): Unit = {
        out.writeObjectStart()
        encodeSeqField[Bar]("bars", xs, out)
        encodeSeqField[Baz]("bazs", xs, out)
        out.writeObjectEnd()
      }
    }

    def encodeSeqField[A](name: String, as: Seq[Any], out: JsonWriter)(implicit codecA: JsonValueCodec[A]): Unit = {
      out.writeNonEscapedAsciiKey(name)
      out.writeArrayStart()
      as.foreach {
        case a: A =>
          out.writeComma()
          codecA.encodeValue(a, out)
        case _ =>
      }
      out.writeArrayEnd()
    }

    println(new String(writeToArray(Seq[Foo](Bar(5, 54.23f), Baz("Hello"), Bar(98, 7.3f), Baz("World")),
      WriterConfig(indentionStep = 2)), "UTF-8"))
  }
}

And output:

{
  "bars": [
    {
      "a": 5,
      "b": 54.23
    },
    {
      "a": 98,
      "b": 7.3
    }
  ],
  "bazs": [
    {
      "x": "Hello"
    },
    {
      "x": "World"
    }
  ]
}

plokhotnyuk avatar Sep 27 '18 16:09 plokhotnyuk

Right, I already have a workaround. But should all that code be necessary? I have a sealed trait with over 30 subclasses that I haven't yet gotten to. It should be a simple configuration for one-way use cases, i.e. separation of encoding and decoding.

nilskp avatar Sep 27 '18 16:09 nilskp

@nilskp I have generalized it, the last version above should be +2 lines per leaf class now.

plokhotnyuk avatar Sep 27 '18 16:09 plokhotnyuk

Won't this lead to [,?

        out.writeArrayStart()
        xs.foreach {
          case bar: Bar =>
            out.writeComma()
            barCodec.encodeValue(bar, out)
          case _ =>
        }

nilskp avatar Sep 27 '18 17:09 nilskp

Shouldn't this be possible with a config option, where the decodeValue method simply ends up ??? (Nothing)?

I.e. CodecMakerConfig(discriminatorType: DiscriminatorType), replacing discriminatorFieldName?

sealed abstract class DiscriminatorType
case class Field(name: String = "type") extends DiscriminatorType
case object Key extends DiscriminatorType
case object Bare extends DiscriminatorType

nilskp avatar Sep 27 '18 17:09 nilskp

@nilskp

  1. writeComma() is a smart routine which works together with writeArrayStart(), so it will not emit unwanted , when used properly.
  2. to close this issue you can create and use the following abstract class (I have updated an example snippet above to show its usage):
abstract class JsonValueEncoder[A] extends JsonValueCodec[A] {
  def nullValue: A = null.asInstanceOf[A]

  def decodeValue(in: JsonReader, default: A): A = ???
}
  1. Your feedback is valuable and I'm happy to help you, but please create a separate issue for code generation of your case with sum types that need to be grouped by class.

plokhotnyuk avatar Sep 28 '18 06:09 plokhotnyuk

I know I'm pretty late to the party. I have the same issue with split encoders/decoders. It would be nice if JsonValueCodec[A] would extend a JsonValueEncoder[A] and a JsonValueDecoder[A] because I'd like to document the fact in code that there exists only a decoder or an encoder for a certain type. Could that be a solution?

martinpallmann avatar Jul 15 '20 09:07 martinpallmann

@martinpallmann Hi, Martin! Unfortunately, no any solution yet... Initially it looks like a pain, but it pays greatly if for all custom codecs you provide both encoding and decoding implementations and test them with round trip conversions.

plokhotnyuk avatar Jul 15 '20 11:07 plokhotnyuk

@plokhotnyuk yes. And I'm not challenging that. I would just like to have for some cases the possibility to have only one of them where I may have already a different test strategy. The reason is this PR for caliban: https://github.com/ghostdogpr/caliban/pull/471. There are no decoders needed for some of the values and they are also very difficult if not impossible to implement.

martinpallmann avatar Jul 15 '20 11:07 martinpallmann

@plokhotnyuk Wow!

Will just having abstract classes or traits with ??? as an implementation for redundant methods be acceptable for your case? It just an internal API, isn't it?

plokhotnyuk avatar Jul 15 '20 12:07 plokhotnyuk

I mean it's probably acceptable but the skin on my back crawls up when I do that. But if you're not willing to change the API of jsoniter then I have to do it. I'm not happy about it but life will go on. :)

martinpallmann avatar Jul 15 '20 13:07 martinpallmann

This is an old issue, but as far as I can tell, the status quo is still the same?

For me the main problem is that the generated codecs are non-trivial in terms of generated code size. This matters a lot in Scala.js., because all that code needs to be downloaded and parsed, even if it's not executed. That takes time, especially on mobile devices, as our JS code size goes further and further into megabytes.

If it's not possible to derive just a Decoder, it means that for every JSON API response type that we decode on the frontend, we also have to include an Encoder in the generated JS, which will never be used, because we only ever need to encode this JSON on the backend, not on the frontend. And similarly for every JSON API request type, we will need to include a decoder in the JS bundle, even though that decoder will only be used on the backend.

In other JSON libraries like upickle, I can derive encoders and decoders separately, and put them in two separate lazy vals, and even though the code for both is generated, their definitions are entirely separate, and so encoders and decoders that are never used / referenced in frontend code will be dead-code-eliminated by Scala.js. However, if I understand correctly, it is not possible to do this with jsoniter-scala?

This is a bummer because for cases when both encoding and decoding is needed, jsoniter generally generates more compact codecs than circe or upickle, although my testing on that has been very limited so far. I am actually not sure how refactoring jsoniter to separate the encoding and decoding functionality would affect the code size of the derived codecs for those cases when you need both an encoder and a decoder.

Earlier in this thread you've asked if using ??? as no-ops would suffice. I'm not sure what you meant, because I don't know how to derive such a codec for e.g. a case class or a GADT without having to write it entirely manually. But if jsoniter had makeEncoder / makeDecoder functions that generated an encode-only or decode-only codec with ??? as the no-op implementation, that would be fine by me. I don't know anything about how macros work, so I'm not sure how hard it is to avoid generating some code like this.

raquo avatar Oct 05 '23 08:10 raquo

@raquo Thanks for you feedback!

I can add configuration for make to do not generate implementation for encoder or decoder.

Also, I'm going to update documentation to point that for similar case as yours users can save on generated code by using codec generation for intermediate data structures (like a codec per class defiled in the companion object).

plokhotnyuk avatar Oct 05 '23 09:10 plokhotnyuk

That would be great, I think, thanks! I just want to underscore that I don't know what the technical solution should be, because I don't understand macros at all, I just wanted to convey the need. As far as I can tell, what you said sounds like it should help.

I do know how to check for bundle size difference using source-map-explorer npm package though, although it's a manual process, I've never tried automating it.

And yes, such a doc as you mentioned would be great 👍

raquo avatar Oct 05 '23 10:10 raquo

@raquo I've tried the commit above with jsoniter-scala-benchmarkJS module and it allows to save ~100K using withEncodingOnly(true) option in resulting fully optimized *.js file for more than 100 codecs that are mostly small but some of them like for OpenRTBBenchmark and TwitterAPIBenchmark are quite big.

plokhotnyuk avatar Oct 06 '23 06:10 plokhotnyuk

Nice, thank you! That's a good shave.

May I suggest, instead of a literal ???, to throw an exception with a message like: jsoniter-scala: codec encoder is missing because decodingOnly = true. I'm assuming that no info about the codec (name / type / etc.) is available at this point in the code, but if I'm wrong, including that would help debugging too.

The problem with a literal ??? is that people don't expect it from libraries, so they will be confused when they will go looking for it in their own code.

raquo avatar Oct 06 '23 08:10 raquo

@raquo Thanks for the review!

The goal of these options is to minimize code of generated codecs and I hope they will be used wisely only in rare use cases as yours.

Please peek the latest release, it is already available on the Maven Central: https://github.com/plokhotnyuk/jsoniter-scala/releases/tag/v2.24.0

plokhotnyuk avatar Oct 06 '23 08:10 plokhotnyuk

It would be nice if encoder and decoder are separate at the type-level. The workaround for custom codecs is unsafe where the compiler cannot detect that a decode-only codec is being used in encoding scenarios.

quanganhtran avatar Jan 23 '24 10:01 quanganhtran