kotlinx.serialization: Add JsonDecoder.peekToken()?


Some backends return polymorphic data without a descriptor:

{
  "value": "someValue"
}

vs

{
  "value": {
    "version": 0,
    "content": "someValue"
  }
}

I know this is quite the edge case but I haven't found a way to decode this without going through JsonElement and buffering everything.

Would it be possible to introduce jsonDecoder.peekToken():

enum class JsonToken {
  BEGIN_OBJECT,
  BEGIN_ARRAY,
  STRING
  // others?
}

public interface JsonDecoder : Decoder, CompositeDecoder {
  /**
   * peeks into the json and returns the next token (without consuming it)
   */
  fun peekToken(): JsonToken
}

This way, users that know they are in a JSON context could do stuff like this:

  override fun deserialize(decoder: Decoder): Schema {
    // unsafe cast, the user needs to assume JSON but in some cases it's doable
    decoder as JsonDecoder
    return when (decoder.peekToken()) {
      BEGIN_OBJECT -> decoder.decodeStructure(/*...*/)
      STRING -> decoder.decodeString()
      else -> error("unexpected token")
    }
  }

Mar 09 '23 10:03 martinbonnin

Would love to see a JsonDecoder#discardToken() feature as well, so we could easily create a collection serializer which discards illegal entries for instance.
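Until something like that exists, a rough approximation is possible by buffering the array as a JsonElement and dropping entries that fail to decode. This is only a sketch under that assumption: LenientListSerializer is a hypothetical name, it works for JSON only, and it does not avoid the buffering discussed above.

```kotlin
import kotlinx.serialization.KSerializer
import kotlinx.serialization.SerializationException
import kotlinx.serialization.builtins.ListSerializer
import kotlinx.serialization.builtins.serializer
import kotlinx.serialization.descriptors.SerialDescriptor
import kotlinx.serialization.encoding.Decoder
import kotlinx.serialization.encoding.Encoder
import kotlinx.serialization.json.Json
import kotlinx.serialization.json.JsonDecoder
import kotlinx.serialization.json.jsonArray

// Hypothetical helper, not part of kotlinx.serialization.
class LenientListSerializer<T : Any>(
    private val elementSerializer: KSerializer<T>
) : KSerializer<List<T>> {
    private val delegate = ListSerializer(elementSerializer)
    override val descriptor: SerialDescriptor = delegate.descriptor

    override fun deserialize(decoder: Decoder): List<T> {
        // Assumes a JSON context; other formats are rejected.
        val jsonDecoder = decoder as? JsonDecoder
            ?: error("LenientListSerializer only supports JSON")
        // Buffers the whole array as a tree, then decodes entry by entry.
        val array = jsonDecoder.decodeJsonElement().jsonArray
        return array.mapNotNull { element ->
            try {
                jsonDecoder.json.decodeFromJsonElement(elementSerializer, element)
            } catch (e: SerializationException) {
                null // discard entries that cannot be decoded
            }
        }
    }

    override fun serialize(encoder: Encoder, value: List<T>) =
        delegate.serialize(encoder, value)
}

fun main() {
    val serializer = LenientListSerializer(Int.serializer())
    println(Json.decodeFromString(serializer, """[1, "x", 3]"""))
}
```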

May 11 '23 20:05 Kantis

Hi @martinbonnin, I have exactly the same case as you. Could you please share how you managed to decode it with the current version of kotlinx.serialization? Thanks ;)

Feb 13 '24 20:02 venator85

@venator85 sorry for the late reply, I don't have all the context anymore but I probably ended up decoding everything to a JsonElement
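For reference, a minimal sketch of what that JsonElement-based approach could look like for the two payloads at the top of this thread. The type names (Value, PlainValue, VersionedValue, Envelope) and the serializer name are assumptions made up for illustration; only decodeJsonElement() and decodeFromJsonElement() are existing library calls.

```kotlin
import kotlinx.serialization.KSerializer
import kotlinx.serialization.Serializable
import kotlinx.serialization.descriptors.SerialDescriptor
import kotlinx.serialization.descriptors.buildClassSerialDescriptor
import kotlinx.serialization.encoding.Decoder
import kotlinx.serialization.encoding.Encoder
import kotlinx.serialization.json.Json
import kotlinx.serialization.json.JsonDecoder
import kotlinx.serialization.json.JsonObject
import kotlinx.serialization.json.JsonPrimitive

// Hypothetical model types, named for illustration only.
sealed interface Value
data class PlainValue(val text: String) : Value

@Serializable
data class VersionedValue(val version: Int, val content: String) : Value

object ValueSerializer : KSerializer<Value> {
    override val descriptor: SerialDescriptor = buildClassSerialDescriptor("Value")

    override fun deserialize(decoder: Decoder): Value {
        // Unsafe assumption: we are in a JSON context.
        val jsonDecoder = decoder as? JsonDecoder
            ?: error("ValueSerializer only supports JSON")
        // Buffers the whole value as a tree, then branches on its shape.
        return when (val element = jsonDecoder.decodeJsonElement()) {
            is JsonObject ->
                jsonDecoder.json.decodeFromJsonElement(VersionedValue.serializer(), element)
            is JsonPrimitive ->
                if (element.isString) PlainValue(element.content)
                else error("expected a string")
            else -> error("unexpected JSON element")
        }
    }

    override fun serialize(encoder: Encoder, value: Value) {
        when (value) {
            is PlainValue -> encoder.encodeString(value.text)
            is VersionedValue ->
                encoder.encodeSerializableValue(VersionedValue.serializer(), value)
        }
    }
}

@Serializable
data class Envelope(@Serializable(with = ValueSerializer::class) val value: Value)

fun main() {
    val plain = """{"value": "someValue"}"""
    val versioned = """{"value": {"version": 0, "content": "someValue"}}"""
    println(Json.decodeFromString(Envelope.serializer(), plain))
    println(Json.decodeFromString(Envelope.serializer(), versioned))
}
```

The cost is the one described in the original post: the whole value is materialized as a JsonElement before it is decoded.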

Mar 26 '24 21:03 martinbonnin

+1 on the solution @martinbonnin mentioned. It allows me to consume a section from the Json, and manually parse or "peek".

Mar 27 '24 14:03 nomisRev

This would be amazing! I have a JSON document that needs to be parsed. But depending on the first token, whether that is an object or an array, it needs to choose between two different serializers. Currently I have to parse the entire document using decodeJsonElement() just to see the first token, which is obviously not that fast.
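For the top-level case only, and assuming the whole document is already available as a String, the first non-whitespace character can be inspected before a serializer is chosen. A minimal sketch; TopLevelKind and topLevelKind are hypothetical helpers, not part of kotlinx.serialization:

```kotlin
// Hypothetical helper types, not part of kotlinx.serialization.
enum class TopLevelKind { OBJECT, ARRAY, OTHER }

// Looks only at the first non-whitespace character; it does not validate the JSON.
fun topLevelKind(json: String): TopLevelKind =
    when (json.firstOrNull { !it.isWhitespace() }) {
        '{' -> TopLevelKind.OBJECT
        '[' -> TopLevelKind.ARRAY
        else -> TopLevelKind.OTHER
    }

fun main() {
    println(topLevelKind("  [1, 2, 3]"))
    println(topLevelKind("{\"a\": 1}"))
}
```

The result could then decide which serializer is passed to Json.decodeFromString. This does not help with nested values or streamed input, which is where a real peekToken() would be needed.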

May 22 '24 01:05 Thomas-Vos

Just found out there is actually an internal API for this, I guess this will do for now.

@Suppress("INVISIBLE_REFERENCE", "INVISIBLE_MEMBER")
run {
    decoder as kotlinx.serialization.json.internal.StreamingJsonDecoder
    when (decoder.lexer.peekNextToken()) {
        kotlinx.serialization.json.internal.TC_BEGIN_OBJ -> {
        }
        else -> {}
    }
}

May 22 '24 01:05 Thomas-Vos

The ability to parse Json or other formats token by token would be huge for low memory devices. I have some large files that don't fit entirely into memory. This would make parsing them possible.

May 23 '24 15:05 hansenji

@hansenji The problem is how you are going to deal with partially decoded values. It almost only makes sense for collections, but at what level of the hierarchy? (Top only?) The best approach I see is to use custom serializers for this (possibly using direct format access through casting).

May 24 '24 08:05 pdvrieze

Note that StreamingJsonDecoder is not the only implementation, so this is not guaranteed to work in all cases. In particular, when a polymorphic value is being decoded, the decoder already holds a JsonElement and has no notion of tokens.

May 27 '24 14:05 sandwwraith

@pdvrieze Ideally the solution would be a complete pull API: peek, nextName, nextValue, skip, beginObject/beginArray, endObject/endArray. With these I could parse the file manually. Ideally it would also be possible to pass the reader to a generated serializer, have it read just that one value, and leave the stream open so that the remaining parts can be read in as needed.

Jun 05 '24 22:06 hansenji
