kotlinx.serialization
Add JsonDecoder.peekToken()?
Some backends return polymorphic data without a descriptor:
{
  "value": "someValue"
}
vs
{
  "value": {
    "version": 0,
    "content": "someValue"
  }
}
I know this is quite an edge case, but I haven't found a way to decode this without going through JsonElement and buffering everything.
Would it be possible to introduce jsonDecoder.peekToken():
enum class JsonToken {
    BEGIN_OBJECT,
    BEGIN_ARRAY,
    STRING
    // others?
}
public interface JsonDecoder : Decoder, CompositeDecoder {
    /**
     * Peeks into the JSON and returns the next token (without consuming it).
     */
    fun peekToken(): JsonToken
}
This way, users that know they are in a JSON context could do stuff like this:
override fun deserialize(decoder: Decoder): Schema {
    // unsafe cast, the user needs to assume JSON but in some cases it's doable
    decoder as JsonDecoder
    return when (decoder.peekToken()) {
        JsonToken.BEGIN_OBJECT -> decoder.decodeStructure(/*...*/)
        JsonToken.STRING -> decoder.decodeString()
        else -> error("unexpected token")
    }
}
Would love to see a JsonDecoder#discardToken() feature as well, so we could easily create a collection serializer which discards illegal entries for instance.
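Until something like discardToken() exists, one possible workaround is to buffer the array as a JsonElement and drop the entries that fail to decode. The following is only a sketch under that assumption: LenientListSerializer is a hypothetical name, it only works with JSON, and the whole array is still held in memory as a tree.

```kotlin
import kotlinx.serialization.KSerializer
import kotlinx.serialization.builtins.ListSerializer
import kotlinx.serialization.builtins.serializer
import kotlinx.serialization.descriptors.SerialDescriptor
import kotlinx.serialization.encoding.Decoder
import kotlinx.serialization.encoding.Encoder
import kotlinx.serialization.json.Json
import kotlinx.serialization.json.JsonDecoder
import kotlinx.serialization.json.jsonArray

// Hypothetical helper: a list serializer that skips entries it cannot decode.
class LenientListSerializer<T : Any>(
    private val elementSerializer: KSerializer<T>
) : KSerializer<List<T>> {
    private val delegate = ListSerializer(elementSerializer)
    override val descriptor: SerialDescriptor = delegate.descriptor

    override fun deserialize(decoder: Decoder): List<T> {
        val jsonDecoder = decoder as? JsonDecoder
            ?: error("LenientListSerializer only supports JSON")
        // Buffers the whole array as a tree; this is the cost peekToken() would avoid.
        val array = jsonDecoder.decodeJsonElement().jsonArray
        return array.mapNotNull { element ->
            try {
                jsonDecoder.json.decodeFromJsonElement(elementSerializer, element)
            } catch (e: IllegalArgumentException) {
                // SerializationException is a subclass of IllegalArgumentException.
                null
            }
        }
    }

    // Encoding is unchanged, so delegate to the regular list serializer.
    override fun serialize(encoder: Encoder, value: List<T>) =
        delegate.serialize(encoder, value)
}

fun decodeLenientInts(text: String): List<Int> =
    Json.decodeFromString(LenientListSerializer(Int.serializer()), text)
```

Calling decodeLenientInts on an array that mixes numbers with a non-numeric string should keep only the numbers.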
Hi @martinbonnin, I have exactly the same case as you. Could you please share how you managed to decode it with the current version of kotlinx.serialization? Thanks ;)
@venator85 sorry for the late reply, I don't have all the context anymore but I probably ended up decoding everything to a JsonElement
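For reference, the JsonElement approach could look roughly like the sketch below. The Schema class with a nullable version field is an assumption made for illustration (the issue only shows the JSON shapes), and the serializer handles the content of the "value" field, not the surrounding object.

```kotlin
import kotlinx.serialization.KSerializer
import kotlinx.serialization.descriptors.SerialDescriptor
import kotlinx.serialization.descriptors.buildClassSerialDescriptor
import kotlinx.serialization.encoding.Decoder
import kotlinx.serialization.encoding.Encoder
import kotlinx.serialization.json.*

// Hypothetical model: version is null when the backend sends a bare string.
data class Schema(val version: Int?, val content: String)

object SchemaSerializer : KSerializer<Schema> {
    override val descriptor: SerialDescriptor = buildClassSerialDescriptor("Schema")

    override fun deserialize(decoder: Decoder): Schema {
        val jsonDecoder = decoder as? JsonDecoder
            ?: error("Schema can only be decoded from JSON")
        // The whole value is buffered as a tree before we can look at its shape.
        return when (val element = jsonDecoder.decodeJsonElement()) {
            is JsonObject -> Schema(
                version = element.getValue("version").jsonPrimitive.int,
                content = element.getValue("content").jsonPrimitive.content
            )
            is JsonPrimitive -> {
                require(element.isString) { "expected a string but found $element" }
                Schema(version = null, content = element.content)
            }
            is JsonArray -> error("unexpected array: $element")
        }
    }

    override fun serialize(encoder: Encoder, value: Schema) {
        val jsonEncoder = encoder as? JsonEncoder
            ?: error("Schema can only be encoded to JSON")
        val element: JsonElement = if (value.version == null) {
            JsonPrimitive(value.content)
        } else {
            buildJsonObject {
                put("version", value.version)
                put("content", value.content)
            }
        }
        jsonEncoder.encodeJsonElement(element)
    }
}
```

With this in place, Json.decodeFromString(SchemaSerializer, ...) accepts both the bare string and the versioned object.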
+1 on the solution @martinbonnin mentioned. It allows me to consume a section from the Json, and manually parse or "peek".
This would be amazing! I have a JSON document that needs to be parsed. But depending on the first token, whether that is an object or an array, it needs to choose between two different serializers. Currently I have to parse the entire document using decodeJsonElement() just to see the first token, which is obviously not that fast.
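To make the cost concrete, this is roughly what choosing a serializer from the root element looks like today. It is a sketch only: the Document types and the String payloads are hypothetical stand-ins for the real model, and the entire document is parsed into a tree before the first branch is taken.

```kotlin
import kotlinx.serialization.builtins.ListSerializer
import kotlinx.serialization.builtins.MapSerializer
import kotlinx.serialization.builtins.serializer
import kotlinx.serialization.json.Json
import kotlinx.serialization.json.JsonArray
import kotlinx.serialization.json.JsonObject

// Hypothetical result types, one per root shape.
sealed interface Document
data class ListDocument(val items: List<String>) : Document
data class MapDocument(val entries: Map<String, String>) : Document

fun decodeDocument(text: String, json: Json = Json): Document {
    // parseToJsonElement reads the whole input, even though only the
    // type of the root element is needed to pick a serializer.
    return when (val root = json.parseToJsonElement(text)) {
        is JsonArray -> ListDocument(
            json.decodeFromJsonElement(ListSerializer(String.serializer()), root)
        )
        is JsonObject -> MapDocument(
            json.decodeFromJsonElement(
                MapSerializer(String.serializer(), String.serializer()), root
            )
        )
        else -> error("expected an object or an array but found $root")
    }
}
```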
Just found out there is actually an internal API for this, I guess this will do for now.
@Suppress("INVISIBLE_REFERENCE", "INVISIBLE_MEMBER")
run {
    decoder as kotlinx.serialization.json.internal.StreamingJsonDecoder
    when (decoder.lexer.peekNextToken()) {
        kotlinx.serialization.json.internal.TC_BEGIN_OBJ -> {
        }
        else -> {}
    }
}
The ability to parse Json or other formats token by token would be huge for low memory devices. I have some large files that don't fit entirely into memory. This would make parsing them possible.
@hansenji The problem is how you are going to deal with partially decoded values. It almost only makes sense for collections, but at what level of the hierarchy (top only?). The best approach I see is to use custom serializers for this (possibly using direct format access through casting).
Note that StreamingJsonDecoder is not the only implementation, so this is not guaranteed to work in all cases. In particular, when a polymorphic value is being decoded, the decoder may already be working on a JsonElement, without any notion of tokens.
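For the top-level collection case specifically, kotlinx-serialization-json already has an experimental, JVM-only Json.decodeToSequence that reads elements lazily from an InputStream. A minimal sketch, assuming the file is one large array of integers; sumLargeArray is a hypothetical helper:

```kotlin
import kotlinx.serialization.ExperimentalSerializationApi
import kotlinx.serialization.builtins.serializer
import kotlinx.serialization.json.DecodeSequenceMode
import kotlinx.serialization.json.Json
import kotlinx.serialization.json.decodeToSequence
import java.io.ByteArrayInputStream
import java.io.InputStream

// Elements are decoded one at a time as the sequence is consumed,
// so the full array never has to be materialized as a list.
@OptIn(ExperimentalSerializationApi::class)
fun sumLargeArray(stream: InputStream): Long =
    Json.decodeToSequence(stream, Int.serializer(), DecodeSequenceMode.ARRAY_WRAPPED)
        .fold(0L) { acc, value -> acc + value }

// Small stand-in for a large file on disk.
fun sumExample(): Long =
    sumLargeArray(ByteArrayInputStream("[1,2,3]".toByteArray()))
```

This does not help with nested collections or with peeking at tokens, which is what the requests in this thread are about.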
@pdvrieze Ideally the solution would be a complete set of operations: peek, nextName, nextValue, skip, beginObject/Array, endObject/Array. With these I can manually parse the file.
With these, you could ideally also pass a reader to a generated serializer, read just that one value, and leave the stream open so that the remaining parts can be read in as needed.