Default value whenever a value cannot be decoded
I'd like to provide some way to set a default value whenever some field cannot be decoded (because it does not have the right type or else). Here's an attempt to do this:
Could anyone look at this and tell me if I am doing it the right way?
final class OrElseElement[A, B <: A](decoder: ElementDecoder[A], default: String => B) extends ElementDecoder[A] {
override def decodeAsElement(c: Cursor, localName: String, namespaceUri: Option[String]): ElementDecoder[A] =
OrElseElement(decoder.decodeAsElement(c, localName, namespaceUri), _ => default(localName))
override def result(history: => List[String]): Either[DecodingError, A] =
Right(decoder.result(history).fold(e => default(history.headOption.getOrElse("root")), identity))
override def isCompleted: Boolean = decoder.isCompleted
}
Here's the same idea for attribute decoding:
final class OrElseAttribute[A, B <: A](decoder: AttributeDecoder[A], onError: (String, String) => B) extends AttributeDecoder[A] {
override def decodeAsAttribute(c: Cursor, localName: String, namespaceUri: Option[String]): Either[DecodingError, A] =
Right(decoder.decodeAsAttribute(c, localName, namespaceUri).fold(e => onError(localName, e.getMessage), identity))
}
Here's how this could be used:
enum Field[+A] { self =>
case Present[A](value: A) extends Field[A]
case Missing extends Field[Nothing]
case Unparsable(error: String) extends Field[Nothing]
given [A: ElementDecoder]ElementDecoder[Field[A]] = {
new OrElseElement(
ElementDecoder[Option[A]].emap { (_, oa) =>
Right(oa.fold(Field.Missing)(Field.Found(_)))
},
Field.UnParsable(_)
)
}
OrElseAttribute seems to be correct. Do you get any errors, when you use it?
It's more complicated with OrElseElement though. ElementDecoder.decodeAsElement reads StAX events from an input stream (it's c: Cursor) and accumulates state. When an error occurs, decodeAsElement returns a completed decoder. This decoder always returns the error, and it can't read any events. As far as I understand, in case of an error OrElseElement will finish decoding and return default value, but StAX input stream will have invalid state. It will contain events that would have been read if the error had not occurred. So OrElseElement should read events from input stream until it gets closing tag for the element that is being decoded.
This explanation may be complicated, so I'll try to provide a working example this weekend.
I also want to add, that current architecture of phobos doesn't allow easy customization of decoders. API of ElementDecoder is kind of ugly and difficult to use. I have some ideas, how it can be changed, but I am not sure, when will I be able to implement them
First, thank you for your help!
When you mention that OrElseElement should read the input stream until it gets a closing tag, are you referring to c.isEndElement? I guess then the idea is to do pretty much the same thing that with the stringEncoder:
if (c.isEndElement) {
ElementDecoder.errorIfWrongName[A](c, localName, namespaceUri).getOrElse {
c.next()
new ConstDecoder(default(localName))
}
} else {
new FailedDecoder(c.error(s"Unexpected event: '${c.getEventType}'"))
}
is that right?
Yes, it's somehow similar to the stringEncoder. In this case decoder have to find not just the first "end element" event, but the "end element" event, that belongs to the element being decoded. Consider the example below.
case class Bar(foo: Foo)
case class Foo(foo: Int) // Two elements have the same name to illustrate a corner case
given ElementDecoder[Foo] = OrElseElement(ElementDecoder.derived[Foo], _ => Foo(0))
given ElementDecoder[Bar] = ElementDecoder.derived
given XmlDecoder[Bar] = XmlDecoder.fromElementDecoder("bar")
Let's process the following document.
<bar><foo><foo>text</foo></foo></bar>
It cannot be parsed without OrElseElement, because the inner <foo> must be an integer. If we use the initial implementation of OrElseElement, an error will occur:
<bar><foo><foo>text</foo></foo></bar>
^^^^
Cursor will stop here. ElementDecoder[Bar] is expecting cursor to be on </bar>,
so ElementDecoder[Bar] will fail to parse the document
The correct implementation should move the cursor outside of the outer <foo> element:
<bar><foo><foo>text</foo></foo></bar>
^^^^^^
First move. Cursor encounters the first "end element" event.
The closing tag is named "foo", like the element being decoded, but it's an inner "foo".
Decoder is processing the outer "foo", so it must move on
<bar><foo><foo>text</foo></foo></bar>
^^^^^^
Second move. It's "end element" "foo" again.
But this time it's the end element the decoder is looking for.
The decoder must make one more move, so the cursor will be out of the outer "foo"
<bar><foo><foo>text</foo></foo></bar>
^^^^^^
Third move. The cursor is now out of outer "foo" and ElementDecoder[Bar] will work correctly
And here is the example:
final class OrElseElement[A, B <: A](decoder: ElementDecoder[A], default: String => B) extends ElementDecoder[A] {
override def decodeAsElement(c: Cursor, localName: String, namespaceUri: Option[String]): ElementDecoder[A] = {
// Memoize the "depth" of element being decoded,
// so decoder can distinguish nested elements with same name
val initialDepth = c.history.length
val updatedDecoder = decoder.decodeAsElement(c, localName, namespaceUri)
// Handle error if it occurs
if (updatedDecoder.isCompleted && updatedDecoder.result(c.history).isLeft) {
// Skip events until cursor is on closing tag of the element being decoded
while (!(
c.isEndElement &&
c.getLocalName == localName &&
Option(c.getNamespaceURI).filter(_.nonEmpty) == namespaceUri &&
c.history.length <= initialDepth)) {
c.next()
}
// Make one more step out of the decoded element
c.next()
ElementDecoder.ConstDecoder(default(localName))
} else {
OrElseElement(updatedDecoder, default)
}
}
override def result(history: => List[String]): Either[DecodingError, A] = decoder.result(history)
override def isCompleted: Boolean = decoder.isCompleted
}
It should handle most cases. Two notable exceptions I came up with: It does not handle lists correctly (it was too complex to implement fast), and It does not handle the AsyncXMLStreamReader.EVENT_INCOMPLETE (it's probably irrelevant, unless you use phobos-fs2 or other streaming addons)
There are some usage examples in scastie
Source of those usage examples
import ru.tinkoff.phobos.decoding._
import ru.tinkoff.phobos.encoding._
import ru.tinkoff.phobos.syntax._
import ru.tinkoff.phobos.derivation.semiauto._
object Main1 extends App {
final class OrElseElement[A, B <: A](decoder: ElementDecoder[A], default: String => B) extends ElementDecoder[A] {
override def decodeAsElement(c: Cursor, localName: String, namespaceUri: Option[String]): ElementDecoder[A] = {
// Memoize the "depth" of element being decoded,
// so decoder can distinguish nested elements with same name
val initialDepth = c.history.length
val updatedDecoder = decoder.decodeAsElement(c, localName, namespaceUri)
// Handle error if it occurs
if (updatedDecoder.isCompleted && updatedDecoder.result(c.history).isLeft) {
// Skip events until cursor is on closing tag of the element being decoded
while (!(
c.isEndElement &&
c.getLocalName == localName &&
Option(c.getNamespaceURI).filter(_.nonEmpty) == namespaceUri &&
c.history.length <= initialDepth)) {
c.next()
}
// Make one more step out of the decoded element
c.next()
ElementDecoder.ConstDecoder(default(localName))
} else {
OrElseElement(updatedDecoder, default)
}
}
override def result(history: => List[String]): Either[DecodingError, A] = decoder.result(history)
override def isCompleted: Boolean = decoder.isCompleted
}
// Example from the issue
def example1(): Unit = {
case class Bar(foo: Foo)
case class Foo(foo: Int)
given ElementDecoder[Foo] = OrElseElement(ElementDecoder.derived[Foo], _ => Foo(0))
given ElementDecoder[Bar] = ElementDecoder.derived
given XmlDecoder[Bar] = XmlDecoder.fromElementDecoder("bar")
val res1 = XmlDecoder[Bar].decode("<bar><foo><foo>text</foo></foo></bar>")
println(res1)
}
// More complex example
def example2(): Unit = {
case class Bar(a: Int, b: Int, c: Int) derives ElementDecoder
case class Baz(a: Int, b: Int, c: Int)
case class Qux(a: Int, b: Int, c: Int) derives ElementDecoder
given ElementDecoder[Baz] = OrElseElement(ElementDecoder.derived[Baz], _ => Baz(0, 0, 0))
case class Foo(bar: Bar, baz: Baz, qux: Qux) derives ElementDecoder
given XmlDecoder[Foo] = XmlDecoder.fromElementDecoder("foo")
val xml = """<foo>
| <bar><a>100</a><b>200</b><c>300</c></bar>
| <baz><a>100</a><b>two hundred</b><c>three hundred</c></baz>
| <qux><a>100</a><b>200</b><c>300</c></qux>
|</foo>
|""".stripMargin
val res2 = XmlDecoder[Foo].decode(xml)
println(res2)
}
// Almost like example 1, but the document is invalid
def example3(): Unit = {
case class Bar(foo: Foo)
case class Foo(foo: Int)
given ElementDecoder[Foo] = OrElseElement(ElementDecoder.derived[Foo], _ => Foo(0))
given ElementDecoder[Bar] = ElementDecoder.derived
given XmlDecoder[Bar] = XmlDecoder.fromElementDecoder("bar")
val res3 = XmlDecoder[Bar].decode("<bar><foo><foo>text</foo></qux></bar>")
println(res3)
}
example1()
example2()
example3()
Thanks for your help :) Sorry for the delayed answer, but I'm trying to figure out in what case the initial implementation could trigger a stack overflow. I had that recently, but cannot get to the XML that triggered it. Ideally, I'd like to write a test preventing this behavior from happening before fixing the implementation.
aha! I've figured it out:
case class Bar(foo: Field[Foo])
case class Foo(foo0: Field[Unit])
given ElementDecoder[Foo] = OrElseElement(ElementDecoder.derived[Foo], _ => Foo(Field.Found(())))
given ElementDecoder[Bar] = ElementDecoder.derived
given XmlDecoder[Bar] = XmlDecoder.fromElementDecoder("bar")
println(XmlDecoder[Bar].decode("<bar><foo><foo0></foo0></foo></bar>")) // will overflow
This seems related to a weird side-effect (not in the FP sense) between the OrElseElement and Unit decoders. The good news is that your implementation fixes it.
Hi!
Closing this issue, because OrElseElement seems to work. Feel free to reopen, if there will be any updates or additional questions