phobos icon indicating copy to clipboard operation
phobos copied to clipboard

Default value whenever a value cannot be decoded

Open FrancisToth opened this issue 3 years ago • 6 comments

I'd like to provide some way to set a default value whenever some field cannot be decoded (because it does not have the right type or else). Here's an attempt to do this:

Could anyone look at this and tell me if I am doing it the right way?

final class OrElseElement[A, B <: A](decoder: ElementDecoder[A], default: String => B) extends ElementDecoder[A] {
  override def decodeAsElement(c: Cursor, localName: String, namespaceUri: Option[String]): ElementDecoder[A] =
    OrElseElement(decoder.decodeAsElement(c, localName, namespaceUri), _ => default(localName))

  override def result(history: => List[String]): Either[DecodingError, A] =
    Right(decoder.result(history).fold(e => default(history.headOption.getOrElse("root")), identity))

  override def isCompleted: Boolean = decoder.isCompleted
}

Here's the same idea for attribute decoding:

final class OrElseAttribute[A, B <: A](decoder: AttributeDecoder[A], onError: (String, String) => B) extends AttributeDecoder[A] {
  override def decodeAsAttribute(c: Cursor, localName: String, namespaceUri: Option[String]): Either[DecodingError, A] =
    Right(decoder.decodeAsAttribute(c, localName, namespaceUri).fold(e => onError(localName, e.getMessage), identity))
}

Here's how this could be used:

enum Field[+A] { self =>
  case Present[A](value: A) extends Field[A]
  case Missing extends Field[Nothing]
  case Unparsable(error: String) extends Field[Nothing]

given [A: ElementDecoder]ElementDecoder[Field[A]] = {
  new OrElseElement(
    ElementDecoder[Option[A]].emap { (_, oa) =>
      Right(oa.fold(Field.Missing)(Field.Found(_)))
    },
    Field.UnParsable(_)
  )
}

FrancisToth avatar Sep 14 '22 20:09 FrancisToth

OrElseAttribute seems to be correct. Do you get any errors, when you use it?

It's more complicated with OrElseElement though. ElementDecoder.decodeAsElement reads StAX events from an input stream (it's c: Cursor) and accumulates state. When an error occurs, decodeAsElement returns a completed decoder. This decoder always returns the error, and it can't read any events. As far as I understand, in case of an error OrElseElement will finish decoding and return default value, but StAX input stream will have invalid state. It will contain events that would have been read if the error had not occurred. So OrElseElement should read events from input stream until it gets closing tag for the element that is being decoded.

This explanation may be complicated, so I'll try to provide a working example this weekend.

I also want to add, that current architecture of phobos doesn't allow easy customization of decoders. API of ElementDecoder is kind of ugly and difficult to use. I have some ideas, how it can be changed, but I am not sure, when will I be able to implement them

valentiay avatar Sep 15 '22 18:09 valentiay

First, thank you for your help!

When you mention that OrElseElement should read the input stream until it gets a closing tag, are you referring to c.isEndElement? I guess then the idea is to do pretty much the same thing that with the stringEncoder:

if (c.isEndElement) {
  ElementDecoder.errorIfWrongName[A](c, localName, namespaceUri).getOrElse {
    c.next()
    new ConstDecoder(default(localName))
  }
} else {
  new FailedDecoder(c.error(s"Unexpected event: '${c.getEventType}'"))
}

is that right?

FrancisToth avatar Sep 15 '22 19:09 FrancisToth

Yes, it's somehow similar to the stringEncoder. In this case decoder have to find not just the first "end element" event, but the "end element" event, that belongs to the element being decoded. Consider the example below.

case class Bar(foo: Foo)
case class Foo(foo: Int) // Two elements have the same name to illustrate a corner case

given ElementDecoder[Foo] = OrElseElement(ElementDecoder.derived[Foo], _ => Foo(0))
given ElementDecoder[Bar] = ElementDecoder.derived
given XmlDecoder[Bar] = XmlDecoder.fromElementDecoder("bar")

Let's process the following document.

<bar><foo><foo>text</foo></foo></bar>

It cannot be parsed without OrElseElement, because the inner <foo> must be an integer. If we use the initial implementation of OrElseElement, an error will occur:

<bar><foo><foo>text</foo></foo></bar>
               ^^^^
Cursor will stop here. ElementDecoder[Bar] is expecting cursor to be on </bar>, 
so ElementDecoder[Bar] will fail to parse the document

The correct implementation should move the cursor outside of the outer <foo> element:

<bar><foo><foo>text</foo></foo></bar>
                   ^^^^^^
First move. Cursor encounters the first "end element" event. 
The closing tag is named "foo", like the element being decoded, but it's an inner "foo".
Decoder is processing the outer "foo", so it must move on
<bar><foo><foo>text</foo></foo></bar>
                         ^^^^^^
Second move. It's "end element" "foo" again. 
But this time it's the end element the decoder is looking for.
The decoder must make one more move, so the cursor will be out of the outer "foo"
<bar><foo><foo>text</foo></foo></bar>
                               ^^^^^^
Third move. The cursor is now out of outer "foo" and ElementDecoder[Bar] will work correctly

valentiay avatar Sep 18 '22 15:09 valentiay

And here is the example:

final class OrElseElement[A, B <: A](decoder: ElementDecoder[A], default: String => B) extends ElementDecoder[A] {
  override def decodeAsElement(c: Cursor, localName: String, namespaceUri: Option[String]): ElementDecoder[A] = {
    // Memoize the "depth" of element being decoded,
    // so decoder can distinguish nested elements with same name
    val initialDepth = c.history.length
    val updatedDecoder = decoder.decodeAsElement(c, localName, namespaceUri)
    // Handle error if it occurs
    if (updatedDecoder.isCompleted && updatedDecoder.result(c.history).isLeft) {
      // Skip events until cursor is on closing tag of the element being decoded
      while (!(
        c.isEndElement &&
          c.getLocalName == localName &&
          Option(c.getNamespaceURI).filter(_.nonEmpty) == namespaceUri &&
          c.history.length <= initialDepth)) {
        c.next()
      }
      // Make one more step out of the decoded element
      c.next()
      ElementDecoder.ConstDecoder(default(localName))
    } else {
      OrElseElement(updatedDecoder, default)
    }
  }
  override def result(history: => List[String]): Either[DecodingError, A] = decoder.result(history)
  override def isCompleted: Boolean = decoder.isCompleted
}

It should handle most cases. Two notable exceptions I came up with: It does not handle lists correctly (it was too complex to implement fast), and It does not handle the AsyncXMLStreamReader.EVENT_INCOMPLETE (it's probably irrelevant, unless you use phobos-fs2 or other streaming addons)

There are some usage examples in scastie

Source of those usage examples

import ru.tinkoff.phobos.decoding._
import ru.tinkoff.phobos.encoding._
import ru.tinkoff.phobos.syntax._
import ru.tinkoff.phobos.derivation.semiauto._

object Main1 extends App {
  final class OrElseElement[A, B <: A](decoder: ElementDecoder[A], default: String => B) extends ElementDecoder[A] {
    override def decodeAsElement(c: Cursor, localName: String, namespaceUri: Option[String]): ElementDecoder[A] = {
      // Memoize the "depth" of element being decoded,
      // so decoder can distinguish nested elements with same name
      val initialDepth = c.history.length
      val updatedDecoder = decoder.decodeAsElement(c, localName, namespaceUri)
      // Handle error if it occurs
      if (updatedDecoder.isCompleted && updatedDecoder.result(c.history).isLeft) {
        // Skip events until cursor is on closing tag of the element being decoded
        while (!(
          c.isEndElement &&
            c.getLocalName == localName &&
            Option(c.getNamespaceURI).filter(_.nonEmpty) == namespaceUri &&
            c.history.length <= initialDepth)) {
          c.next()
        }
        // Make one more step out of the decoded element
        c.next()
        ElementDecoder.ConstDecoder(default(localName))
      } else {
        OrElseElement(updatedDecoder, default)
      }
    }
    override def result(history: => List[String]): Either[DecodingError, A] = decoder.result(history)
    override def isCompleted: Boolean = decoder.isCompleted
  }

  // Example from the issue
   def example1(): Unit = {
    case class Bar(foo: Foo)
    case class Foo(foo: Int)

    given ElementDecoder[Foo] = OrElseElement(ElementDecoder.derived[Foo], _ => Foo(0))
    given ElementDecoder[Bar] = ElementDecoder.derived
    given XmlDecoder[Bar] = XmlDecoder.fromElementDecoder("bar")

    val res1 = XmlDecoder[Bar].decode("<bar><foo><foo>text</foo></foo></bar>")
    println(res1)
  }

  // More complex example
  def example2(): Unit = {
    case class Bar(a: Int, b: Int, c: Int) derives ElementDecoder
    case class Baz(a: Int, b: Int, c: Int)
    case class Qux(a: Int, b: Int, c: Int) derives ElementDecoder
    given ElementDecoder[Baz] = OrElseElement(ElementDecoder.derived[Baz], _ => Baz(0, 0, 0))
    case class Foo(bar: Bar, baz: Baz, qux: Qux) derives ElementDecoder
    given XmlDecoder[Foo] = XmlDecoder.fromElementDecoder("foo")

    val xml = """<foo>
      |  <bar><a>100</a><b>200</b><c>300</c></bar>
      |  <baz><a>100</a><b>two hundred</b><c>three hundred</c></baz>
      |  <qux><a>100</a><b>200</b><c>300</c></qux>
      |</foo>
      |""".stripMargin
    val res2 = XmlDecoder[Foo].decode(xml)
    println(res2)
  }

  // Almost like example 1, but the document is invalid
  def example3(): Unit = {
    case class Bar(foo: Foo)
    case class Foo(foo: Int)

    given ElementDecoder[Foo] = OrElseElement(ElementDecoder.derived[Foo], _ => Foo(0))
    given ElementDecoder[Bar] = ElementDecoder.derived
    given XmlDecoder[Bar] = XmlDecoder.fromElementDecoder("bar")

    val res3 = XmlDecoder[Bar].decode("<bar><foo><foo>text</foo></qux></bar>")
    println(res3)
  }

  example1()
  example2()
  example3()

valentiay avatar Sep 18 '22 16:09 valentiay

Thanks for your help :) Sorry for the delayed answer, but I'm trying to figure out in what case the initial implementation could trigger a stack overflow. I had that recently, but cannot get to the XML that triggered it. Ideally, I'd like to write a test preventing this behavior from happening before fixing the implementation.

FrancisToth avatar Sep 26 '22 19:09 FrancisToth

aha! I've figured it out:

case class Bar(foo: Field[Foo])
case class Foo(foo0: Field[Unit])

given ElementDecoder[Foo] = OrElseElement(ElementDecoder.derived[Foo], _ => Foo(Field.Found(())))
given ElementDecoder[Bar] = ElementDecoder.derived
given XmlDecoder[Bar]     = XmlDecoder.fromElementDecoder("bar")
      
println(XmlDecoder[Bar].decode("<bar><foo><foo0></foo0></foo></bar>")) // will overflow

This seems related to a weird side-effect (not in the FP sense) between the OrElseElement and Unit decoders. The good news is that your implementation fixes it.

FrancisToth avatar Sep 26 '22 19:09 FrancisToth

Hi! Closing this issue, because OrElseElement seems to work. Feel free to reopen, if there will be any updates or additional questions

valentiay avatar Jan 09 '23 07:01 valentiay