How can I deserialize a `<xs:choice>` XML element into an OCaml variant type?
This is a question, not an issue. I am trying to use this library to parse an XML format I do not have control over. Suppose it has an XSD as follows:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
<xs:element name="modules">
<xs:complexType>
<xs:sequence>
<xs:element name="RootModule" type="xs:string"/>
<xs:sequence maxOccurs="unbounded">
<xs:choice>
<xs:element ref="ModuleNode"/>
<xs:element ref="ModuleNodeRef"/>
</xs:choice>
</xs:sequence>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
Here is an example XML input:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<modules>
<RootModule>Test</RootModule>
<ModuleNodeRef>
<UID>304</UID>
</ModuleNodeRef>
<ModuleNode>
<uniquename>Test</uniquename>
</ModuleNode>
</modules>
I would like to deserialize this into something similar to the following, where the ModuleNodeRef and ModuleNode instances are a list of variant types:
open Protocol_conv_xml
type module_node = {
uniquename : string
} [@@deriving protocol ~driver:(module Xml_light)]
type module_node_ref = {
uid : int [@key "UID"]
} [@@deriving protocol ~driver:(module Xml_light)]
type module_node_or_ref =
| ModuleNode of module_node [@key "ModuleNode"]
| ModuleNodeRef of module_node_ref [@key "ModuleNodeRef"]
[@@deriving protocol ~driver:(module Xml_light)]
type modules = {
root_module: string; [@key "RootModule"]
modules : module_node_or_ref list;
} [@@deriving protocol ~driver:(module Xml_light)]
let () =
let xml = Xml.parse_file "Test.xml" in
match modules_of_xml_light xml with
| Ok s -> s |> modules_to_xml_light |> Xml.to_string_fmt |> Printf.printf "t serialized: %s\n";
| Error e -> print_endline (Xml_light.error_to_string_hum e)
However, when I run the program on the example input it just produces:
t serialized: <record>
<RootModule>Test</RootModule>
</record>
So the variants are ignored. How can I correctly parse the variants? Thanks!
As a follow-on, if I start from a variant type and try to round-trip to XML I get an error; here is the code:
open Protocol_conv_xml
type foo = {
name : string [@key "name"]
}
[@@deriving protocol ~driver:(module Xml_light)]
type bar = {
value : int
}
[@@deriving protocol ~driver:(module Xml_light)]
type t =
| A of foo [@key "A"]
| B of bar [@key "B"]
[@@deriving protocol ~driver:(module Xml_light)]
let () =
let value = A {name = "test"} in
let xml = to_xml_light value in
Printf.printf "t serialized: %s\n" (Xml_light.to_string xml);
()
This produces an error when run:
Fatal error: exception Primitive value expected in node: variant for string. T: '<variant>
A
<record>
<name>test</name>
</record>
</variant>'
Raised at Protocol_conv_xml__Xml_light.raise_errorf.(fun) in file "drivers/xml_light/xml_light.ml", line 30, characters 34-63
Called from Protocol_conv_xml__Xml_light.to_value in file "drivers/xml_light/xml_light.ml", line 193, characters 38-120
Called from Protocol_conv_xml__Xml_light.to_string in file "drivers/xml_light/xml_light.ml" (inlined), line 214, characters 16-46
Called from Dune__exe__Main in file "bin/main.ml", line 21, characters 37-62
Thanks for the report.
ppx_protocol_conv_xml_light does not handle schemas, but only looks at various nodes in the xml.
I will need to test a bit furhter, but to parse the example given, I think you should create two sequences - one for ModuleNode and one for ModuleNodeRef:
type module_node = {
uniquename : string
} [@@deriving protocol ~driver:(module Xml_light)]
type module_node_ref = {
uid : int [@key "UID"]
} [@@deriving protocol ~driver:(module Xml_light)]
type modules = {
root_module: string; [@key "RootModule"]
module_nodes: module_node list; [@key "ModuleNode"]
module_node_refs: module_node_ref list; [@key "ModuleNodeRefs"]
} [@@deriving protocol ~driver:(module Xml_light)]
As a follow-on, if I start from a variant type and try to round-trip to XML I get an error; here is the code:
open Protocol_conv_xml type foo = { name : string [@key "name"] } [@@deriving protocol ~driver:(module Xml_light)] type bar = { value : int } [@@deriving protocol ~driver:(module Xml_light)] type t = | A of foo [@key "A"] | B of bar [@key "B"] [@@deriving protocol ~driver:(module Xml_light)] let () = let value = A {name = "test"} in let xml = to_xml_light value in Printf.printf "t serialized: %s\n" (Xml_light.to_string xml); () This produces an error when run:Fatal error: exception Primitive value expected in node: variant for string. T: '
A ' Raised at Protocol_conv_xml__Xml_light.raise_errorf.(fun) in file "drivers/xml_light/xml_light.ml", line 30, characters 34-63 Called from Protocol_conv_xml__Xml_light.to_value in file "drivers/xml_light/xml_light.ml", line 193, characters 38-120 Called from Protocol_conv_xml__Xml_light.to_string in file "drivers/xml_light/xml_light.ml" (inlined), line 214, characters 16-46 Called from Dune__exe__Main in file "bin/main.ml", line 21, characters 37-62test
Xml_light.to_string is an internal function to parse a node value as a string. You should use Xml.to_string_fmt as in your first example.
I will need to test a bit furhter, but to parse the example given, I think you should create two sequences - one for
ModuleNodeand one forModuleNodeRef
That is what I did for this case, but this was really a simplified example for a much more complicated example later on. This is basically the XML output of a parser and one of the nodes is an <entry> node that can be one of many possible node types. Here is the relevant section of the XSD:
https://github.com/tlaplus/tlaplus/blob/17e5d773b05e492311d3f3cc754e695b6f43b5ee/tlatools/org.lamport.tlatools/src/tla2sany/xml/sany.xsd#L28-L51
TL;DR
I don't think ppx_protocol_conv will be able to do what you want.
As I understand it, you want a record containing a algebraic datatype (<xs:choice>) and other fields to be "inlined": e.g.
type value = A of int
| B of string
type elem = {
field: int;
value: value;
}
type record = {
data: elem list;
}
to serialize to:
<record>
<field>1</field>
<A>2</A>
<field>2</field>
<B>two</B>
...
</record>
the problem is that ppx_protocol_conv needs to have a name for each field. In this case the field value is not present, so it's not possible for the framework to construct a deserializer for the value field (or for the framework to know what value to deserialize for the field value.
This seems like a fairly common pattern in XML that is certainly supported by XSD, so do you think it would be worth supporting with like an [@Inline] tag on the field or something? Wish that OCaml supported anonymous variants.
It will be quite difficult to support with the current architecture.
Consider deserialization: A sequence is defined as a list of elements of type t. For each element in the list, the de-serialization function is called and returns a type t. If the sequence is "inlined", then the bounds is less clear, and the de-serialization function will need to consume elements from a stream. This is a rather large change to the underlying infrastructure.
Mapping constructores to node names is less disruptive, and could potentially be done - but then some care must be taken to avoid name clashes in e.g.
type adt = A of int | B of string
type t = { adt: adt;
a: int; [@key "A"]
}
I think I'll reopen the issue, and see if I can create an alternative driver that can support (some of) these features.
I would say the sequence of variants use case is less important than just inline variants generally, since sequences of variants can always be turned into fields of empty or nonempty sequences. Whereas if you have:
type adt = A of int | B of string
type t = {
adt: adt;
uid: int;
}
then I think this would only accept inputs like:
<t>
<uid>5</uid>
<adt>
<A>6</A>
</adt>
</t>
whereas it seems much more common in XML to omit the name of the variant datatype and just use the names of the specific variants inline, so either:
<t>
<uid>5</uid>
<A>6</A>
</t>
or:
<t>
<uid>5</uid>
<B>foo</B>
</t>