spec-tools
Support for XML transformations
Hi there,
I have a small project to transform one xml format into another (call them fmt-a and fmt-b). I thought Clojure Spec might be useful to define the shape of the data at each stage of the transformation and check the transformation works correctly for expected inputs.
Then I found spec-tools and its transformers in this blog post:
https://www.metosin.fi/blog/spec-transformers/
So I thought I'd investigate that. I noticed that although it supports JSON, it doesn't seem to support XML yet (although XML is mentioned in the blog post?).
Anyway this was going to be my approach with spec-tools:
- parse fmt-a-xml from the XML file
- transform fmt-a-xml-data (i.e. the parsed {:tag :attributes :content} structure) to fmt-a (i.e. {:fmt-a-key some-val})
- transform fmt-a to fmt-b (i.e. {:fmt-b-key some-val})
- transform fmt-b to fmt-b-xml-data (i.e. the {:tag :attributes :content} structure)
- format fmt-b-xml-data to an XML file
Does this sound sensible? I'm not sure how to achieve all of the above with spec-tools, but I was going to start experimenting with step 3, the core data transformation.
One issue I can see before I proceed is that "invalid" inputs would raise an error at step 2 or 3, so the errors wouldn't make a lot of sense in relation to the XML input file.
Note: there's no xml schema available for fmt-a or fmt-b, if that matters.
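A minimal sketch of the middle of that plan, using only clojure.spec (no spec-tools): each intermediate format gets a spec, and a hypothetical check! helper validates the data after each step so that a failure at least reports which stage broke. The specs, the step functions and the :fmt-a/:fmt-b tags are all placeholders for the real formats; parsing and printing the XML (the first and last steps) are left out.

```clojure
(ns example.pipeline
  (:require [clojure.spec.alpha :as s]))

;; Placeholder specs for the two intermediate formats (assumption)
(s/def ::fmt-a-key string?)
(s/def ::fmt-a (s/keys :req-un [::fmt-a-key]))
(s/def ::fmt-b-key string?)
(s/def ::fmt-b (s/keys :req-un [::fmt-b-key]))

(defn check!
  "Returns x if it is valid for spec, otherwise throws with the stage
  name and spec's explain-data attached as ex-data."
  [stage spec x]
  (if (s/valid? spec x)
    x
    (throw (ex-info (str "Invalid data after stage " stage)
                    {:stage stage :explain (s/explain-data spec x)}))))

;; Hypothetical step functions for the three transform steps
(defn xml-data->fmt-a [{:keys [content]}]
  {:fmt-a-key (first content)})

(defn fmt-a->fmt-b [{:keys [fmt-a-key]}]
  {:fmt-b-key fmt-a-key})

(defn fmt-b->xml-data [{:keys [fmt-b-key]}]
  {:tag :fmt-b :attrs nil :content [fmt-b-key]})

(defn transform
  "Runs the transform steps, validating the intermediate formats."
  [xml-data]
  (->> xml-data
       xml-data->fmt-a
       (check! :fmt-a ::fmt-a)
       fmt-a->fmt-b
       (check! :fmt-b ::fmt-b)
       fmt-b->xml-data))
```

The stage name in the ex-data says where the data went wrong, though relating that back to a position in the XML file would still need extra work.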
Thanks!
You could use spec-tools for the XML->EDN transformation, but I think you still need an XML->EDN->XML converter. The last time we needed this, we ended up writing a small utility lib for it. XML like this:
<products><product><id>1</id></product><product><id>2</id></product></products>
would be converted into something like this:
{:products [{:id 1} {:id 2}]}
... and back.
It works nicely for a large subset of XML. The EDN format could have a spec, and spec-tools could encode & decode the types.
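The helper lib itself isn't reproduced here, but the idea can be sketched with clojure.xml, which ships with Clojure. This is a hypothetical simplification, not that library: attributes and mixed text/element content are ignored, an element with two or more children sharing a tag becomes a vector, and the way back needs a hint (the made-up item-tags map) for the item tag name that the vector form drops. Leaf values stay strings, leaving the "1" to 1 coercion to spec.

```clojure
(ns example.xml-edn
  (:require [clojure.xml :as xml])
  (:import (java.io ByteArrayInputStream)))

(defn parse-str
  "Parses an XML string into clojure.xml's {:tag :attrs :content} maps."
  [^String s]
  (xml/parse (ByteArrayInputStream. (.getBytes s "UTF-8"))))

(defn- element->value [{:keys [content]}]
  (cond
    ;; text-only (or empty) element -> its text as a string
    (every? string? content) (apply str content)
    ;; two or more children sharing one tag -> vector of their values
    (and (next content) (apply = (map :tag content))) (mapv element->value content)
    ;; otherwise -> map of child tag -> child value
    :else (into {} (map (juxt :tag element->value)) content)))

(defn xml->edn
  "Converts a parsed root element into a JSON/EDN-like map."
  [element]
  {(:tag element) (element->value element)})

(defn- entry->element [item-tags [tag value]]
  {:tag tag
   :attrs nil
   :content (cond
              (map? value) (mapv #(entry->element item-tags %) value)
              (vector? value) (mapv #(entry->element item-tags [(item-tags tag) %]) value)
              :else [(str value)])})

(defn edn->xml
  "Inverse of xml->edn. item-tags maps a collection tag to its item tag,
  e.g. {:products :product}, because xml->edn drops that name."
  [item-tags m]
  (entry->element item-tags (first m)))
```

For the example above, xml->edn gives {:products [{:id "1"} {:id "2"}]}, and edn->xml with {:products :product} rebuilds the element maps.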
PS. Oh, one version of the xml-helper is here. It would need some love: the last commit was 4 years ago, and it most likely hasn't been used since :O
Hi there,
Thanks! I tried the xml-helper link (noxml) but I got a 404..?
Also, re: "I think you still need an XML->EDN->XML converter", could you expand on that? I was hoping/assuming that spec-tools could provide all of XML(A)->EDN(A), EDN(A)->EDN(B) and EDN(B)->XML(B), with the understanding that some extra work would be required for the XML transformations?
spec-tools is not an XML parser. The default XML parsers return the XML in the verbose map format with :tag, :attrs and :content. You could write a spec transformer for that, but it could be a lot of work(?). But if you can convert that into a JSON/EDN-like map with tag names as keys, you are mostly there, and the spec transformation works just like it does with JSON / strings.
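To make the two shapes concrete, here is roughly what clojure.xml (bundled with Clojure) returns for a small element, next to the flatter map. The id="7" attribute is made up for illustration.

```clojure
(ns example.parse
  (:require [clojure.xml :as xml])
  (:import (java.io ByteArrayInputStream)))

(defn parse-str
  "Parses an XML string with clojure.xml/parse."
  [^String s]
  (xml/parse (ByteArrayInputStream. (.getBytes s "UTF-8"))))

;; Verbose parser output: the tag, an attribute map (nil when there are
;; none) and a vector of child nodes / text
(def widget-xml (parse-str "<WIDGET id=\"7\">123</WIDGET>"))

;; The flatter, JSON-like shape with the tag name as the key
(def widget {:WIDGET (first (:content widget-xml))})
```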
The linked lib seems internal, not fully tested.
Here's a full round-trip with JSON. The JSON<->EDN conversion is done by Muuntaja:
(require '[clojure.spec.alpha :as s])
(require '[spec-tools.core :as st])
(require '[muuntaja.core :as m])
(s/def ::name string?)
(s/def ::birthdate inst?)
(s/def ::age int?)
(s/def ::languages
(s/coll-of
(s/and keyword? #{:clj :cljs})
:into #{}))
(s/def ::user
(s/keys
:req-un [::name ::languages ::age]
:opt-un [::birthdate]))
(defn encode-json [x] (slurp (m/encode m/instance "application/json" x)))
(defn decode-json [x] (m/decode m/instance "application/json" x))
(def ilona {:birthdate #inst "1968-01-02T15:04:05Z"
:age 48
:name "Ilona"
:languages #{:clj :cljs}})
(as-> ilona $
(doto $ prn)
(encode-json $)
(doto $ prn)
(decode-json $)
(doto $ prn)
(st/decode ::user $ st/json-transformer)
(do (assert (= $ ilona)) $)
(doto $ prn)
(encode-json $)
(doto $ prn)
(assert (= (encode-json ilona) $)))
; {:birthdate #inst "1968-01-02T15:04:05.000-00:00", :age 48, :name "Ilona", :languages #{:clj :cljs}}
; "{\"birthdate\":\"1968-01-02T15:04:05Z\",\"age\":48,\"name\":\"Ilona\",\"languages\":[\"clj\",\"cljs\"]}"
; {:birthdate "1968-01-02T15:04:05Z", :age 48, :name "Ilona", :languages ["clj" "cljs"]}
; {:birthdate #inst "1968-01-02T15:04:05.000-00:00", :age 48, :name "Ilona", :languages #{:clj :cljs}}
; "{\"birthdate\":\"1968-01-02T15:04:05Z\",\"age\":48,\"name\":\"Ilona\",\"languages\":[\"clj\",\"cljs\"]}"
Hi there,
Thanks for the advice! I've been experimenting with transforming the verbose :tag, :attrs and :content structure output by the parser (which I'll call widget-xml) into something more manageable (which I'll call widget) for further transformations downstream etc.
So, I have a data spec like this:
(def widget
{::some-widget-property number?})
(def widget-spec
(->
(std/spec
{:name ::widget
:spec widget})
(assoc
:encode/xml widget->xml
:decode/xml xml->widget)))
And an "xml transformer" like this:
(def xml-transformer
(st/type-transformer
{:name :xml
:decoders stt/string-type-decoders
:default-encoder stt/any->any}))
Then I can use the following code:
(st/decode widget-spec widget-xml xml-transformer)
To transform
{:tag :WIDGET :content ["123"]}
Into
{::some-widget-property 123}
So, my xml->widget decode function transforms the verbose structure, and spec-tools handles the coercion of the leaf types. Great!
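The xml->widget and widget->xml functions aren't shown in the post. A minimal sketch of what they might look like, assuming spec-tools calls :decode/... and :encode/... functions with two arguments, the spec and the value. Leaf values are left as strings because the transformer coerces them afterwards, and values that aren't in the expected shape are passed through unchanged.

```clojure
(ns example.widget)

(defn xml->widget
  "Decoder sketch: {:tag :WIDGET :content [\"123\"]}
  -> {::some-widget-property \"123\"}. Anything else is returned as-is."
  [_spec value]
  (if (and (map? value) (= :WIDGET (:tag value)))
    {::some-widget-property (first (:content value))}
    value))

(defn widget->xml
  "Encoder sketch, the inverse of xml->widget."
  [_spec value]
  (if (and (map? value) (contains? value ::some-widget-property))
    {:tag :WIDGET :attrs nil :content [(str (::some-widget-property value))]}
    value))
```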
However, I then wanted to define a spec fdef for the xml->widget decode function (imagine it's a bit more complex than the above example, with nested structures and therefore several nested specs, and several nested functions for transforming the structure). So I define another spec for the decode function's :args input (call it widget-xml-spec), but I can't use the existing widget-spec for the decode function's :ret output, because that assumes the types have already been coerced. So currently I have to make an additional spec for the decode function output, call it widget-str-spec, which is equivalent to widget-spec except for the coercion parts. For the above example, the data spec would be:
(def widget-str
{::some-widget-property string?})
(def widget-str-spec
(std/spec
{:name ::widget-str
:spec widget-str}))
So if the decoding is made up of many functions, each of which it would be nice to give a spec fdef, there are three specs to write for each one (xxx-xml-spec, xxx-str-spec, xxx-spec), where xxx-str-spec just duplicates xxx-spec for the coercion output, replacing coercible types with string? etc.
I tried defining :ret as:
#(s/valid? widget-spec (st/coerce widget-spec % xml-transformer))
That works of course when :ret is valid, but when it isn't, spec's explanation of the problem hides the useful details, since the spec is wrapped in another predicate.
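The loss of detail is easy to reproduce with plain clojure.spec. The ::n and ::widget specs below are stand-ins, not the specs from the post: explaining against the spec itself points at the failing key and predicate, while the same check wrapped in an anonymous predicate only reports that the whole value failed.

```clojure
(ns example.explain
  (:require [clojure.spec.alpha :as s]))

(s/def ::n int?)
(s/def ::widget (s/keys :req [::n]))

;; The same check, hidden inside an anonymous predicate
(def wrapped #(s/valid? ::widget %))

(def bad {::n "x"})

;; Direct: the problem names the key path and the failing predicate
(def direct-problems (::s/problems (s/explain-data ::widget bad)))

;; Wrapped: one problem at the root, with an opaque predicate
(def wrapped-problems (::s/problems (s/explain-data wrapped bad)))
```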
The only other idea I had for avoiding a manually written xxx-str-spec for each fdef is to generate that spec programmatically from xxx-spec. But I wasn't sure how to go about that, and it might be fiddly/a lot of work.
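One way the programmatic idea could work, assuming the specs are data-specs made of plain maps, vectors and predicate functions like the widget map above: walk the structure and swap every predicate function for string?. This sketch only transforms the data; the result would still need wrapping with std/spec, and anything that isn't a plain function (such as registered spec keywords or spec objects) is left as-is. The nested ::parts key is hypothetical.

```clojure
(ns example.str-spec
  (:require [clojure.walk :as walk]))

(defn ->str-data-spec
  "Replaces every predicate function in a data-spec with string?,
  e.g. {::a number?} -> {::a string?}. Non-function values are kept."
  [data-spec]
  (walk/postwalk #(if (fn? %) string? %) data-spec))

;; Hypothetical nested data-spec
(def widget {::some-widget-property number?
             ::parts [{::size int?}]})

(def widget-str (->str-data-spec widget))
```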
So, in conclusion, I'm unsure how to proceed (apart from manually creating the xxx-str specs for each fdef, which doesn't feel right..).
Just following this up: I found a solution to my problem.
I decided to define the widget-xml-spec with coerced types instead of strings, for example:
(def widget-xml
{:tag :WIDGET :content [number?]})
(def widget-xml-spec
(std/spec
{:name ::widget-xml
:spec widget-xml}))
Then I can use the following code:
(st/decode widget-spec widget-xml xml-transformer)
To transform
{:tag :WIDGET :content ["123"]}
Into
{::some-widget-property 123}
That is, spec-tools does the coercion of "123" to 123. This solution means I only need to define two specs instead of three, and I can now define a spec fdef for the xml->widget function like this:
(s/fdef xml->widget
:args (s/cat :widget-xml widget-xml-spec)
:ret widget-spec)
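For comparison, here is the same fdef shape with plain registered clojure.spec specs standing in for widget-xml-spec and widget-spec (an assumption; the spec-tools versions should slot into :args and :ret the same way). s/cat needs a keyword tag before each argument spec. Note that stest/instrument only checks :args; :ret is checked by stest/check, which needs test.check on the classpath.

```clojure
(ns example.fdef
  (:require [clojure.spec.alpha :as s]
            [clojure.spec.test.alpha :as stest]))

;; Plain-spec stand-ins for widget-xml-spec and widget-spec
(s/def ::tag #{:WIDGET})
(s/def ::content (s/tuple number?))
(s/def ::widget-xml (s/keys :req-un [::tag ::content]))
(s/def ::some-widget-property number?)
(s/def ::widget (s/keys :req [::some-widget-property]))

(defn xml->widget [{:keys [content]}]
  {::some-widget-property (first content)})

;; One tag/spec pair per argument
(s/fdef xml->widget
  :args (s/cat :widget-xml ::widget-xml)
  :ret ::widget)

;; Instrumented calls now validate their arguments against :args
(stest/instrument `xml->widget)
```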