
Improve support for URI parts in namespaces

Open trygvis opened this issue 12 years ago • 8 comments

I've been trying to use anti-xml to parse and generate XML documents that include some elements from the Atom namespace [1]. Our own elements live in a private namespace, which the documents bind as the default namespace. An example looks like this:

<profile xmlns="http://.." xmlns:atom="http://www.w3.org/2005/Atom">
  <atom:link id=".." href=".."/>
</profile>

I've run into two issues with this.

The first is that the conversions drop the namespace entirely (see [2]). I've fixed this so that it's more in line with the existing API.

The second issue is that the entire API is oriented around the prefix rather than the namespace. When I convert the XML to my Profile object I don't care about the prefix; I just want the elements named "link" in the Atom namespace. Right now I have to do this by hand.

I would like to adjust the API so that it's more in line with javax.xml.namespace.QName and the W3C's definition of names in namespaces; see [3] and [4]. The prefix is not part of an element's identity (QName equality compares only the namespace URI and the local part), and it's not very interesting when matching on elements.

I'm hoping to be able to write something like this:

def atomLinks(e: Elem) = (e / (Namespaces.atom -> "link"))
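The prefix-insensitive matching asked for above can be sketched with the JDK's own javax.xml.namespace.QName, whose equals compares only the namespace URI and the local part and ignores the prefix. QNode, select and atomLinks are hypothetical stand-ins for illustration, not anti-xml's Elem or its `/` operator.

```scala
import javax.xml.namespace.QName

// Hypothetical minimal element: identified by a JDK QName, plus its children.
final case class QNode(name: QName, children: List[QNode] = Nil)

object NamespaceSelect {
  val AtomNs = "http://www.w3.org/2005/Atom"

  // Select direct children by (namespace URI, local name).
  // QName.equals ignores the prefix, so atom:link and a:link both match.
  def select(e: QNode, ns: String, local: String): List[QNode] = {
    val wanted = new QName(ns, local)
    e.children.filter(_.name == wanted)
  }

  // The hypothetical equivalent of e / (Namespaces.atom -> "link").
  def atomLinks(e: QNode): List[QNode] = select(e, AtomNs, "link")
}
```

Any child named link in the Atom namespace is selected, whichever prefix the document happened to use for it.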

Does this make sense? I really like anti-xml and we're using it for most of our XML work now, but this came up as an issue the other day and I don't see a way to fix it without changing anti-xml.

I've implemented my ideas in my own repository [5]. I'm not entirely satisfied with the current solution, but it shows what I want to achieve. All the existing tests pass, and I've added some more too.

trygvis, Dec 23 '11 13:12

+1. This is a very annoying problem when using lots of namespaces. I am building an Atom library for Scala using anti-xml.

hamnis, Jan 05 '12 14:01

So, this is one of those areas where we're consciously compromising in order to get a more usable functional tree. I believe @jespersm raised these same points. Theoretically speaking, this is a symptom of XML's scoping semantics. In XML, data flows down the tree (from the root to the leaves). However, a functional tree is built bottom up (from the leaves to the root). This creates a fairly annoying impedance mismatch. Consider the following fragment:

val foo = Elem(Some("ns"), "foo", Attributes(), Map(), Group())

This would map to an XML node <ns:foo/> where ns is an unbound prefix, and thus corresponds to no namespace. In XML, even unbound prefixes are significant and we need to preserve them. Unfortunately (and here is where the bottom-up impedance comes into play), it's not difficult to exploit this to generate absurdities:

val bar = Elem(None, "bar", Attributes(), Map("ns" -> "http://www.google.com"), Group(foo))

We have explicitly put foo into the scope of a parent element that binds ns to a specific URI. However, foo's scope doesn't reflect this, because foo's scope was built before bar's! There is no way to serialize this back into XML without losing some information (from the functional tree).
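A minimal sketch of the mismatch, assuming a simplified, hypothetical PElem that keeps only a prefix, a name, a prefix-to-URI scope map and children (attributes and default namespaces are left out). Resolving foo's prefix from its own scope finds nothing; only a walk that starts at bar can see the binding.

```scala
// Hypothetical prefix-based element, loosely modelled on Elem(prefix, name, attrs, scope, children).
final case class PElem(prefix: Option[String], name: String,
                       scope: Map[String, String], children: List[PElem])

object ScopeDemo {
  // Resolve an element's namespace URI given bindings inherited from its ancestors;
  // the element's own scope is layered on top (inner bindings win).
  def resolve(e: PElem, inherited: Map[String, String]): Option[String] =
    e.prefix.flatMap(p => (inherited ++ e.scope).get(p))

  val foo = PElem(Some("ns"), "foo", Map(), Nil)
  val bar = PElem(None, "bar", Map("ns" -> "http://www.google.com"), List(foo))

  // foo on its own: "ns" is unbound, so foo has no namespace.
  val fooAlone: Option[String] = resolve(foo, Map.empty)

  // foo as bar's child: visible only when bar's bindings are carried downward.
  val fooInBar: Option[String] = resolve(bar.children.head, bar.scope)
}
```

The foo value is identical in both calls; only the context passed in differs.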

Even worse, we can create trees that are self-contradicting:

val foo = Elem(Some("ns"), "foo", Attributes(), Map("ns" -> "http://www.google.com"), Group())
val bar = Elem(None, "bar", Attributes(), Map("ns" -> "http://www.yahoo.com"), Group(foo))

Now what? And just to illustrate that this problem is fundamental to the bottom-up nature of functional trees, we can create a pair of trees (bar and baz) that use the exact same element in different ways:

val foo = Elem(Some("ns"), "foo", Attributes(), Map("ns" -> "http://www.google.com"), Group())
val bar = Elem(None, "bar", Attributes(), Map("ns" -> "http://www.yahoo.com"), Group(foo))
val baz = Elem(None, "baz", Attributes(), Map("ns" -> "http://www.bing.com"), Group(foo))

There's just no way we can solve this problem in general with functional, bottom-up trees. Making the namespace primary only serves to make it harder for users to manually untangle these cases. By focusing on the prefix, we're forcing the user to maintain context and scoping information in their top-down traversal of the tree if they need this information. It's a bit painful for applications like Atom, where you have multiple namespaces, but at present, I'm not sure I see a good alternative.
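The top-down bookkeeping described above might look like this sketch, again over a hypothetical simplified element type (SElem) rather than anti-xml's Elem. It shares one prefix-only foo between two parents that bind ns differently, so the same value resolves to two different URIs depending on the path taken to reach it.

```scala
// Hypothetical simplified element (attributes and default namespaces omitted).
final case class SElem(prefix: Option[String], name: String,
                       scope: Map[String, String], children: List[SElem])

object TopDown {
  // Walk from the root, accumulating prefix bindings while descending, and
  // return each element's (namespace URI, local name) in document order.
  // Unprefixed elements are reported with no namespace in this sketch.
  def qualifiedNames(e: SElem,
                     inherited: Map[String, String] = Map.empty): List[(Option[String], String)] = {
    val inScope = inherited ++ e.scope
    val uri = e.prefix.flatMap(inScope.get)
    (uri, e.name) :: e.children.flatMap(c => qualifiedNames(c, inScope))
  }

  // One shared element, reused under two differently scoped parents.
  val foo = SElem(Some("ns"), "foo", Map(), Nil)
  val bar = SElem(None, "bar", Map("ns" -> "http://www.yahoo.com"), List(foo))
  val baz = SElem(None, "baz", Map("ns" -> "http://www.bing.com"), List(foo))
}
```

The context lives in the traversal, not in the nodes, which is the burden the prefix-first design places on the user.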

Anyway, I'll spend some quality time with your code and see how it addresses these issues. I'm always open to being wrong!

djspiewak, Jan 08 '12 23:01

Well, there is a way that keeps the nice bottom-up semantics: the one I suggested in the original patch for namespace handling. The idea is to enforce namespace mappings at every element and then optimize away the redundant declarations when generating XML.

It doesn't really allow upper-level nodes to change the meaning of their children's namespace mappings; that must be done by rewriting the tree.
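A rough sketch of this approach, assuming a hypothetical FElem in which every element carries the complete prefix-to-URI map it relies on. The serializer emits an xmlns:p declaration only when that binding is not already in effect from the parent, which is one plausible reading of "optimize when generating XML".

```scala
// Hypothetical model: every element carries the full prefix -> URI map it relies on,
// so a subtree means the same thing wherever it is placed.
final case class FElem(prefix: Option[String], name: String,
                       bindings: Map[String, String], children: List[FElem])

object Minimize {
  // Serialize, emitting xmlns:p only for bindings the parent has not already put in scope.
  def render(e: FElem, parent: Map[String, String] = Map.empty): String = {
    val newDecls = e.bindings.toList.sorted.collect {
      case (p, uri) if !parent.get(p).contains(uri) => " xmlns:" + p + "=\"" + uri + "\""
    }
    val inScope = parent ++ e.bindings
    val qname = e.prefix.fold(e.name)(p => p + ":" + e.name)
    val open = "<" + qname + newDecls.mkString
    if (e.children.isEmpty) open + "/>"
    else open + ">" + e.children.map(render(_, inScope)).mkString + "</" + qname + ">"
  }
}
```

For a bar declaring go to http://www.google.com with two go:foo children that each carry the same binding, render produces `<bar xmlns:go="http://www.google.com"><go:foo/><go:foo/></bar>`; rendered alone, each foo still declares its own binding.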

jespersm, Jan 09 '12 15:01

> It doesn't really allow upper-level nodes to change the meaning of their children's namespace mappings; that must be done by rewriting the tree.

Right, so one way or another, we're compromising on some aspect of the functionality. Without inverting the parenting of the tree, I don't see a way to avoid this.

djspiewak, Jan 10 '12 02:01

>> It doesn't really allow upper-level nodes to change the meaning of their children's namespace mappings; that must be done by rewriting the tree.

> Right, so one way or another, we're compromising on some aspect of the functionality. Without inverting the parenting of the tree, I don't see a way to avoid this.

Agreed, but how often would you want to change the namespaces like that? I never do, since whole structures of local names in one namespace are very rarely directly transferable to another, except in some tricky versioning transformations. It'd be like putting the surname at the top of the (paper) phonebook just so you can change it from Jones to Smith. Sure, it's cheap, but it makes little sense :-)
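For the rare versioning case, rewriting the tree is straightforward when each element records its own namespace URI. UElem and moveNamespace are hypothetical illustrations, not anti-xml API.

```scala
// Hypothetical URI-primary element: each element records its own namespace URI.
final case class UElem(ns: Option[String], name: String, children: List[UElem])

object Rewrite {
  // Rebuild the tree, moving every element in namespace `from` into namespace `to`.
  // Elements in other namespaces (or in none) are left unchanged.
  def moveNamespace(e: UElem, from: String, to: String): UElem =
    e.copy(
      ns = if (e.ns.contains(from)) Some(to) else e.ns,
      children = e.children.map(moveNamespace(_, from, to)))
}
```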

jespersm, Jan 10 '12 09:01

I fail to see the issue here. As far as I can tell from the XML Namespaces spec, unbound prefixes are not allowed; see [1], "Namespace constraint: Prefix Declared".

This

val foo = Elem(Some("ns"), "foo", Attributes(), Map("ns" -> "http://www.google.com"), Group())
val bar = Elem(None, "bar", Attributes(), Map("ns" -> "http://www.yahoo.com"), Group(foo))

is the same as

<bar xmlns:ns="http://www.yahoo.com">
  <ns:foo xmlns:ns="http://www.google.com"/>
</bar>

which is fine. bar itself has no namespace, but it declares the prefix ns (bound to http://www.yahoo.com). foo redeclares ns on itself, so foo is in the http://www.google.com namespace.

A namespace prefix can be declared at any level of the tree, and the declaration applies from that element downward; any child element can override it. As far as I can tell, this meshes just fine with how I would expect XML generation to happen.
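Both points can be checked with the JDK's namespace-aware DOM parser: the nested document above parses, bar has no namespace and foo resolves to http://www.google.com, while a bare `<ns:foo/>` with an unbound prefix is rejected. This sketch uses only standard javax.xml.parsers and org.w3c.dom calls; the ScopeCheck object and its helper names are illustrative.

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.{Document, Element}
import org.xml.sax.helpers.DefaultHandler
import scala.util.Try

object ScopeCheck {
  // Parse with the JDK's DOM parser in namespace-aware mode.
  def parse(xml: String): Document = {
    val factory = DocumentBuilderFactory.newInstance()
    factory.setNamespaceAware(true)
    val builder = factory.newDocumentBuilder()
    // DefaultHandler rethrows fatal errors instead of printing them to stderr.
    builder.setErrorHandler(new DefaultHandler)
    builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
  }

  // bar binds ns to Yahoo; foo rebinds it to Google on itself.
  val nested: String =
    """<bar xmlns:ns="http://www.yahoo.com"><ns:foo xmlns:ns="http://www.google.com"/></bar>"""

  // Namespace URIs of the root element and its first child (None = no namespace).
  def rootAndChildNamespaces(xml: String): (Option[String], Option[String]) = {
    val root = parse(xml).getDocumentElement
    val child = root.getFirstChild.asInstanceOf[Element]
    (Option(root.getNamespaceURI), Option(child.getNamespaceURI))
  }

  // False if the document is not namespace-well-formed, e.g. it uses an undeclared prefix.
  def parses(xml: String): Boolean = Try(parse(xml)).isSuccess
}
```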

Ideally the prefix could be dropped from the model entirely, since only the namespace is really relevant, and anti-xml could generate the namespace declarations and prefixes when serializing the data (as most SOAP implementations do). What I would like to see is something like this:

def foo2entry(foo: Foo) = Elem(Atom.namespace, "entry", Attributes(),
  Group(foo.bars.map(bar2link)) ++ Group(..))

def bar2link(bar: Bar) = Elem(Atom.namespace, "link",
  Attributes("href" -> "http://..", "rel" -> ".."), Group())

where Atom.namespace is an NSRepr wrapping the Atom namespace URI.

Even better would be something like this:

object Atom {
  val namespace = NSRepr("http://www.w3.org/2005/Atom")
  val entry = namespace.elem("entry")
  val link = namespace.elem("link")
  val href = namespace.attr("href")
  val rel = namespace.attr("rel")
}

import Atom._

def foo2entry(foo: Foo) = entry(Attributes(), NBS.empty,
  Group(foo.bars.map(bar2link)) ++ Group(..))

def bar2link(bar: Bar) = link(Attributes(href("http://.."), rel("..")), NBS.empty, Group())

but that's another issue :)
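The element-building half of the NSRepr idea can be sketched with ordinary Scala values, assuming hypothetical QualifiedName, XElem and NSRepr types (none of them are anti-xml classes). Attributes are kept as plain unqualified name/value pairs, since the href and rel attributes on Atom's link element are unqualified; Foo and Bar are made-up domain types.

```scala
// Hypothetical types for illustration; none of these are anti-xml classes.
final case class QualifiedName(ns: String, local: String)
final case class XElem(name: QualifiedName, attrs: Map[String, String], children: List[XElem])

// Wraps a namespace URI and hands out element builders bound to it.
final case class NSRepr(uri: String) {
  def elem(local: String): (Map[String, String], List[XElem]) => XElem =
    (attrs, children) => XElem(QualifiedName(uri, local), attrs, children)
}

object Atom {
  val namespace = NSRepr("http://www.w3.org/2005/Atom")
  val entry = namespace.elem("entry")
  val link = namespace.elem("link")
}

object AtomExample {
  import Atom._

  // Made-up domain types standing in for Foo and Bar from the comment.
  final case class Bar(href: String, rel: String)
  final case class Foo(bars: List[Bar])

  def bar2link(bar: Bar): XElem = link(Map("href" -> bar.href, "rel" -> bar.rel), Nil)
  def foo2entry(foo: Foo): XElem = entry(Map.empty, foo.bars.map(bar2link))
}
```

No prefix appears anywhere; choosing one would be left to the serializer.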

trygvis, Jan 10 '12 11:01

What if:

val foo = Elem("http://www.google.com", "foo", Attributes(), None, Group())
val bar = Elem("http://www.yahoo.com", "bar", Attributes(), None, Group(foo, foo))

would give you

<bar xmlns="http://www.yahoo.com">
  <foo xmlns="http://www.google.com"/>
  <foo xmlns="http://www.google.com"/>
</bar>

Elem would also take an optional, explicit "preferred" binding of prefixes to namespaces, like this:

val foo = Elem("http://www.google.com", "foo", Attributes(), None, Group())
val baz = Elem("http://www.bing.com", "baz", Attributes(), None, Group())
val bar = Elem("http://www.yahoo.com", "bar", Attributes(),
    Some(Map("go" -> "http://www.google.com")), Group(foo, baz, foo))

would give you explicit control over the namespaces you were interested in, and handle the rest automatically:

<bar xmlns="http://www.yahoo.com" xmlns:go="http://www.google.com">
  <go:foo/>
  <baz xmlns="http://www.bing.com"/>
  <go:foo/>
</bar>

Finally, there could be an optimizing traversal that collects the namespace declarations and bubbles them as far up the tree as possible (in most cases it only needs to replace nodes near the top).

I think this would strike a fair balance between bottom-up functional purity and the control needed for special applications like XSD, XSLT and other languages that use XPath and similar expressions to reference qualified elements.
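A sketch of the serializer this proposal implies, assuming a hypothetical NElem with a mandatory namespace URI and optional preferred prefixes. It uses a preferred prefix when one is in scope, otherwise reuses or redeclares the default namespace. The hoisting traversal is not included.

```scala
// Hypothetical URI-primary element: the namespace URI is mandatory, and an element may
// optionally suggest prefixes (prefix -> URI) to declare on itself.
final case class NElem(ns: String, name: String,
                       preferred: Option[Map[String, String]],
                       children: List[NElem])

object PreferredPrefixes {
  // defaultNs: URI currently bound to the default namespace (if any)
  // prefixes:  URI -> prefix bindings currently in scope
  def render(e: NElem, defaultNs: Option[String] = None,
             prefixes: Map[String, String] = Map.empty): String = {
    val declared = e.preferred.getOrElse(Map.empty[String, String])
    // Only declare preferred prefixes that are not already bound the same way.
    val newPrefixes = declared.toList.sorted.filterNot { case (p, uri) => prefixes.get(uri).contains(p) }
    val inScope = prefixes ++ declared.toList.map(_.swap)
    // Use a prefix if one is in scope, else reuse or redeclare the default namespace.
    val (qname, childDefault, defaultDecl) = inScope.get(e.ns) match {
      case Some(p)                          => (p + ":" + e.name, defaultNs, "")
      case None if defaultNs.contains(e.ns) => (e.name, defaultNs, "")
      case None                             => (e.name, Some(e.ns), " xmlns=\"" + e.ns + "\"")
    }
    val decls = defaultDecl +
      newPrefixes.map { case (p, uri) => " xmlns:" + p + "=\"" + uri + "\"" }.mkString
    if (e.children.isEmpty) "<" + qname + decls + "/>"
    else "<" + qname + decls + ">" +
      e.children.map(render(_, childDefault, inScope)).mkString + "</" + qname + ">"
  }
}
```

For the second example, render(bar) yields `<bar xmlns="http://www.yahoo.com" xmlns:go="http://www.google.com"><go:foo/><baz xmlns="http://www.bing.com"/><go:foo/></bar>`, which is the output shown above without the indentation.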

jespersm, Jan 12 '12 00:01

@djspiewak: I've pushed some code that's getting quite close to what I want. I haven't implemented the NSRepr part, as I couldn't figure out how it should work. Some more explanation of exactly how you'd like that part to look would be nice.

@jespersm: It's easy to walk the tree, collect all the namespaces and put them in the namespace list of the root node. It's CPU-intensive, but only the root object is replaced, and it should be easy to implement too. Getting exactly your result is a bit harder.

The commit: https://github.com/trygvis/anti-xml/commit/4a7e808f95be6544edb328e44859d2ae1a0cdea9.
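The root-hoisting walk described above can be sketched as follows, assuming a hypothetical RElem with an optional namespace URI per element and a made-up ns0, ns1, ... prefix scheme. It collects every distinct URI and declares them all once on the root; it does not try to reproduce the preferred-prefix output.

```scala
// Hypothetical URI-primary element with no prefix information at all.
final case class RElem(ns: Option[String], name: String, children: List[RElem])

object HoistToRoot {
  // Every distinct namespace URI in the tree, in document order.
  def namespaces(e: RElem): List[String] =
    (e.ns.toList ++ e.children.flatMap(namespaces)).distinct

  // Assign generated prefixes (ns0, ns1, ...) and declare them all on the root element.
  def render(root: RElem): String = {
    val uris = namespaces(root)
    val prefixOf: Map[String, String] =
      uris.zipWithIndex.map { case (uri, i) => uri -> ("ns" + i) }.toMap
    val decls = uris.map(uri => " xmlns:" + prefixOf(uri) + "=\"" + uri + "\"").mkString

    def go(e: RElem, isRoot: Boolean): String = {
      val qname = e.ns.fold(e.name)(uri => prefixOf(uri) + ":" + e.name)
      val open = "<" + qname + (if (isRoot) decls else "")
      if (e.children.isEmpty) open + "/>"
      else open + ">" + e.children.map(go(_, isRoot = false)).mkString + "</" + qname + ">"
    }
    go(root, isRoot = true)
  }
}
```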

trygvis, Jan 13 '12 15:01