dxml
namespace support
Will you consider adding namespace support for XML tags in the future?
What would you consider to be necessary for that? I'm not very familiar with XML namespacing other than the fact that the names then have a namespace in them which then tells the application something about what the tag is for. As it stands, the name is provided as-is, so it should be trivial for anyone to get the namespace by calling `split` or `splitter` on it. Is there something beyond that that's really needed for whatever folks normally do with namespaces?
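For reference, calling `split` or `splitter` on a tag name looks roughly like this. It's a minimal sketch using only Phobos: the string literal stands in for `range.front.name`, and nothing in it is dxml API.

```d
// Plain Phobos is enough to split a qualified tag name; nothing here is dxml API.
import std.algorithm.iteration : splitter;
import std.array : split;
import std.stdio : writeln;

void main()
{
    // Stands in for range.front.name: the prefix is simply part of the string.
    string name = "foo:bar";

    // std.array.split eagerly builds an array of the pieces.
    string[] parts = name.split(":");
    writeln(parts); // ["foo", "bar"]

    // std.algorithm.splitter yields the same pieces lazily, without allocating.
    auto pieces = name.splitter(':');
    writeln(pieces.front); // foo
}
```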
I would like to replace my dummy XML implementation with yours in this library: https://github.com/gedaiu/vibe.dav
The problem is that the DAV protocol uses a lot of namespaces.
Okay. But what does "namespace support" mean to you? As I understand it, the namespace is just the part of the name before the colon, e.g. `<foo:bar>` is in the namespace "foo". Protocols or specifications may then treat that namespace as meaning something (e.g. to differentiate between `<foo:bar>` and `<other:bar>`), but from what I can tell, from the standpoint of parsing, there really isn't anything special about them. They're just names with colons in them. dxml provides the names of the start and end tags to the program using it, so the namespace is there in the name and can be trivially pulled out of the full tag name using `std.array.split` or `std.algorithm.splitter`.
Is the problem that you want an easier or more idiomatic way to pull out the namespace, where you do something like `range.front.name.namespace` rather than calling `split` yourself? Or are you talking about adding some sort of validation related to namespaces? Or something else?
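A sketch of the "more idiomatic" option: two small free functions that UFCS lets you call as if they were properties of the name. `namespacePrefix` and `localName` are hypothetical names invented for this illustration, not dxml API, and they only look at the first colon.

```d
import std.stdio : writeln;
import std.string : indexOf;

// Hypothetical helper (not dxml API): the part of a qualified name before
// the first colon, or "" if the name has no prefix.
string namespacePrefix(string qname)
{
    immutable i = qname.indexOf(':');
    return i < 0 ? "" : qname[0 .. i];
}

// Hypothetical helper (not dxml API): the part after the first colon,
// or the whole name if there is no prefix.
string localName(string qname)
{
    immutable i = qname.indexOf(':');
    return i < 0 ? qname : qname[i + 1 .. $];
}

void main()
{
    // UFCS makes these read like properties, similar to name.namespace.
    writeln("foo:bar".namespacePrefix); // foo
    writeln("foo:bar".localName);       // bar
    writeln("bar".localName);           // bar
}
```

Because they take a plain `string`, they should work directly on `range.front.name` when parsing from a `string`; other input range types would need a templated version.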
You could use `split` on the tag name, but I don't think that's enough, because you could have these XML documents, which are equivalent:
```xml
<a:table xmlns:a="http://somedefinition.com">
  <a:name>some name</a:name>
</a:table>
```

```xml
<b:table xmlns:b="http://somedefinition.com">
  <b:name>some name</b:name>
</b:table>
```
And every DAV client has its own way of sending the prefix name. It could definitely be handled by the client library, but it would be nice if something like this were possible with your library:
```d
assert(range.front.namespace == "http://somedefinition.com"); // instead of `a` or `b`
```
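Getting from a prefix to the URL means reading the `xmlns` declarations on the start tag. Here is a minimal sketch under stated assumptions: `Attr` is a hypothetical stand-in for one attribute of a parsed start tag, and `declarations`/`namespaceURI` are illustrative names, not dxml API. It only looks at a single tag, so a binding declared on an ancestor element isn't seen.

```d
import std.algorithm.searching : startsWith;
import std.stdio : writeln;
import std.string : indexOf;

// Hypothetical stand-in for one attribute of a start tag.
struct Attr
{
    string name;
    string value;
}

// Collect the prefix -> URI bindings declared on one start tag.
// "xmlns:a" binds the prefix "a"; a bare "xmlns" sets the default
// namespace, stored here under the key "".
string[string] declarations(Attr[] attrs)
{
    string[string] result;
    foreach (attr; attrs)
    {
        if (attr.name == "xmlns")
            result[""] = attr.value;
        else if (attr.name.startsWith("xmlns:"))
            result[attr.name["xmlns:".length .. $]] = attr.value;
    }
    return result;
}

// Namespace URI of an element name such as "a:table", or null if the
// prefix (or, for an unprefixed name, the default) isn't bound here.
string namespaceURI(string qname, string[string] bindings)
{
    immutable i = qname.indexOf(':');
    string prefix = i < 0 ? "" : qname[0 .. i];
    if (auto uri = prefix in bindings)
        return *uri;
    return null;
}

void main()
{
    // The two equivalent documents from the comment above.
    auto docA = declarations([Attr("xmlns:a", "http://somedefinition.com")]);
    auto docB = declarations([Attr("xmlns:b", "http://somedefinition.com")]);

    // Different prefixes, same namespace URI.
    writeln(namespaceURI("a:table", docA)); // http://somedefinition.com
    writeln(namespaceURI("b:table", docB)); // http://somedefinition.com
}
```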
So, when you say that you want the namespace, you don't mean the name of the namespace that goes in a tag name; you mean the URL associated with the namespace, because that uniquely identifies the namespace, whereas its name doesn't? That's definitely harder. It would be easy enough to provide a function for splitting the tag name into the namespace name and the local tag name, but the only place where the URL appears is in the start tag with the `xmlns` attribute. For the parser to provide that information, it would have to store it, which would mean allocating storage for it somewhere. That doesn't really make sense in the default case, especially since the parser is designed to allocate as little as possible. So, I'll have to think about a reasonable way to solve this.
My first thought is to provide a wrapper range that examines each start tag in `popFront` to see if it's a namespace declaration and adds it to the list of namespaces that it knows about, and it can then use that to provide the information. But I'll definitely have to study the XML namespace spec and think about this.
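The bookkeeping such a wrapper would need can be sketched as a stack of scopes, one per open element: push the tag's `xmlns` bindings on each start tag, pop on each end tag (with `SplitEmpty.yes`, empty-element tags also produce an end tag), and resolve a prefix by searching from the innermost scope outwards, since an inner declaration shadows an outer one. `NamespaceScopes`, `push`, `pop`, `resolve` and `uriOf` are hypothetical names; this is plain D with no dxml dependency. It is meant for element names (unprefixed attribute names don't take the default namespace) and leaves out details such as the predefined `xml` prefix.

```d
import std.stdio : writeln;
import std.string : indexOf;

// Hypothetical bookkeeping for a namespace-aware wrapper range (not dxml API).
// push() on every start tag, pop() on every end tag.
struct NamespaceScopes
{
    private string[string][] scopes; // innermost scope last

    // Record the xmlns bindings declared on a start tag (null if none).
    void push(string[string] declared)
    {
        scopes ~= declared;
    }

    // Drop the bindings of the element whose end tag was just reached.
    void pop()
    {
        assert(scopes.length > 0, "pop() without a matching push()");
        scopes = scopes[0 .. $ - 1];
    }

    // Resolve a prefix ("" = default namespace), innermost scope first,
    // so inner declarations shadow outer ones. Returns null if unbound.
    string resolve(string prefix)
    {
        foreach_reverse (s; scopes)
        {
            if (auto uri = prefix in s)
                return *uri;
        }
        return null;
    }

    // Namespace URI of an element name such as "a:name".
    string uriOf(string qname)
    {
        immutable i = qname.indexOf(':');
        return resolve(i < 0 ? "" : qname[0 .. i]);
    }
}

void main()
{
    NamespaceScopes ns;
    ns.push(["a": "http://somedefinition.com"]); // <a:table xmlns:a="...">
    ns.push(null);                               // <a:name>
    writeln(ns.uriOf("a:name"));                 // http://somedefinition.com
    ns.pop();                                    // </a:name>
    ns.pop();                                    // </a:table>
    writeln(ns.uriOf("a:name") is null);         // true
}
```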
It's exactly what I was thinking... I think this might be the best approach. I'll watch the project for this feature :)
I need to parse a lot of documents in which some tags may have namespaces and some may not. How can I iterate through them without specifying the namespace? E.g.:
```d
auto r2 = result.skipToPath("fcsProtocolEF3");
```

instead of:

```d
auto r2 = result.skipToPath("ns2:fcsProtocolEF3");
```
dxml doesn't currently understand anything about namespaces. `"fcsProtocolEF3"` and `"ns2:fcsProtocolEF3"` are different names, because the entire string is the name. A function like `skipToPath` requires that the name be an exact match. If you're looking for a partial match, then you'll have to do something like
```d
import std.algorithm.searching : endsWith, find;
import dxml.parser : EntityType;

auto r2 = result.find!(a => a.type == EntityType.elementStart &&
                            (a.name == "fcsProtocolEF3" || a.name.endsWith(":fcsProtocolEF3")))();
```
though that's going to be reading through all of the entities linearly and would give you any tag anywhere in the document after the current entity which had a matching name, regardless of its depth or relation to the current entity. So, it's not really equivalent to `skipToPath`. To do what `skipToPath` does, you would have to navigate to each start tag and check it, using `skipContents` to skip any child tags of the start tag. So, something like
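```d
// assuming that you're on a start tag and that SplitEmpty.yes is used
import std.algorithm.searching : endsWith;
import dxml.parser : EntityType, skipContents, skipToEntityType;

while(true)
{
    if(range.front.name == "fcsProtocolEF3" || range.front.name.endsWith(":fcsProtocolEF3"))
        break; // found it: range.front is the matching start tag
    range = range.skipContents(); // skips to the corresponding end tag
    range.popFront(); // skips the corresponding end tag
    switch(range.front.type)
    {
        case EntityType.elementStart: continue; // check the next sibling
        case EntityType.elementEnd: break; // we've gone up a level
        default:
        {
            range = range.skipToEntityType(EntityType.elementStart, EntityType.elementEnd);
            if(range.front.type == EntityType.elementEnd)
                break; // we've gone up a level
            continue;
        }
    }
    // Only reached via the breaks in the switch above (which exit the switch,
    // not the loop), i.e. the parent's end tag was hit without a match.
    /+ do whatever you do when the tag isn't there +/
    break;
}
```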
Alternatively, if you don't care about the memory consumption, you could call `parseDOM` on the parent tag and get the DOM tree for that section of the tree and then just check each of its direct children.
But really, as things stand, dxml doesn't have any good helper functions for searching for tags by name unless you're looking for the exact name.
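In the meantime, the name check used in both snippets above could be pulled out into a small helper. `matchesLocalName` is a hypothetical name, not part of dxml: it accepts the bare local name, or the local name preceded by any prefix and a colon, without building a `":" ~ local` string.

```d
import std.algorithm.searching : endsWith;
import std.stdio : writeln;

// Hypothetical helper (not dxml API): true if `name` is `local` itself or
// `local` preceded by some prefix and a colon, e.g. "ns2:fcsProtocolEF3".
bool matchesLocalName(string name, string local)
{
    if (name == local)
        return true;
    // Require a colon right before the local part so "xfcsProtocolEF3" doesn't match.
    return name.length > local.length
        && name.endsWith(local)
        && name[$ - local.length - 1] == ':';
}

void main()
{
    foreach (n; ["fcsProtocolEF3", "ns2:fcsProtocolEF3", "xfcsProtocolEF3"])
        writeln(n, ": ", n.matchesLocalName("fcsProtocolEF3"));
}
```

Assuming the entity names are `string`s (as they should be when parsing from a `string`), the helper would slot into the `find` predicate as `a.type == EntityType.elementStart && a.name.matchesLocalName("fcsProtocolEF3")`; for other input ranges it would need to be templated on the name type.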
@jmdavis, could you add namespace support in the future?
I intend to add something, but I don't know exactly what it will look like yet. It will probably involve a helper wrapper around the existing functionality, but I have to find time to sit down and work out what's really needed.