dxml

namespace support

Open gedaiu opened this issue 6 years ago • 10 comments

Will you consider adding namespace support for XML tags in the future?

gedaiu, May 04 '18 11:05

What would you consider to be necessary for that? I'm not very familiar with XML namespacing other than the fact that the names then have a namespace in them which then tells the application something about what the tag is for. As it stands, the name is provided as-is, so it should be trivial for anyone to get the namespace by calling split or splitter on it. Is there something beyond that that's really needed for whatever folks normally do with namespaces?

jmdavis, May 04 '18 12:05

I would like to replace my dummy xml implementation with yours in this library: https://github.com/gedaiu/vibe.dav

The problem is that the DAV protocol uses a lot of namespaces.

gedaiu, May 04 '18 13:05

Okay. But what does "namespace support" mean to you? As I understand it, the namespace is just part of the name where the part before a colon is the namespace - e.g. <foo:bar> is in the namespace "foo". Protocols or specifications may then treat that namespace as meaning something (e.g. to differentiate between <foo:bar> and <other:bar>), but from what I can tell, from the standpoint of parsing, there really isn't anything special about them. They're just names with colons in them. dxml provides the name of the start and end tags to the program using it, so the namespace is there in the name and can be trivially pulled out of the full tag name using std.array.split or std.algorithm.splitter.
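As a minimal sketch of the split described above: the helper below separates a tag name into the part before the first colon and the part after it. `QName` and `splitQName` are hypothetical names used for illustration and are not part of dxml; the input is assumed to be the name string that dxml already hands to the caller.

```d
import std.string : indexOf;

/// Hypothetical helper type (not part of dxml): the two pieces of a tag name.
struct QName
{
    string prefix; // empty when the name contains no colon
    string local;  // the part after the colon, or the whole name
}

/// Splits "foo:bar" into prefix "foo" and local name "bar".
QName splitQName(string name)
{
    immutable colon = name.indexOf(':');
    if (colon < 0)
        return QName("", name); // no namespace prefix
    return QName(name[0 .. colon], name[colon + 1 .. $]);
}
```

With dxml this would presumably be called as `splitQName(range.front.name)` on a start or end tag. It only yields the prefix, not the URI the prefix is bound to.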

Is the problem that you want an easier or more idiomatic way to pull out the namespace where you do something like range.front.name.namespace rather than calling split yourself? Or are you talking about adding some sort of validation related to namespaces? Or something else?

jmdavis, May 04 '18 13:05

You could split the tag name, but I don't think that's enough, because you could have these XML documents, which are equivalent:

<a:table xmlns:a="http://somedefinition.com">
  <a:name>some name</a:name>
</a:table>

<b:table xmlns:b="http://somedefinition.com">
  <b:name>some name</b:name>
</b:table>

And every DAV client has its own way of sending the prefix name. It could definitely be handled by the client library, but it would be nice if something like this were possible with your library:

assert(range.front.namespace == "http://somedefinition.com"); // instead of `a` or `b`

gedaiu, May 04 '18 14:05

So, when you say that you want the namespace, you don't mean the name of the namespace that goes in a tag name, you mean the URL associated with the namespace, because that uniquely identifies the namespace, whereas its name doesn't? That's definitely harder. It would be easy enough to provide a function for splitting the tag name into the namespace name and the local tag name, but the only place where the URL would be is in the start tag with the xmlns attribute. For the parser to provide that information, it would have to store it, which would mean allocating storage for it somewhere, which doesn't really make sense in the default case, especially since the parser is designed to allocate as little as possible. So, I'll have to think about a reasonable way to solve this.

My first thought is to provide a wrapper range that examines each start tag in popFront to see if it's a namespace declaration and adds it to the list of namespaces that it knows about, and it can then use that to provide the information. But I'll definitely have to study the XML namespace spec and think about this.
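One way the bookkeeping behind such a wrapper might look is sketched below. This is not dxml code: `Attr` and `NamespaceScopes` are hypothetical names, and the attributes are passed in as plain name/value pairs rather than read from a dxml range. The idea is to push one prefix-to-URI table per start tag, pop it on the matching end tag, and resolve a prefix by searching from the innermost table outward.

```d
import std.algorithm.searching : startsWith;

/// Hypothetical stand-in for one attribute of a start tag.
struct Attr
{
    string name;
    string value;
}

/// Hypothetical sketch (not part of dxml): xmlns declarations per open element.
struct NamespaceScopes
{
    // One prefix -> URI table per open element; "" keys the default namespace.
    string[string][] scopes;

    /// Call for every start tag, passing that tag's attributes.
    void enter(Attr[] attributes)
    {
        string[string] declared;
        foreach (attr; attributes)
        {
            if (attr.name == "xmlns")
                declared[""] = attr.value;
            else if (attr.name.startsWith("xmlns:"))
                declared[attr.name["xmlns:".length .. $]] = attr.value;
        }
        scopes ~= declared;
    }

    /// Call for every end tag; drops the declarations of the closed element.
    void leave()
    {
        scopes = scopes[0 .. $ - 1];
    }

    /// Returns the URI bound to prefix in the innermost scope, or null if unbound.
    string resolve(string prefix)
    {
        foreach_reverse (table; scopes)
            if (auto uri = prefix in table)
                return *uri;
        return null;
    }
}
```

The tables are allocated only by code that opts into this tracking, which fits the concern above about keeping the default parser allocation-free.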

jmdavis, May 04 '18 17:05

It's exactly what I was thinking... I think this might be the best approach. I'll watch the project for this feature :)

gedaiu, May 04 '18 17:05

I need to parse a lot of documents; some tags in them may have namespaces and some may not. How can I iterate through them without specifying the namespace? E.g.:

auto r2 = result.skipToPath("fcsProtocolEF3");

instead of:

auto r2 = result.skipToPath("ns2:fcsProtocolEF3");

bubnenkoff, Mar 04 '19 13:03

> I need to parse a lot of documents; some tags in them may have namespaces and some may not. How can I iterate through them without specifying the namespace?

dxml doesn't currently understand anything about namespaces. "fcsProtocolEF3" and "ns2:fcsProtocolEF3" are different names, because the entire string is the name. A function like skipToPath requires that the name be an exact match. If you're looking for a partial match, then you'll have to do something like

import std.algorithm.searching : endsWith, find;

auto r2 = result.find!(a => a.type == EntityType.elementStart &&
                       (a.name == "fcsProtocolEF3" || a.name.endsWith(":fcsProtocolEF3")))();

though that's going to be reading through all of the entities linearly and would give you any tag anywhere in the document after the current entity which had a matching name, regardless of its depth or relation to the current entity. So, it's not really equivalent to skipToPath. To do what skipToPath does, you would have to navigate to each start tag and check it, using skipContents to skip any child tags of the start tag. So, something like

// assuming that you're on a start tag and that SplitEmpty.yes is used
while(true)
{
    if(range.front.name == "fcsProtocolEF3" || range.front.name.endsWith(":fcsProtocolEF3"))
        break;
    range = range.skipContents(); // skips to the corresponding end tag
    range.popFront(); // skips the corresponding end tag
    switch(range.front.type)
    {
        case EntityType.elementStart: continue;
        case EntityType.elementEnd: break; // we've gone up a level
        default:
        {
            range = range.skipToEntityType(EntityType.elementStart, EntityType.elementEnd);
            if(range.front.type == EntityType.elementEnd)
                break; // we've gone up a level
            continue;
        }
    }
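    /+ reached only when we've gone up a level: do whatever you do when the tag isn't there +/
    break; // leave the loop; the tag is not among these siblings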
}
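The name test used in both snippets above could be factored into a small helper. The sketch below is hypothetical (`hasLocalName` is not part of dxml) and only compares strings: it treats a name as matching when it is exactly the local name or the local name preceded by a non-empty prefix and a colon.

```d
import std.algorithm.searching : endsWith;

/// Hypothetical helper (not part of dxml):
/// true if name is `local` itself or has the form `prefix:local`.
bool hasLocalName(string name, string local)
{
    if (name == local)
        return true;
    // Require at least one prefix character, then a colon, then the local name.
    return name.length > local.length + 1
        && name.endsWith(local)
        && name[$ - local.length - 1] == ':';
}
```

In the loop it would presumably be used as `hasLocalName(range.front.name, "fcsProtocolEF3")`. Note that it ignores which namespace the prefix is bound to, so tags from unrelated namespaces with the same local name also match.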

Alternatively, if you don't care about the memory consumption, you could call parseDOM on the parent tag and get the DOM tree for that section of the tree and then just check each of its direct children.

But really, as things stand, dxml doesn't really have any good helper functions for searching for tags based on their names unless you're looking for the exact name.

jmdavis, Mar 12 '19 12:03

@jmdavis could you add namespace support in the future?

bubnenkoff, Mar 12 '19 13:03

I intend to add something, but I don't know exactly what it will look like yet. It will probably involve a helper wrapper around the existing functionality, but I have to find time to sit down and work out what's really needed.

jmdavis, Mar 12 '19 14:03