
Disregard namespace prefix

Open devangvira opened this issue 4 years ago • 10 comments

While parsing an XML document with the following structure:

<?xml version="1.0" encoding="UTF-8"?>
<pd:ProcessDefinition xmlns:pd="http://xmlns.xyz.com/process/2003" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<pd:activity name="Invoke Request-Response Service">
<pd:type>RequestReplyActivity</pd:type>
<pd:resourceType>OpClientReqActivity</pd:resourceType>
<pd:x>300</pd:x>
<pd:y>80</pd:y>
<config> ...

I have to specify the namespace prefix to match the tags, like this:

fmt.Println("Parsing process")
for i, n := range xmlquery.Find(doc, "//pd:ProcessDefinition/pd:activity/pd:type") {
	fmt.Printf("#%d %s\n", i, n.InnerText())
}

Is there a way to disregard the namespace prefix, so that I can just query with xmlquery.Find(doc, "//ProcessDefinition/activity/type")?

devangvira avatar Aug 12 '19 09:08 devangvira

Sorry, there is no way to do that. If the XML uses a namespace, you must specify the prefix in the query. See https://github.com/antchfx/xmlquery/issues/1

zhengchun avatar Aug 14 '19 05:08 zhengchun

So then is there a way/function to fetch the namespaces (prefixes and URIs) from an XML document?

devangvira avatar Aug 15 '19 05:08 devangvira

If you don't care about namespaces in the XML, this approach may help: replace every pd: string with an empty string before parsing the XML document, then query with "//ProcessDefinition/activity/type".
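A minimal sketch of that pre-processing step, using only the standard library. The helper name stripPrefix is hypothetical, not part of xmlquery. It rewrites only tag openers ("<pd:" and "</pd:") rather than every "pd:" string, which is slightly narrower than the suggestion above; it is still a purely textual rewrite, so it can also touch matching text inside comments or CDATA sections.

```go
package main

import (
	"fmt"
	"strings"
)

// stripPrefix removes a namespace prefix from element tags in raw XML text.
// Hypothetical helper: it is a plain string rewrite, not an XML-aware one,
// and it leaves prefixed attributes and the xmlns:prefix declaration alone.
func stripPrefix(raw, prefix string) string {
	r := strings.NewReplacer(
		"</"+prefix+":", "</", // closing tags
		"<"+prefix+":", "<", // opening and self-closing tags
	)
	return r.Replace(raw)
}

func main() {
	raw := `<pd:ProcessDefinition xmlns:pd="http://xmlns.xyz.com/process/2003">` +
		`<pd:activity name="Invoke Request-Response Service">` +
		`<pd:type>RequestReplyActivity</pd:type>` +
		`</pd:activity></pd:ProcessDefinition>`
	// The cleaned text can then be handed to the XML parser and queried
	// without a prefix.
	fmt.Println(stripPrefix(raw, "pd"))
}
```

The leftover xmlns:pd declaration is harmless once no element uses the prefix.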

On the other hand, https://godoc.org/github.com/antchfx/xmlquery#NodeNavigator.Prefix can be used to get the namespace prefix of a node.

for n := doc.FirstChild; n != nil; n = n.NextSibling {
	fmt.Println(n.Prefix)
}
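For the original question about fetching both prefixes and URIs, one option that does not depend on xmlquery is to scan the document with the standard library's encoding/xml decoder and collect the xmlns declarations. This is a sketch only; the function name namespaceDecls is hypothetical.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// namespaceDecls scans an XML document and returns a map from each declared
// prefix to its namespace URI. A default namespace (xmlns="...") is stored
// under the empty prefix. If a prefix is declared more than once, the last
// declaration wins.
func namespaceDecls(doc string) (map[string]string, error) {
	decls := make(map[string]string)
	dec := xml.NewDecoder(strings.NewReader(doc))
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return decls, nil
		}
		if err != nil {
			return nil, err
		}
		start, ok := tok.(xml.StartElement)
		if !ok {
			continue
		}
		for _, a := range start.Attr {
			switch {
			case a.Name.Space == "xmlns": // xmlns:pd="..."
				decls[a.Name.Local] = a.Value
			case a.Name.Space == "" && a.Name.Local == "xmlns": // xmlns="..."
				decls[""] = a.Value
			}
		}
	}
}

func main() {
	doc := `<pd:ProcessDefinition xmlns:pd="http://xmlns.xyz.com/process/2003" ` +
		`xmlns:xsd="http://www.w3.org/2001/XMLSchema"><pd:activity/></pd:ProcessDefinition>`
	decls, err := namespaceDecls(doc)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for prefix, uri := range decls {
		fmt.Printf("%s => %s\n", prefix, uri)
	}
}
```

The resulting map tells you which prefix the document author chose for a URI you already know, so the prefix no longer has to be hard-coded in the query.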

zhengchun avatar Aug 15 '19 05:08 zhengchun

You can remove the namespaces with a function like this.

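// removeNamespace clears the namespace prefix on n and, recursively,
// on all of its descendants.
func removeNamespace(n *xmlquery.Node) {
	n.Prefix = ""
	for child := n.FirstChild; child != nil; child = child.NextSibling {
		removeNamespace(child)
	}
}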

sgoldenb avatar Dec 19 '19 17:12 sgoldenb

Hi @zhengchun. Thanks for your work. I agree with @devangvira: we are missing a way to search for an element in any namespace, for example: xmlquery.Find(doc, "//*:ProcessDefinition/*:activity/*:type"). By the way, the library also lacks a Node.Remove/Delete method. Thanks!

BigMak7410 avatar Apr 28 '20 23:04 BigMak7410

I think that the bigger issue is the fact that you have to use the same namespace prefix as in the source document. This prefix could be arbitrarily chosen by the document creator, leaving you to try to figure it out if you're accepting documents from 3rd parties.

Would it be possible to pass Find an optional set of namespaces, as a map[string]string from prefix to namespace URI, to use for your own namespace aliasing? This is similar to what you can do when querying with ElementTree in Python.

For the example before:

<?xml version="1.0" encoding="UTF-8"?>
<pd:ProcessDefinition xmlns:pd="http://xmlns.xyz.com/process/2003" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<pd:activity name="Invoke Request-Response Service">
<pd:type>RequestReplyActivity</pd:type>
<pd:resourceType>OpClientReqActivity</pd:resourceType>
<pd:x>300</pd:x>
<pd:y>80</pd:y>
</pd:activity>
</pd:ProcessDefinition>

and then in your find, you could do something like:

var nsMap map[string]string = map[string]string{
  "q": "http://xmlns.xyz.com/process/2003",
  "r": "http://www.w3.org/1999/XSL/Transform",
  "s": "http://www.w3.org/2001/XMLSchema",
}
xmlquery.Find(doc, "//q:ProcessDefinition/q:activity/q:type", nsMap)

nathanclayton avatar Oct 28 '20 21:10 nathanclayton

+1 to what @nathanclayton proposed - this is what lxml is doing in Python.

As a workaround, you can use namespace-uri() = '…' and local-name() = '…'

arthurdarcet avatar Feb 09 '21 12:02 arthurdarcet

Hi @arthurdarcet, thanks for the suggestion. Could you explain how namespace-uri() and local-name() can be used to rewrite the following XPath query, please?

"//pd:ProcessDefinition/pd:activity/pd:type"

Sorry, I wasn't able to put 2 and 2 together myself. Thanks.

suntong avatar Mar 17 '21 12:03 suntong

@suntong of course: "//*[namespace-uri()='http://xmlns.xyz.com/process/2003' and local-name()='ProcessDefinition']/*[namespace-uri()='http://xmlns.xyz.com/process/2003' and local-name()='activity']/*[namespace-uri()='http://xmlns.xyz.com/process/2003' and local-name()='type']"
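Writing that expression by hand is error-prone, so it can be generated. The sketch below builds the same kind of path from a namespace URI and a list of local names; the helper name nsPath is hypothetical, and it assumes the URI and names contain no single quotes.

```go
package main

import (
	"fmt"
	"strings"
)

// nsPath builds an XPath in which every step matches by namespace URI and
// local name instead of by prefix. The first step uses "//" (any depth),
// later steps use "/" (direct child). Hypothetical helper: no escaping is
// done, so nsURI and localNames must not contain single quotes.
func nsPath(nsURI string, localNames ...string) string {
	var b strings.Builder
	b.WriteString("/")
	for _, name := range localNames {
		fmt.Fprintf(&b, "/*[namespace-uri()='%s' and local-name()='%s']", nsURI, name)
	}
	return b.String()
}

func main() {
	expr := nsPath("http://xmlns.xyz.com/process/2003",
		"ProcessDefinition", "activity", "type")
	// The resulting string can be passed to xmlquery.Find(doc, expr).
	fmt.Println(expr)
}
```

The query then depends only on the namespace URI, not on whichever prefix the document author picked.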

arthurdarcet avatar Mar 17 '21 16:03 arthurdarcet

  1. Thanks @arthurdarcet

  2. +1 to @nathanclayton's suggestion, which gives the perfect solution to the problem from #1:

When parsing an XML document with namespaces, we don't know which alias is used in the document, but we do know the actual namespace URI. So we can't hard-code a query like //myns:child; we should be able to set the myns => http://..uri... alias at parsing time, and that should work even if the namespace was aliased in any other way in the input document.

Please consider.

suntong avatar Mar 18 '21 02:03 suntong

In the latest xpath release, v1.2.4, you can use CompileWithNS() to pass a map of prefixes to namespace URLs, as in the code below:

expr, _ := xpath.CompileWithNS("//pd:x", map[string]string{"pd": "http://xmlns.xyz.com/process/2003"})
node := xmlquery.QuerySelector(doc, expr)

Closing this issue.

zhengchun avatar Feb 15 '23 16:02 zhengchun