jsonquery icon indicating copy to clipboard operation
jsonquery copied to clipboard

Consistent Select/Query interface with xmlquery

Open Maldris opened this issue 2 years ago • 7 comments

In the sibling xmlquery package, the Node struct has public methods; SelectElement(name string) *Node and SelectElements(name string []*Node

jsonquery by comparison only has SelectElement.

Given both packages export the global function Find and FindOne both can obviously be implemented for JSON, as they already are.

I propose a change where jsonquery's SelectElement implementation mimics xmlquery's and internally calls FindOne, and similarly add SelectElements that internally calls Find.

To this end, both node types can be identified and used with an interface of

type Selector interface {
	InnerText() string
	SelectElement(name string) Selector
	SelectElements(name string) []Selector
}

Allowing basic manipulation of both file types to only differ during load.

The alternative to this being using Query and QueryAll to allow errors to be returned, instead of simply returning a nil result/panicking, or implementing both sets as methods in each package and allowing users to make the selection.

Is this proposal sound and within the intended scope of this project?
If so I will make the changes and a pull request.

Maldris avatar Mar 15 '22 09:03 Maldris

My main concern with this change is that it changes the behaviour of jsonquery's SelectElement which currently only looks at immediate children with a matching field name, instead of running a full query that can look at the full child tree as xmlquery does.

As a result, this would be a breaking change. However, it would mean a consistent interface, and more predictable behaviour compared to the other query and select functions which all run a full query.

Maldris avatar Mar 15 '22 09:03 Maldris

I forget why no SelectElements method. May be is JSON doesn't support Duplidate Keys.

the below is incorrect json structure.

{
	"aa": "sss",
	"aa": "sss"
}

In json the SelectElements is doesn't make sense, because we don't have same keys in the childs element. Let me know if I'm wrong. Thanks

zhengchun avatar Mar 16 '22 04:03 zhengchun

TLDR: while yes a query like //aa for your example json doesn't make sense, a query like //array/elements, //name or //cars//name on the example below do, and are already implemented at the package level. I'd like to add them as methods so that there is a common interface (and I am currently testing this on a fork).

The main issue with this being doing so changes SelectElement from only looking at siblings and immediate children to the whole subtree.


My main use case consideration was twofold;

A work project involves a large configuration driven file processor, historically it worked only with xml files, but now adding JSON files. Having an abstract interface for both would mean the logic remains the same, but with different loaders, making the whole application simpler than putting type switches everywhere.

To achieve this, the main case was queries like getting the elements of an array, or getting the presence of a keyword in sibling object properties.

Such as for the test json:

{
	"name":"John",
	"age":30,
	"cars": [
		{ "name":"Ford", "models":[ "Fiesta", "Focus", "Mustang" ] },
		{ "name":"BMW", "models":[ "320", "X3", "X5" ] },
		{ "name":"Fiat", "models":[ "500", "Panda" ] }
	]
}

running the query //cars//name to find all the car names, or //cars/element to get each array element so that the system can then range over those lists and run configured operations on each.

Given that Find/FindOne or Query/QueryAll are already implemented at the package level, and for xmlquery the methods just call them (passing the subject node as top) I'm unsure why the same pattern wasn't taken for this package? yes certain classes of query such as the example you gave aren't valid, but other ones like the above example are.

I've forked both packages and implemented a common interface, I'm currently looking at improving the test coverage in jsonquery, then I'll be doing a test in a real system to make sure it works as intended with real documents rather than simple unit tests.

jsonquery fork xmlquery fork

*Edited some typos

Maldris avatar Mar 16 '22 05:03 Maldris

You are right, I forgot the Array[] type in JSON. SelectElements can works for array type.

zhengchun avatar Mar 16 '22 10:03 zhengchun

I'll be doing some testing tomorrow with the forked packages, so I'll report back if its working as intended, then we can make decisions on how to handle the breaking change to SelectElement and whether to merge.

Maldris avatar Mar 16 '22 11:03 Maldris

ok, mixed bag results, I created the interface

type Selector interface {
    // data retrieval
    InnerText() string

    // structure query
    SelectElement(query string) Selector
    SelectElements(query string) []Selector
    Query(query string) (Selector, error)
    QueryAll(query string) ([]Selector, error)
    QuerySelector(query *xpath.Expr) Selector
    QuerySelectorAll(query *xpath.Expr) []Selector

    // ===================== //
    // currently not working //
    // ===================== //

    // signature differs between packages
    // OutputXML() string 
    // vs
    // OutputXML(self bool) string

    // only in jsonquery
    // ChildNodes() []*Node 
    // could be added to xmlquery easily

    // only in xmlquery
    // SelectAttr(name string) string
    // no analog in jsonquery
}

As you can tell by the comments there were a few issues, the different signatures and differing functions could be overcome without too much issue, but the main issue;

I'd forgotten that even if *Node satisfies the Selector interface []*Node will not match []Selector due to the way go handles slices and not making O(n) calls implicit (see a stack overflow answer that also links to a few examples in the comments). As a result, I'd need to change the return types of each function to return the interface, which is not ideal as it masks the methods not in the interface, as well as field properties of the struct.

At this point, proxying the interface with a wrapper type is probably the best choice for my application, as a naïve example:

func ConvertToSelector(doc *jsonquery.Node) common.Selector {
	return selectorImpl{node: doc}
}

type selectorImpl struct {
	node *jsonquery.Node
}

func (n selectorImpl) InnerText() string {
	return n.node.InnerText()
}

func (n selectorImpl) SelectElements(query string) []common.Selector {
	res := n.node.SelectElements(query)
	result := make([]common.Selector, len(res))
	for i := range res {
		result[i] = selectorImpl{node: res[i]}
	}
	return result
}

// ...

The new method versions of the selector functions still help with this, but this does detract somewhat from the motivation

My current thought is to at least merge the forks and add the methods as they still have utility and can help other users, and improves test coverage.
But moving towards having a consistent interface as part of the package would be a larger longer term change, if still desirable.

Any thoughts? otherwise I'll create the pull requests for the last pushed commits which only added the methods and additional test coverage and the discussion can move to specifically those changes in the pull request itself

Maldris avatar Mar 17 '22 01:03 Maldris

Thans for your suggest, but This change will breaking the previous xmlquery and jsonquery package. xmlquery and jsonquery are independent packages, and both are depend on xpath package.

zhengchun avatar Mar 17 '22 06:03 zhengchun