JSON API version: 20180813 converts empty xml elements to empty string in place of empty JSON Object

Open nayakbharat opened this issue 7 years ago • 15 comments

Recently, a requirement forced me to move to version 20180813 of the JSON API. Earlier I was using version 20090211. For an empty XML element, version 20090211 returns an empty JSON object ({}), but version 20180813 returns an empty string (""). This change has broken my working application.

Here is the chunk of code: org-json

Output with version: 20090211 version-20090211

Output with version: 20180813 version-20180813

nayakbharat avatar Oct 21 '18 17:10 nayakbharat

Sorry for not responding sooner. Can you identify where this change was introduced? If you want to propose a change, please check the FAQ

stleary avatar Oct 24 '18 13:10 stleary

The following code at line 354 of org.json.XML generates empty strings instead of empty objects:

                    // Empty tag <.../>
                    if (x.nextToken() != GT) {
                        throw x.syntaxError("Misshaped tag");
                    }
                    if (jsonobject.length() > 0) {
                        context.accumulate(tagName, jsonobject);
                    } else {
                        // <--- An empty string is generated instead of an empty object when the object length is 0.
                        context.accumulate(tagName, "");
                    }
                    return false;

pmolchanov2002 avatar Oct 25 '18 18:10 pmolchanov2002

Is it valid to return an empty string instead of an empty object? Based on the syntax and description at http://json.org/, an object is not a string:

An object is an unordered set of name/value pairs. An object begins with { (left brace) and ends with } (right brace). Each name is followed by : (colon) and the name/value pairs are separated by , (comma).

A value can be a string in double quotes, or a number, or true or false or null, or an object or an array. These structures can be nested.

pmolchanov2002 avatar Oct 25 '18 18:10 pmolchanov2002

Looking at the git blame for that block of code, it was last changed 8 years ago by the original author, Douglas Crockford.

Given the age of the change, I'm not sure I want to switch it back at this point and possibly cause a breaking change for people that have been using newer versions of the library.

@stleary thoughts? Blame output here: https://github.com/stleary/JSON-java/blame/1a811f1ada29098210cb6ec9e733d2648721ba57/XML.java#L356

johnjaylward avatar Oct 25 '18 18:10 johnjaylward

The empty object change was introduced in the 20131018 version, without any comment explaining why it was made.

pmolchanov2002 avatar Oct 25 '18 18:10 pmolchanov2002

@pmolchanov2002 I'm guessing the reason for the change is that our XML parser doesn't use any context, so it doesn't know what type any particular empty tag should be. Should that empty tag be an empty string, null, an empty array, or an empty object? There are many choices, none of which are very good for a context-free XML parser.

johnjaylward avatar Oct 25 '18 18:10 johnjaylward

@johnjaylward We faced the problem with this implementation.

Say, we have an empty result set object. It may not have any results and it will be returned in the XML as <resultSet/>.

However, for another query, result set may have results and it can be returned in XML as <resultSet><result></result></resultSet>.

The client expects a result set with embedded objects, something like: resultSet: { result: {}}.

However, for the empty result set it gets an empty string instead of the object, like: resultSet:"".

So the client application has to handle this case with if/else conditions.
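The if/else handling described above could be confined to a single accessor. The following is only a sketch using plain java.util types in place of org.json classes: the class name, the method name, and the use of a Map to stand in for the parsed JSON are all assumptions made for illustration.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical client-side helper; not part of org.json.
final class EmptyElementNormalizer {

    private EmptyElementNormalizer() {}

    // Returns the value stored under `key` as a map. A missing key, null, or ""
    // (what an attribute-less empty XML element is converted to) is treated as
    // an empty object. Any other non-object value is rejected.
    @SuppressWarnings("unchecked")
    static Map<String, Object> objectOrEmpty(Map<String, Object> parent, String key) {
        Object value = parent.get(key);
        if (value instanceof Map) {
            return (Map<String, Object>) value;
        }
        if (value == null || "".equals(value)) {
            return Collections.emptyMap();
        }
        throw new IllegalStateException(
                "Expected an object or an empty value for '" + key + "' but found: " + value);
    }

    public static void main(String[] args) {
        // Stand-in for the conversion result of <resultSet/>.
        Map<String, Object> emptyCase = new LinkedHashMap<>();
        emptyCase.put("resultSet", "");

        // Stand-in for the conversion result of <resultSet><result></result></resultSet>.
        Map<String, Object> resultSet = new LinkedHashMap<>();
        resultSet.put("result", "");
        Map<String, Object> filledCase = new LinkedHashMap<>();
        filledCase.put("resultSet", resultSet);

        System.out.println(objectOrEmpty(emptyCase, "resultSet").keySet());
        System.out.println(objectOrEmpty(filledCase, "resultSet").keySet());
    }
}
```

The rest of the client can then treat resultSet as an object in both cases. The price is that a genuinely empty string becomes indistinguishable from an empty element.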

pmolchanov2002 avatar Oct 25 '18 19:10 pmolchanov2002

Yeah, I'm not disagreeing that it would be best for your application to handle it that way. The problem is that it wouldn't be best for every application. For some applications an empty string may be the right choice. For others, null may be the right choice. Still others may find an empty array [] the best choice.

johnjaylward avatar Oct 25 '18 19:10 johnjaylward

Does it break the definition of the object?

An object is an unordered set of name/value pairs. An object begins with { (left brace) and ends with } (right brace). Each name is followed by : (colon) and the name/value pairs are separated by , (comma).

Even if the object is empty.

By the way, if an empty object has attributes, for example <resultSet name="test"/>, it's serialized as an object, not as an empty string: "resultSet":{"name":"test"}.

pmolchanov2002 avatar Oct 25 '18 20:10 pmolchanov2002

You are confusing two different concepts here. XML does not have objects. It is a document structure. An XML Element can represent any number of things. It is not an object.

For your example of <result name = "test" />, that is not an empty XML Element. It has an attribute named name with a value of test. If it was serialized to JSON as an empty string that would be a problem.

However, the XML Element <result></result> or short-form <result /> is an "empty element". It is still not an empty object. A correct value of the "result" element could be a number (<result>5</result>) or a string (<result>this is a string</result>) or an array of other values:

<results>
<result>1</result>
<result>2</result>
<result>this is another result</result>
</results>

All of those are valid XML. When our XML parser sees just an empty Element like <result /> we have no idea what the data type is. The JSON definition of an object is irrelevant.
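The ambiguity can be seen with any parser that has no schema to consult. The sketch below uses the JDK's built-in DOM parser, not org.json's XML class, and the class and method names are made up for illustration. It shows that both spellings of an empty element parse to a node with no content, so nothing in the document says what type the value was meant to be.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Illustration only: probes XML with the JDK DOM parser, not with org.json.
final class EmptyElementProbe {

    private EmptyElementProbe() {}

    // Parses a single-element document and reports whether the root element
    // has any child nodes (text or nested elements).
    static boolean rootHasContent(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            return root.hasChildNodes();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        // Both empty forms carry no content and therefore no hint of a data type.
        System.out.println(rootHasContent("<result />"));
        System.out.println(rootHasContent("<result></result>"));
        // Only once there is content is there something to infer a type from.
        System.out.println(rootHasContent("<result>5</result>"));
    }
}
```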

johnjaylward avatar Oct 25 '18 20:10 johnjaylward

From the spec (https://www.w3.org/TR/xml/#dt-eetag):

[Definition: Each XML document contains one or more elements, the boundaries of which are either delimited by start-tags and end-tags, or, for empty elements, by an empty-element tag. Each element has a type, identified by name, sometimes called its "generic identifier" (GI), and may have a set of attribute specifications.] Each attribute specification has a name and a value.

The important part here is: Each element has a type, identified by name, sometimes called its "generic identifier" (GI), and may have a set of attribute specifications.

Actually, in my example <result name = "test" /> is an empty element with a single attribute name. Empty elements can have attributes by definition.

From the spec:

[Definition: An element with no content is said to be empty.] The representation of an empty element is either a start-tag immediately followed by an end-tag, or an empty-element tag.

An empty element is still an element, it's not a string or null or anything else.

pmolchanov2002 avatar Oct 25 '18 21:10 pmolchanov2002

But I perfectly understand that there may be no reason to change the existing implementation, as it can easily break dependent applications. Oh....

pmolchanov2002 avatar Oct 25 '18 21:10 pmolchanov2002

@johnjaylward thanks for tracking down the history. Given the risk of breaking existing applications, I think it is better not to make a change at this time.

stleary avatar Oct 26 '18 03:10 stleary

Can we add a new configuration property to XMLParserConfiguration, like the one we have for "keepStrings"?

alavrentik avatar Mar 07 '24 10:03 alavrentik

@alavrentik No objections if someone wants to try adding an opt-in XMLParserConfiguration flag.
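As a rough sketch of what such an opt-in flag might look like, here is a small immutable options class written against plain java.util types. Everything in it is hypothetical: the class name, the withEmptyTagAsObject method, and the valueForEmptyTag helper are invented for illustration and are not part of org.json. The default preserves the current behaviour (an empty string), so existing users would be unaffected unless they opt in.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical sketch of an opt-in option; this is NOT org.json's XMLParserConfiguration.
final class EmptyTagOptions {

    // Default keeps the current behaviour: attribute-less empty tags become "".
    static final EmptyTagOptions DEFAULT = new EmptyTagOptions(false);

    private final boolean emptyTagAsObject;

    private EmptyTagOptions(boolean emptyTagAsObject) {
        this.emptyTagAsObject = emptyTagAsObject;
    }

    // Returns a new options instance with the flag set; this instance is unchanged.
    EmptyTagOptions withEmptyTagAsObject(boolean value) {
        return new EmptyTagOptions(value);
    }

    // Chooses the value for an empty-element tag given its parsed attributes.
    Object valueForEmptyTag(Map<String, Object> attributes) {
        if (!attributes.isEmpty()) {
            // As noted in the thread, a tag with attributes already becomes an object.
            return attributes;
        }
        return emptyTagAsObject ? Collections.emptyMap() : "";
    }

    public static void main(String[] args) {
        Map<String, Object> noAttributes = Collections.emptyMap();
        System.out.println(DEFAULT.valueForEmptyTag(noAttributes));
        System.out.println(DEFAULT.withEmptyTagAsObject(true).valueForEmptyTag(noAttributes));
    }
}
```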

stleary avatar Mar 07 '24 17:03 stleary