stax2-api

No Defaulting to UTF-8 in StartDocumentEventImpl#getCharacterEncodingScheme() without XML-Declaration

Open ghost opened this issue 7 years ago • 1 comments

An org.codehaus.stax2.ri.evt.StartDocumentEventImpl instance is constructed from the values in an XMLStreamReader instance.

The member variable mEncodingScheme is set from the reader's #getCharacterEncodingScheme() function. The JavaDoc of that function says:

Returns the character encoding declared on the xml declaration. Returns null if none was declared.

If there is no XML declaration in the document, mEncodingScheme will be null. In XML 1.0 an XML declaration SHOULD be present, but it is not required.

The JavaDoc of javax.xml.stream.events.StartDocument, the interface being implemented, defines the return value of #getCharacterEncodingScheme() as follows:
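A minimal sketch of how this can be observed with the StAX implementation bundled in the JDK (not stax2-api itself). The class name DeclaredEncodingDemo and the helper declaredEncoding are made up for illustration. Per the JavaDoc quoted above, the second call should report null because the document has no XML declaration.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class; not part of stax2-api.
class DeclaredEncodingDemo {

    // Returns what the reader reports as the declared encoding right after creation,
    // i.e. while it is positioned at START_DOCUMENT.
    static String declaredEncoding(String xml) {
        try {
            XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            try {
                return reader.getCharacterEncodingScheme();
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            // Wrapped so callers of this sketch need no checked-exception handling.
            throw new IllegalStateException("could not parse sample document", e);
        }
    }

    public static void main(String[] args) {
        // Document with an XML declaration that names an encoding.
        System.out.println(
            declaredEncoding("<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><root/>"));
        // Document without any XML declaration.
        System.out.println(declaredEncoding("<root/>"));
    }
}
```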

the character encoding, defaults to "UTF-8"

org.codehaus.stax2.ri.evt.StartDocumentEventImpl#getCharacterEncodingScheme() returns the member variable mEncodingScheme, which is null when the XML has no XML declaration. I would expect it to return the default value "UTF-8" instead.

ghost, Sep 20 '18 08:09

I guess that would make sense, given that there is also the method encodingSet(), which allows determining whether the encoding was indicated by the XML declaration. Then again, parsers may (and actually even need to) detect the encoding from the source if the encoding declaration is missing.

I wish the Stax specification was a bit more prescriptive, as the javadoc wording is still quite vague.

Anyway: I will think about this. If changed as suggested, it should probably go in 4.2, as it is a behavior change, although a minor one (it could be argued that it is a bug and could be included in a patch release too, but I prefer a minor version).
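A rough sketch of how the two methods could fit together if the default were adopted. StartDocumentSketch is a hypothetical stand-in, not the actual StartDocumentEventImpl, and it only mirrors the two method names from javax.xml.stream.events.StartDocument. It does not attempt any encoding detection from the source.

```java
// Hypothetical stand-in for a StartDocument event; not the stax2-api implementation.
final class StartDocumentSketch {

    // Default named in the StartDocument JavaDoc.
    static final String DEFAULT_ENCODING = "UTF-8";

    // Encoding from the XML declaration, or null if none was declared.
    private final String declaredEncoding;

    StartDocumentSketch(String declaredEncoding) {
        this.declaredEncoding = declaredEncoding;
    }

    // Mirrors StartDocument#encodingSet(): true only if the declaration named an encoding.
    boolean encodingSet() {
        return declaredEncoding != null;
    }

    // Mirrors StartDocument#getCharacterEncodingScheme(), with the suggested fallback.
    String getCharacterEncodingScheme() {
        return encodingSet() ? declaredEncoding : DEFAULT_ENCODING;
    }
}
```

With this shape, callers that need to know whether the encoding was actually declared can still ask encodingSet(), while getCharacterEncodingScheme() never returns null.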

cowtowncoder, Sep 23 '18 18:09