
Support InvalidCharHandler for reading

Open jhaber opened this issue 2 years ago • 3 comments

It looks like InvalidCharHandler can only be set when writing, not when reading. Do you think it makes sense to support it for reading as well? A user dealing with a Reader could do this sort of transformation fairly easily before passing the stream to woodstox, but that requires their code to know which characters are invalid (rather than having woodstox be the source of truth for that). And a user dealing with an InputStream may not have an easy way to do character-based filtering/replacement.

jhaber avatar Apr 07 '22 14:04 jhaber

I am a bit hesitant about trying to support a fully configurable approach, given the complexity of XML character validity rules. But maybe something to fully disable validity checks for, say, textual content would be OK: the user could then provide a custom InputStream (or, more likely, Reader) that implements the validation they want, and Woodstox would just take whatever it gets. To me it seems that validation at the Reader level is probably much easier to layer on than making the decoder call out to validation.
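As a rough illustration of the Reader-level approach, here is a minimal sketch using only the JDK. `XmlCharFilterReader`, `isAllowed` and `sanitize` are hypothetical names, not part of Woodstox. The check follows the XML 1.0 `Char` production per UTF-16 code unit, with one simplification: surrogates are passed through unchecked, so pairing is left to the parser. It also only sees raw characters, so numeric character references such as `&#0;` are not touched.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical helper (not a Woodstox class): replaces chars that XML 1.0 forbids. */
class XmlCharFilterReader extends FilterReader {
    private final char replacement;

    XmlCharFilterReader(Reader in, char replacement) {
        super(in);
        this.replacement = replacement;
    }

    // XML 1.0 allows #x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD and #x10000-#x10FFFF.
    // Simplification: surrogate code units (0xD800-0xDFFF) pass through unchecked.
    static boolean isAllowed(char c) {
        if (c < 0x20) {
            return c == 0x9 || c == 0xA || c == 0xD;
        }
        return c < 0xFFFE; // rejects only 0xFFFE and 0xFFFF above 0x1F
    }

    @Override
    public int read() throws IOException {
        int c = super.read();
        return (c < 0 || isAllowed((char) c)) ? c : replacement;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        // n is -1 at end of input, in which case the loop does not run.
        for (int i = off; i < off + n; i++) {
            if (!isAllowed(buf[i])) {
                buf[i] = replacement;
            }
        }
        return n;
    }

    /** Convenience wrapper for in-memory strings, mainly for trying the filter out. */
    static String sanitize(String input, char replacement) {
        StringBuilder out = new StringBuilder(input.length());
        char[] buf = new char[256];
        try (Reader r = new XmlCharFilterReader(new StringReader(input), replacement)) {
            int n;
            while ((n = r.read(buf)) != -1) {
                out.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }
}
```

The wrapped Reader can then be handed to the standard StAX entry point, e.g. `XMLInputFactory.createXMLStreamReader(new XmlCharFilterReader(reader, ' '))`. For an InputStream, the caller would first need to decode it with a known charset (for example via `InputStreamReader`), which bypasses the parser's own encoding detection.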

I probably won't have time to work on this on my own, either way. But if anyone wants to create a PR that does not add measurable overhead for the default case, I'd of course be happy to help sanity check it & help get it merged if and when it makes sense.

cowtowncoder avatar Apr 07 '22 15:04 cowtowncoder

Sounds good, thanks for the quick reply

jhaber avatar Apr 07 '22 15:04 jhaber

No problem.

Also, now that I think about this -- aside from the question of performance, I don't think I am against an InvalidCharHandler on a per-character basis, if someone has time to implement it (I don't, but I always do my best to find time to review contributions). It should be possible to hide the complexity behind the error reporting functionality; I assume the return value could be the character to use, and so on.
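To make the "return value is the character to use" idea concrete, here is a small sketch of what such a read-side callback might look like. Everything in it (`ReadInvalidCharHandler`, `ReadHandlers`, `FAIL`, `replaceWith`) is hypothetical and only illustrates the shape of the contract; it is not an existing Woodstox API.

```java
import java.io.IOException;

/** Hypothetical read-side callback; not an existing Woodstox interface. */
interface ReadInvalidCharHandler {
    /**
     * Called with the offending code point. Returns the character to use
     * in its place, or throws to reject the input.
     */
    char convertInvalidChar(int invalidChar) throws IOException;
}

/** Two illustrative strategies: fail fast, or substitute a fixed character. */
class ReadHandlers {
    static final ReadInvalidCharHandler FAIL = c -> {
        throw new IOException(String.format("Invalid XML character 0x%04X", c));
    };

    static ReadInvalidCharHandler replaceWith(char replacement) {
        return c -> replacement;
    }
}
```

With this shape, the parser's existing error-reporting path would be the only place that consults the handler, so the default (failing) case would not need extra per-character work beyond the checks it already does.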

cowtowncoder avatar Apr 07 '22 22:04 cowtowncoder