woodstox
Support InvalidCharHandler for reading
It looks like InvalidCharHandler can only be set when writing, but not when reading. Do you think it makes sense to support it for reading as well? If a user is dealing with a Reader, they could do this sort of transformation pretty easily before passing the stream to woodstox. However, that requires their code to know which characters are invalid, rather than having woodstox be the source of truth for that. And if the user is dealing with an InputStream, they may not have an easy way to do character-based filtering or replacement.
I am a bit hesitant about trying to support a fully configurable approach, given the complexity of XML character validity rules. But maybe something that fully disables validity checks for, say, textual content would be OK, because then the user could provide a custom InputStream (or, more likely, Reader) to implement whatever validation they want, and Woodstox would just take whatever it gets.
To me it seems that validation at the Reader level is probably much easier to layer on than trying to make the decoder make validation calls.
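A minimal sketch of that Reader-level layering, in Java. XmlCharReplacingReader is a hypothetical helper, not part of Woodstox: it replaces UTF-16 code units that the XML 1.0 Char production disallows (C0 controls other than tab, LF and CR, plus U+FFFE and U+FFFF) before the parser ever sees them. It assumes XML 1.0 rules and, as a simplification, passes surrogate code units through unchanged, so lone surrogates are not caught.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;

/**
 * Hypothetical helper (not part of Woodstox): replaces characters that the
 * XML 1.0 Char production does not allow with a fixed replacement character.
 * Simplification: surrogate code units (U+D800..U+DFFF) are passed through
 * unchanged, so unpaired surrogates are NOT detected by this filter.
 */
class XmlCharReplacingReader extends FilterReader {
    private final char replacement;

    XmlCharReplacingReader(Reader in, char replacement) {
        super(in);
        this.replacement = replacement;
    }

    /** True if the code unit is allowed by XML 1.0 (surrogates treated as allowed). */
    static boolean isAllowed(char c) {
        if (c < 0x20) {
            // Only tab, line feed and carriage return are legal below U+0020.
            return c == '\t' || c == '\n' || c == '\r';
        }
        return c != 0xFFFE && c != 0xFFFF;
    }

    @Override
    public int read() throws IOException {
        int c = super.read();
        if (c < 0) {
            return c; // end of stream
        }
        return isAllowed((char) c) ? c : replacement;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        // When n is -1 (end of stream) the loop body never runs.
        for (int i = off; i < off + n; i++) {
            if (!isAllowed(buf[i])) {
                buf[i] = replacement;
            }
        }
        return n;
    }
}
```

For an InputStream, the caller would first have to choose the encoding themselves, for example new XmlCharReplacingReader(new InputStreamReader(in, StandardCharsets.UTF_8), '\uFFFD'), and pass the result to XMLInputFactory.createXMLStreamReader(Reader). That gives up the parser's own encoding detection from the BOM and XML declaration, which is part of why InputStream users have a harder time here. The filter also only sees raw text: character references such as &#1; are decoded later by the parser and are unaffected.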
I probably won't have time to work on this myself, either way. But if anyone wants to create a PR that does not add measurable overhead for the default case, I'd of course be happy to help sanity-check it and help get it merged if and when it makes sense.
Sounds good, thanks for the quick reply
No problem.
Also, now that I think about it: aside from the question of performance, I don't think I am against an InvalidCharHandler on a per-character basis, if someone has time to implement it (I don't, but I always do my best to find time to review contributions).
It should be possible to hide the complexity behind the error-reporting functionality; I assume the return value could be the character to use instead, and so on.
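One way the per-character idea could look, sketched in Java under stated assumptions: ReadInvalidCharHandler and TextScanner are hypothetical names, not Woodstox API. The handler either returns the character to use in place of an invalid one or throws to report the error, and the default FAILING handler keeps the current fail-fast behavior. The scanner only consults the handler after its own validity check has already failed, which is how the default case could avoid extra overhead; its check is simplified to C0 control characters.

```java
import java.io.IOException;

/**
 * HYPOTHETICAL reader-side handler; this interface does not exist in Woodstox.
 * Returns the character to use in place of an invalid one, or throws.
 */
interface ReadInvalidCharHandler {
    char handleInvalidChar(int c) throws IOException;

    /** Default: report the error, as a parser without a handler would. */
    ReadInvalidCharHandler FAILING = c -> {
        throw new IOException(String.format("Illegal XML character (code 0x%X)", c));
    };

    /** Replaces every invalid character with the given one. */
    static ReadInvalidCharHandler replacingWith(char replacement) {
        return c -> replacement;
    }
}

/** Illustrative stand-in for a parser's text-scanning loop (not Woodstox code). */
class TextScanner {
    private final ReadInvalidCharHandler handler;

    TextScanner(ReadInvalidCharHandler handler) {
        this.handler = handler;
    }

    String scan(CharSequence input) throws IOException {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            // Fast path unchanged: the handler is only called once this
            // (simplified, C0-controls-only) validity check has failed.
            if (c < 0x20 && c != '\t' && c != '\n' && c != '\r') {
                c = handler.handleInvalidChar(c);
            }
            out.append(c);
        }
        return out.toString();
    }
}
```

Returning a char keeps the hook simple. A real design might also want to allow dropping the character entirely, which a single char return value cannot express.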