
Support getstate and setstate on IncrementalEncoder/Decoder

Open gsnedders opened this issue 12 years ago • 4 comments

Python 3 introduces a getstate/setstate method pair on the incremental encoders and decoders. It would be nice to expose this, even if only on Python 3.
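For reference, a minimal sketch of what that method pair does on a standard-library decoder in Python 3. It uses only `codecs` and does not touch webencodings; the variable names are just for illustration.

```python
import codecs

# Plain standard-library incremental decoder (Python 3).
decoder = codecs.getincrementaldecoder("utf-8")(errors="strict")

data = "é".encode("utf-8")  # two bytes: b"\xc3\xa9"

# Feed only the first byte: the decoder buffers it and emits nothing yet.
first = decoder.decode(data[:1])

# Snapshot the decoder: a (buffered_bytes, flags) tuple.
state = decoder.getstate()

# Finish the character, then rewind to the snapshot and finish it again.
second = decoder.decode(data[1:])
decoder.setstate(state)
replayed = decoder.decode(data[1:])

print(repr(first), state[0], second, replayed)
```

The snapshot carries the bytes the decoder has consumed but not yet turned into text, which is the information a caller needs in order to resume or rewind.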

gsnedders, Dec 02 '13 14:12

This would be trivial for IncrementalEncoder and probably possible for IncrementalDecoder, but why is it useful?

SimonSapin, Dec 02 '13 15:12

https://github.com/gsnedders/html5lib-python/commit/d214d0dc930fd62ac1cbe719d80b9fdcb92a50ae uses it for changing the encoding while parsing, which is needed to be compliant with HTML. It's hard to get quite the right behaviour without it.

gsnedders, Dec 03 '13 00:12

webencodings.IncrementalDecoder looks for a BOM at the beginning of the input and picks the encoding to use based on that. Does that make sense in html5lib’s context of changing encodings while parsing?

If you remove BOM stuff, webencodings.IncrementalDecoder(encoding, errors) is just a wrapper for encoding.codec_info.incrementaldecoder(errors), which does implement getstate/setstate.

html5lib could use only webencodings.lookup to get the right labels, ignore the rest of webencodings, and use Python’s APIs for the actual decoding.
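A rough sketch of that approach, under some assumptions: `switch_decoder` is a hypothetical helper (not part of webencodings or html5lib), and the stdlib `codecs.lookup` stands in for `webencodings.lookup(label).codec_info` so that the snippet runs without webencodings installed. It only shows handing buffered bytes to a new decoder; it does not implement the HTML rules for when an encoding change is allowed.

```python
import codecs


def switch_decoder(old_decoder, new_codec_name, errors="replace"):
    """Hypothetical helper: create a decoder for the new encoding and
    feed it the bytes the old decoder had buffered but not yet decoded."""
    pending, _flags = old_decoder.getstate()
    # With webencodings, the CodecInfo object would instead come from
    # webencodings.lookup(label).codec_info (assumption: label is valid).
    codec_info = codecs.lookup(new_codec_name)
    new_decoder = codec_info.incrementaldecoder(errors)
    return new_decoder, new_decoder.decode(pending)


# Start out guessing UTF-8; b"\xe9" looks like the start of a
# multi-byte sequence, so the decoder holds it back.
old = codecs.lookup("utf-8").incrementaldecoder("replace")
head = old.decode(b"caf\xe9")

# Later the document turns out to be windows-1252 (cp1252 in Python).
new, tail = switch_decoder(old, "cp1252")

print(head + tail)
```

Without getstate, the byte held back by the first decoder would be lost on the switch, which is presumably the kind of behaviour that is hard to get right otherwise.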

SimonSapin, Dec 03 '13 01:12

Yeah, I guess.

gsnedders, Dec 03 '13 16:12