Get length of compressed stream so far without closing stream?
I would like to get the length of the compressed stream so far, without closing the stream or affecting continued compression. I understand that most codecs might not support this, given their internal block lengths etc., but maybe there are ways to get close to this behavior?
The use case is something like this:
- We have a very long string/stream which has been compressed already, C(s_long)
- We now have a set of N shorter strings S_shorts = [s1, ..., sN] and want to calculate map(length, [C(s_long * s1), ..., C(s_long * sN)]), where * is string concatenation, without redoing the whole C(s_long) compression for each short string si (since computing C(s_long) can be expensive).
- Note that we only need the lengths of all the C(s_long * si), not their actual bytes.
Any ideas how this can be done as fast as possible? :)
Currently I basically do Huffman coding or dictionary-based compression by hand, which lets me save the intermediate tree/dictionary and reuse it for each of the short strings, but it would be nice to be able to use more advanced compressors like the CodecX ones in the TranscodingStreams framework.
This seems similar to https://stackoverflow.com/questions/11662745/how-can-one-copy-the-internal-state-of-zlib-compressor-object-in-python
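For reference, here is a minimal sketch of the approach from that linked question, written in Python rather than Julia because Python's zlib compressor objects expose a `copy()` method. The function name `lengths_with_shared_prefix` is made up for illustration, and this says nothing about what TranscodingStreams codecs support today. The idea is to compress s_long once, then fork the compressor state for each short string and finish only the fork.

```python
import zlib


def lengths_with_shared_prefix(s_long, shorts):
    """Return len(C(s_long + s)) for each s in shorts, compressing s_long only once.

    Hypothetical helper: it relies on zlib's Compress.copy() to fork the state.
    """
    base = zlib.compressobj()
    # Bytes emitted so far for the prefix. Part of the prefix is usually still
    # buffered inside the compressor, so this is not yet the full prefix cost.
    prefix_len = len(base.compress(s_long))

    lengths = []
    for s in shorts:
        branch = base.copy()  # fork the state; `base` itself is left untouched
        tail = branch.compress(s) + branch.flush()  # finish only the fork
        lengths.append(prefix_len + len(tail))
    return lengths


if __name__ == "__main__":
    s_long = b"the quick brown fox jumps over the lazy dog " * 5000
    shorts = [b"the quick brown fox", b"something unrelated 12345", b""]
    print(lengths_with_shared_prefix(s_long, shorts))
```

Note that the number of bytes returned by `base.compress(s_long)` alone is not "the length so far" in any useful sense, since an unknown amount of input is still buffered. Only the fork that gets flushed yields a real total. Whether copying the state is cheap enough depends on the codec; for zlib the copy includes the sliding window and internal buffers, so its cost does not grow with the length of s_long.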
I think a potential solution would be to add deepcopy support for Codecs.
Yes, deepcopy would really solve this. I'm not sure it would be very performant (which is crucial in my case), but it's worth trying if there is a general use case for supporting deepcopy (at least for some codecs).