arcode-rs
arcode-rs copied to clipboard
How to select appropriate number of bits for precision?
How do you select an appropriate number of bits to use for precision?
Related stack overflow question - https://stackoverflow.com/questions/71509576/what-is-the-optimum-precision-to-use-in-an-arithmetic-encoder
from my own reading and experimentation, i'm finding that-
- the precision bits must be at least 2 greater than the number of bits needed to represent that probability of each symbol
- using more precision bits results in a better compression ratio
- the gains are very minimal and saturate quickly
- the total number of bits is constrained by the width of the type used (say
u32
oru64
)
haven't investigated performance implications.
Check out the “Constraints on the Implementation” section from the paper referenced in the read me. It discusses choosing values and limitations of precision.
https://web.stanford.edu/class/ee398a/handouts/papers/WittenACM87ArithmCoding.pdf
Thank you for the reference! Unfortunately that article doesn't add much that's new-
- the precision bits must be at least 2 greater than the number of bits needed to represent that probability of each symbol
- more bits is better
- there are some 'special' numbers which lend themselves to faster computation (ie 16)
still doesn't say anything about the rate of trade-off between precision and other factors. For example, why not use u64
instead of u32
? or even u128
? In what regimes do the trade-offs matter?
For my own implementation i've chose to set the precision automatically (to the maximum value), and panic if there's not enough bits to guarantee no overflow/underflow. Considering also making the integer representation a generic, so you could effectively choose u32
or u64
.
I'm also interested in experimenting to see whether this can be checked at compile time using const generics... (hopefully without polluting the API too badly)
see https://github.com/danieleades/arithmetic-coding/pull/7/files#diff-9a5ef4cd6ac7a66abd842b474ee88447516ed3bd218748eb6f387a0895d14fcb
Hey sorry I went so absent on this project have not been keeping up. I just want to drop a link to this https://github.com/Cyan4973/FiniteStateEntropy If you are still interested in lossless probability symbol coders. Very very fast and the ANS version is as good as AC but much faster. AFAIK lacks updating symbol table per encode?