Precision loss in modeling of 80-bit FPU registers
Summary
Remill models some 80-bit x87 FPU registers as f64s. Per discussion with @pgoodman, this was a design choice to prioritize bitcode portability.
- What are the possible impacts of this modeling strategy?
- Should this be configurable?
- If configurable, should it be the default, or opt-in?
- If configurable, should it be configurable at runtime, or only statically?
Along with these concrete questions, this would also be a good chance to consider staking out explicit goals and non-goals with respect to modeling approaches.
Discussion and examples
Internally, Remill already has types that could fully model 80-bit registers, e.g. here. These are currently used when modeling memory regions.
Right now, the lossy register modeling occurs in exactly one place: when modeling 80-bit FPU registers in the x86 architecture, here.
When thinking about typical numerical code (physical modeling, statistics), it is easy to argue that the lost precision will not necessarily have a meaningful impact on program semantics; the lifted computations should be roughly the same in the ways that we care about, but the actual runtime values will be coarser approximations.
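As a concrete illustration of that coarsening, here is a small standalone sketch (not Remill code) of what happens when an 80-bit x87 value is routed through an f64-sized slot. It assumes a compiler/target where `long double` is the x87 80-bit extended format (e.g. GCC or Clang on x86 Linux):

```cpp
#include <cstdio>

int main() {
  // 1/3 computed with the x87 extended format's 64-bit significand.
  long double x87_value = 1.0L / 3.0L;

  // What an f64-backed register model would retain of that value.
  double f64_slot = static_cast<double>(x87_value);
  long double round_tripped = static_cast<long double>(f64_slot);

  std::printf("original:      %.21Lg\n", x87_value);
  std::printf("round-tripped: %.21Lg\n", round_tripped);
  std::printf("difference:    %.5Lg\n", x87_value - round_tripped);
  return 0;
}
```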
One subtlety is that numeric algorithms may check for iterative convergence or divergence using very small or very large constants. With lossy register types, the lifted bitcode could then differ observably not just in numerical approximation, but in runtime control flow.
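To make the control-flow point concrete, here is a minimal sketch (again assuming `long double` is the 80-bit x87 format, and that f64 arithmetic is actually performed at 64-bit width, as on x86-64 with SSE): the same convergence loop runs a different number of iterations at f64 vs f80 precision, so the control flow itself diverges, not just the final values.

```cpp
#include <cstdio>

// Accumulate a geometric series until adding the next term no longer
// changes the running sum at precision T, and return the trip count.
template <typename T>
int IterationsUntilConvergence() {
  T sum = 0;
  T term = 1;
  int iterations = 0;
  while (sum + term != sum) {  // the "has it converged?" check
    sum += term;
    term /= 2;
    ++iterations;
  }
  return iterations;
}

int main() {
  std::printf("f64 trip count: %d\n", IterationsUntilConvergence<double>());
  std::printf("f80 trip count: %d\n",
              IterationsUntilConvergence<long double>());
  return 0;
}
```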
There are also cases where FP registers are used to optimize integer arithmetic. A famous example is Daniel J. Bernstein's reference implementation of Curve25519. Such optimization strategies depend on particular unsigned integer values in a certain range (acting as scalars in a finite field) and, critically, on their arithmetic operations being exactly representable in a particular floating-point type. Approximation error in this circumstance would be catastrophic, unlike the seemingly "softer" failure mode of losing significant digits. I don't think any modern implementations of that particular curve would explicitly use the x87 data registers R0 through R7, but the idea generalizes.
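A self-contained sketch of that failure mode (not taken from the Curve25519 code; `long double` is again assumed to be the 80-bit x87 format, and f64 math is assumed to be true 64-bit SSE arithmetic): the 61-bit product of two ~30-bit limbs is exact at f80 precision, but silently rounded at f64 precision.

```cpp
#include <cstdint>
#include <cstdio>

int main() {
  const std::uint64_t a = (1ULL << 30) + 3;
  const std::uint64_t b = (1ULL << 30) + 5;

  const std::uint64_t exact = a * b;  // ~61 bits, fits in uint64_t

  // Product computed through an 80-bit register (64-bit significand).
  const std::uint64_t via_f80 = static_cast<std::uint64_t>(
      static_cast<long double>(a) * static_cast<long double>(b));

  // Product computed through an f64 (53-bit significand): rounded.
  const std::uint64_t via_f64 = static_cast<std::uint64_t>(
      static_cast<double>(a) * static_cast<double>(b));

  std::printf("exact:   %llu\n", static_cast<unsigned long long>(exact));
  std::printf("via f80: %llu (%s)\n",
              static_cast<unsigned long long>(via_f80),
              via_f80 == exact ? "exact" : "rounded");
  std::printf("via f64: %llu (%s)\n",
              static_cast<unsigned long long>(via_f64),
              via_f64 == exact ? "exact" : "rounded");
  return 0;
}
```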
The register modeling above will also be used for the (in-progress) semantics of the FBLD and FBSTP instructions, which load/store 80-bit packed binary-coded decimal (BCD) integers into/from 80-bit FP registers. The Intel SDM (Vol. 1, 4.7) explicitly guarantees that any packed BCD integer is exactly representable in an 80-bit x87 register; this guarantee does not hold for f64 registers. Packed BCD integers range over -(10^18 - 1) to 10^18 - 1, but the contiguous range of integers exactly representable as an f64 only extends to about ±2^53 ≈ ±9e15.
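For instance (same `long double` assumption as above), the largest packed-BCD magnitude, 10^18 - 1, survives a round trip through an 80-bit register but not through an f64:

```cpp
#include <cstdint>
#include <cstdio>

int main() {
  const std::uint64_t max_bcd = 999999999999999999ULL;  // 10^18 - 1

  // Exact: the value needs 60 significant bits, f80 has 64.
  const std::uint64_t via_f80 =
      static_cast<std::uint64_t>(static_cast<long double>(max_bcd));

  // Rounded: f64 has only 53 significant bits.
  const std::uint64_t via_f64 =
      static_cast<std::uint64_t>(static_cast<double>(max_bcd));

  std::printf("original: %llu\n", static_cast<unsigned long long>(max_bcd));
  std::printf("via f80:  %llu\n", static_cast<unsigned long long>(via_f80));
  std::printf("via f64:  %llu\n", static_cast<unsigned long long>(via_f64));
  return 0;
}
```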
Tying it all together, it looks like we model SSE/AVX registers with custom types that capture their full size. In that sense, I'm not sure what makes f80 modeling a special case. Whatever we think makes the most sense, it would be helpful to document when to make such tradeoffs, and to understand exactly what we win and lose by doing so. For example:
- What kind of round-tripping guarantees should Remill provide?
- What kind of divergent runtime behavior is acceptable in lifted code?
- Do we really want to permit lifted bitcode to functionally differ from the original binary?