java-smt Fix Floating-Point Mantissa Not Including Sign Bit

While implementing PR 512 i found a problem in our implementation of FloatingPointType and number. Our returned significant precision is off by one, see SMTLib2 standard page for Floating-Points. E.g. for single precision FP, we return significant size 23. But the standard defines exponent and significant as:

eb defines the number of bits in the exponent;
sb defines the number of bits in the significand, *including* the hidden bit.

Hence it should be 24. As a consequence, we also return this wrongly for FormulaType method toSMTLIBString(). This is fixed in this PR. We also encode the significant without the hidden bit in toString(), and subsequently in fromString. While this is in general not as bad, i changed it to include the hidden bit now (in both), so that it is in-line with the standard. These behavior changes should be communicated clearly to users!

The solvers generally expect the hidden bit to be part of the significant, hence why we had 4 solvers with the code type.getMantissaSize() + 1, and only one solver without the + 1 (MathSAT5).

The general behavior of FP creation/solving is correct IFF you know that the mantissa does not include the sign bit.

Also, we don't communicate any information about our interpretation of FP precisions, hence the users might think that by building FormulaType.getFloatingPointType(8, 24) (first input is exponent, second is significant) they get a single precision FP-type, but in reality they get a total size of 33 bits currently. Our static implementation of the single/double precisions is fine however.

This PR adds/changes the following:

Add new constructors for FP type that specify the sign bit inclusion in the method name. Also, update the API for getMantissaSize() to getMantissaSizeWithHiddenBit() and getMantissaSizeWithoutHiddenBit().
Use the less ambiguous API in our internal implementation, reducing confusion about magic +-1
Including the hidden bit in the documentation of relevant API, as well as the parameter names.
Deprecate the old FP type constructor and mantissa size getters.
Return the correct SMTLIB2 interpretation for FloatingPointType.toSMTLIBString().
Change behavior of FloatingPointType.toString() and FloatingPointType.fromString() to include the hidden bit in the significant.
Deprecated public constants for FP single/double precisions, as they do not specify the in/exclusion of the hidden bit and will be moved to internal API. Also added new constants that are not public for mantissas that specify the hidden bit exclusion.
Adds tests for FP precision sizes, as well as FloatingPointType.toString() etc.

I feel like the API changes reduce the magic + 1 and - 1 by a large amount, increasing readability of the code. My guess is that users, especially new users, will benefit from that. The deprecation warning needs to be discussed/done.

Sep 02 '25 08:09 baierd

What is the actual effect of this MR (supposed to be)? Is it purely an API improvement that adds new methods and deprecates some old ones, or does it actually change the behavior in some cases? From your previous off-line explanations and from the title ("Fix") I would assume that there was actually broken behavior that is now fixed. But in the description of the MR I do not find anything related to this and it reads as if there is just new stuff and no behavior changes. Please make clear which is the case, and if no existing behavior is actually broken and needs to be changed, reflect this in the MR title.

Fix FP type etc. to include the sign bit, and switch to what the standard defines in our internal implementation. Only MathSAT5 expects the mantissa to not include the sign bit.

This wording doesn't really make sense. The FloatingPointType class itself (I assume this is what you mean) is not broken and does not need to be fixed, and the MR also does not attempt to fix it nor changes anything in it (except for toString()). What is broken and fixed is only that there are existing methods with unclear names and missing documentation.

Sep 02 '25 08:09 PhilippWendler

What is the actual effect of this MR (supposed to be)? Is it purely an API improvement that adds new methods and deprecates some old ones, or does it actually change the behavior in some cases?

The behavior is unchanged. It only improves the API, adds new methods, and improves code readability/maintainability.

The main problem is that the standard defines the mantissa to include the sign bit, but our API does not. 4/5 SMT solvers with FP support in JavaSMT handle it the same way. Only some parts of the related API are documented, hence users might assume that our API works like the standard or the majority of the solvers describe it. This PR aims to fix this ambiguity. I improved the wording above.

Sep 02 '25 08:09 baierd

For the record: I find it good that you decided to not add getMantissaSizeWithSignBit() and getExponentSize() to FloatingPointFormulaManager as it was done in #512. These methods are not necessary, encourage bad code (users should pass around FP size information as FloatingPointType instances instead of two ints), and their implementations contained redundant code that duplicated already existing code in JavaSMT (making it more error prone).

Sep 02 '25 09:09 PhilippWendler

@PhilippWendler we could deprecate the method fromString() and add a new method fromSMTLIBString(). This would reduce confusion about the change.

Sep 02 '25 16:09 baierd

we could deprecate the method fromString() and add a new method fromSMTLIBString(). This would reduce confusion about the change.

Would you then leave both toString() and fromString() unchanged such that they remain backwards compatible? Then it might make sense to push users towards fromSMTLIBString().

If not, I would not see much connection between fromString() and fromSMTLIBString(). These two methods would simply be independent methods that support different kinds of formats, fromSMTLIBString() can not be used as a replacement to fromString() if what you need is parsing the old type representation. Of course, one can question whether JavaSMT defining its own (but undocumented) type representation and providing a parser for it is a good idea or whether it should be deprecated, and one can also discuss whether adding a separate parser for SMTLib type strings is useful, but I would see these as mostly orthogonal to the current issue (except it if is used as argument for solving the compatibility problem of fromString()).

Sep 03 '25 05:09 PhilippWendler

we could deprecate the method fromString() and add a new method fromSMTLIBString(). This would reduce confusion about the change.

Would you then leave both toString() and fromString() unchanged such that they remain backwards compatible? Then it might make sense to push users towards fromSMTLIBString().

If not, I would not see much connection between fromString() and fromSMTLIBString(). These two methods would simply be independent methods that support different kinds of formats, fromSMTLIBString() can not be used as a replacement to fromString() if what you need is parsing the old type representation. Of course, one can question whether JavaSMT defining its own (but undocumented) type representation and providing a parser for it is a good idea or whether it should be deprecated, and one can also discuss whether adding a separate parser for SMTLib type strings is useful, but I would see these as mostly orthogonal to the current issue (except it if is used as argument for solving the compatibility problem of fromString()).

Well, good question(s). We need to change toString() in my opinion, as it's built-in. toSMTLIBString() should also be changed, as it does not match the SMTLIB2 specification. The method fromString() can be left unchanged and deprecated for the new method fromSMTLIBString(), which then correctly interprets the SMTLIB2 standard. This has the advantage that the names of the methods would match, and we communicate the definition directly, as SMTLIB is in the name.

We could simply deprecate fromString() and not replace it of course, but why not use the implementation that already exists? Especially since users might expect such a method as it already exists.

Sep 03 '25 14:09 baierd

Is there a misunderstanding maybe? fromString() does not do what fromSMTLIBString() is supposed to do. fromString() parses the output of toString(), not of toSMTLIBString(). So toString() and fromString() certainly need to match.

But there is no technical need to change toString() and fromString(), the (undocumented) type representation could simply be defined to not include the sign bit in the mantissa size. (Whereas for toSMTLIBString() anything that deviates from SMTLIB is clearly a bug that should be fixed.)

Sep 03 '25 14:09 PhilippWendler

While implementing PR 512 i found a problem in our implementation of FloatingPointType and number. Our returned significant precision is off by one. E.g. for single precision FP, we return significant size 23. But the standard defines exponent and significant as:
eb defines the number of bits in the exponent;
sb defines the number of bits in the significand, *including* the hidden bit.
Hence it should be 24.

Just for clarification: we're returning 23 because we don't include the hidden bit in the significand size. Here hidden bit refers to the first digit of the significand, which is dropped from the floating point representation. So, for instance, instead of 1.100111*2^-3 only .100111*2^-3 will be written out. This saves precious space and the hidden bit can always be restored by looking at the exponent.

Some of the new names here suggest to me that the sign bit was missing from the old size. This is a bit unfortunate, as the sign is never included in the size of the significand. It just happends to have the same size (= 1) as the hidden bit, so the math works out. However, these are really different concepts, and I think we should be careful to not add any more confusion for the users

Maybe we could still find some better names?

Sep 03 '25 20:09 daniel-raffler

Just for clarification: we're returning 23 because we don't include the hidden bit in the significand size. Here hidden bit refers to the first digit of the significand, which is dropped from the floating point representation. So, for instance, instead of 1.100111*2^-3 only .100111*2^-3 will be written out. This saves precious space and the hidden bit can always be restored by looking at the exponent.

Some of the new names here suggest to me that the sign bit was missing from the old size. This is a bit unfortunate, as the sign is never included in the size of the significand. It just happends to have the same size (= 1) as the hidden bit, so the math works out. However, these are really different concepts, and I think we should be careful to not add any more confusion for the users

Maybe we could still find some better names?

Ha, this is true. One should have really looked into the SMTLIB standard before doing all this work:

 :notes
 "eb defines the number of bits in the exponent;
  sb defines the number of bits in the significand, *including* the hidden bit.
 "

So all cases of "with(out)SignBit" need to be renamed to "with(out)HiddenBit". And for getTotalSize() we might need to rediscuss the name.

Sep 04 '25 05:09 PhilippWendler

too many open questions.

The whole PR never ever mentions once that the SMTLIB definition could be found online: https://smt-lib.org/theories-FloatingPoint.shtml

Sorry :sweat_smile:, i assumed you guys know. My bad.

Sep 07 '25 11:09 baierd

Just for clarification: we're returning 23 because we don't include the hidden bit in the significand size. Here hidden bit refers to the first digit of the significand, which is dropped from the floating point representation. So, for instance, instead of 1.100111*2^-3 only .100111*2^-3 will be written out. This saves precious space and the hidden bit can always be restored by looking at the exponent. Some of the new names here suggest to me that the sign bit was missing from the old size. This is a bit unfortunate, as the sign is never included in the size of the significand. It just happends to have the same size (= 1) as the hidden bit, so the math works out. However, these are really different concepts, and I think we should be careful to not add any more confusion for the users Maybe we could still find some better names?

Ha, this is true. One should have really looked into the SMTLIB standard before doing all this work:
 :notes
 "eb defines the number of bits in the exponent;
  sb defines the number of bits in the significand, *including* the hidden bit.
 " 
So all cases of "with(out)SignBit" need to be renamed to "with(out)HiddenBit". And for getTotalSize() we might need to rediscuss the name.

I renamed all API referencing the sign bit to reference the hidden bit instead.

The total size is easier in my opinion, as the total size always includes the hidden bit.

Sep 12 '25 10:09 baierd

too many open questions.

The whole PR never ever mentions once that the SMTLIB definition could be found online: https://smt-lib.org/theories-FloatingPoint.shtml

I updated the PR text with all the changes and added the reference.

Sep 12 '25 10:09 baierd

And for getTotalSize() we might need to rediscuss the name.

The total size is easier in my opinion, as the total size always includes the hidden bit.

No, it doesn't. If we want getTotalSize() to return 32 for single precision, then the total size includes exponent size + mantissa size without hidden bit + sign bit (the current code for this is wrong because it misses the sign bit, it just happens to compute the same result).

Sep 12 '25 10:09 PhilippWendler

And for getTotalSize() we might need to rediscuss the name.

The total size is easier in my opinion, as the total size always includes the hidden bit.

No, it doesn't. If we want getTotalSize() to return 32 for single precision, then the total size includes exponent size + mantissa size without hidden bit + sign bit (the current code for this is wrong because it misses the sign bit, it just happens to compute the same result).

Fair point, the description was wrong! I fixed the JavaDoc in both getTotalSize() calls and made it more explicit in the code. I need to check whether we have more incorrect references to the components of the total size though.

Still, i would argue that we need to rename getTotalSize() because of this?

Sep 12 '25 10:09 baierd

Still, i would argue that we need to rename getTotalSize() because of this?

If all other methods that do not include the hidden bit are called ...WithoutHiddenBit, and then we have a method called getTotalSize(), wouldn't you assume that "total" includes everything, i.e., also the hidden bit? At least I thought it should be considered.

Sep 12 '25 11:09 PhilippWendler

Still, i would argue that we need to rename getTotalSize() because of this?

If all other methods that do not include the hidden bit are called ...WithoutHiddenBit, and then we have a method called getTotalSize(), wouldn't you assume that "total" includes everything, i.e., also the hidden bit? At least I thought it should be considered.

That is a fair point! @kfriedberger what do you think?

Sep 12 '25 17:09 baierd

I agree with Philipp and would also assume that totalSize covers all three components, i.e., exponent, mantissa, and sign bit.

Sep 12 '25 18:09 kfriedberger

I agree with Philipp and would also assume that totalSize covers all three components, i.e., exponent, mantissa, and sign bit.

I see it the same way, but would you rename getTotalSize()?

Sep 12 '25 19:09 baierd

I agree with Philipp and would also assume that totalSize covers all three components, i.e., exponent, mantissa, and sign bit.

Nobody has argued against this. The current question is what should the relation between getTotalSize() and the hidden mantissa bit be, in particular whether the method should be renamed because it does count in the hidden bit.

Sep 15 '25 06:09 PhilippWendler

I agree with Philipp and would also assume that totalSize covers all three components, i.e., exponent, mantissa, and sign bit.

Nobody has argued against this. The current question is what should the relation between getTotalSize() and the hidden mantissa bit be, in particular whether the method should be renamed because it does count in the hidden bit.

A user might prefer shorter significant names, ... so getTotalSize() is sufficient and does not need a renaming.

Sep 20 '25 09:09 kfriedberger

A user might prefer shorter significant names, ... so getTotalSize() is sufficient and does not need a renaming.

A shorter name will not help users that accidentally pick the wrong method to call due to names that do not clearly describe the semantics. Also cf. https://github.com/sosy-lab/java-smt/pull/513#discussion_r2320891097

Oct 10 '25 13:10 PhilippWendler

A user might prefer shorter significant names, ... so getTotalSize() is sufficient and does not need a renaming.

A shorter name will not help users that accidentally pick the wrong method to call due to names that do not clearly describe the semantics. Also cf. #513 (comment)

I feel like total, as in "total as defined by the SMTLIB2 standard" should be be sufficient. I will take a closer look at the JavaDoc for this.

Oct 29 '25 17:10 baierd

Summing up everything and putting it into context with the sources:

SMTLIB2 uses the IEEE 754-2008 FP standard , limited to the binary case, but with: A major extension over 754-2008 is that the theory has a sort for every possible exponent and significand length..
The IEEE standard defines the bit format the following (i took this from the standard, page 9, 3.4 Binary interchange format encodings, linked above. Not the Wiki article!):

In binary format, IEEE 754-2008 FP numbers are encoded in k bits as follows:
- 1-bit sign S
- w-bit biased exponent E = e+bias
- (t = p-1)-bit trailing significant field digit string; the leading bit of the significand (p) is implicitly encoded in the biased exponent E

I am at the point where i would like to ask a SMTLIB2 person about why the standard does not adhere to this in its text! Because i would argue that the correct way of interpreting the IEEE standard in regards to the "size" is: "sign bit + exponent + significand - hidden bit". SMTLIB2 uses 'An Automatable Formal Semantics for IEEE-754 Floating-Point Arithmetic' as its source for definitions for FP values, and states:


 :values "For all m,n > 1, the values of sort (_ FloatingPoint m n) are 
  - (_ +oo m n)  
  - (_ -oo m n)
  - (_ NaN m n)
  - all terms of the form (fp c1 c2 c3) where
    - c1 is the binary #b0 or #b1
    - c2 is a binary of size m other than #b1...1 (all 1s)
    - c3 is a binary of size n-1

  The set of values for RoundingMode is {RNE, RNA, RTP, RTN, RTZ}.

The primary source of the SMTLIB2 standard also uses this definition. I would argue that the "total size" of a precision is therefore sign + mantissa + significand (without hidden bit). But the SMTLIB2 page also states that:

 ; Floating point sort, indexed by the length of the exponent and significand
 ; components of the number
 :sorts-description "All nullary sort symbols of the form 
 
    (_ FloatingPoint eb sb),

  where eb and sb are numerals greater than 1."
  
 :notes
 "eb defines the number of bits in the exponent;
  sb defines the number of bits in the significand, *including* the hidden bit.
 "

This + all the definitions on the SMTLIB2 FP page seem like this is just a shortcut to avoid writing the sign bit additionally (as in sign + mantissa + significand (without hidden bit) == mantissa + significand (with hidden bit)). But i really think the standard shot itself in the foot with the way they are not clarifying this here!

There is also more than single and double precisions; namely half (16bit) and quad (128bit). (From Wikipedia: The binary interchange formats have the "[half precision](https://en.wikipedia.org/wiki/Half_precision)" (16-bit storage format) and "[quad precision](https://en.wikipedia.org/wiki/Quad_precision)" (128-bit format) added, together with generalized formulae for some wider formats; the basic formats have 32-bit, 64-bit, and 128-bit encodings.). SMTLIB2 also mentions these, so we should also add them!
There is no "significant" in the FP format, but a "significand", with d at the end. ;D

Oct 31 '25 18:10 baierd