arb_get_str() & arb_set_str() does not preserve value
Consider the following C program.
$ cat roundtrip.c
#include <arb.h>
int main() {
arb_t a;
arb_init(a);
arb_load_str(a, "2bb573849d73 -2c 800031b 365d");
arb_printn(a, 1024, ARB_STR_MORE);
arb_set_str(a, arb_get_str(a, 1024, ARB_STR_MORE), 1024);
arb_printn(a, 1024, ARB_STR_MORE);
}
When run (with Arb 2.22.1 from conda-forge) it prints:
[+/- 3.65e+4197][+/- 3.66e+4197]
I understand that arb_get_str() can introduce imprecision and I also understand that I should probably use arb_dump_str to serialize an arb_t; but shouldn't arb_set_str() preserve the radius?
Neither arb_get_str nor arb_set_str says in their documentation that they preserve the radius. It only states that the old interval is contained in the new interval.
@albinahlback Thanks, I am aware of that. I am not saying that the implementation breaks the documentation.
Then what is your issue? You asked if arb_set_str preserves the radius, for which I gave an answer.
My question was: Shouldn't arb_set_str() try to preserve the radius here?
Note that this is somewhat related to flintlib/arb#391. In flintlib/arb#391 arb_get_str introduces an error, here arb_set_str introduces an error.
I wonder if both methods could be more careful when it's possible not to introduce any error.
I have been trying to get this right for a while, testing various approaches and they all end like this:
using Serialization, ArbNumerics using Serialization: serialize_type using ArbNumerics: @libarb, @libflint
function Serialization.serialize(s::AbstractSerializer, x::ArbFloat) Serialization.serialize_type(s, typeof(x)) unsafestr = ccall(@libarb(arf_dump_str), Cstring, (Ref{ArbFloat},), x) len = Int64(ccall(:strlen, Csize_t, (Cstring,), unsafestr)) + 1 write(s.io, len) unsafe_write(s.io, pointer(unsafestr), len) ccall(@libflint(flint_free), Cvoid, (Cstring,), unsafestr) return nothing end
function Serialization.deserialize(s::AbstractSerializer, ::Type{ArbFloat{P}}) where {P} len = read(s.io, Int64)::Int64 str = Serialization.deserialize_string(s, Int(len)) # TODO: call arb_load_str to constuct arb_t from str x = ArbFloat{P}(0) ccall(@libarb(arf_load_str), Int, (Ref{ArbFloat{P}}, Ref{Cstring}), x, str) return x end
low_precision = 53 high_precision = 320
setworkingprecision(low_precision)
arf_val = log(ArbFloat(17))
2.83321334405621608
serialize_buffer = Serializer(IOBuffer())
Serializer{IOBuffer}(IOBuffer(data=UInt8[...], readable=true, writable=true, seekable=true, append=false,
size=0, maxsize=Inf, ptr=1, mark=-1), 0, IdDict{Any, Any}(),
Int64[], Dict{UInt64, Any}(), 26)
serialize(serialize_buffer, arf_val)
Serializer{IOBuffer}(IOBuffer(data=UInt8[...], readable=true, writable=true, seekable=true, append=false,
size=82, maxsize=Inf, ptr=83, mark=-1), 3, IdDict{Any, Any}(:ArbNumerics => 2, ArbFloat{64} => 0, :ArbFloat => 1),
Int64[], Dict{UInt64, Any}(), 26)
arf_val_recovered = deserialize(serialize_buffer, typeof(arf_val)) ERROR: EOFError: read end of file
@fredrik-johansson can you transfer this issue to FLINT?
@JeffreySarnoff can you provide a MWE in C for your issue? Also, please use code blocks to format the code.
@saraedum So the issue is that when you use string methods, you convert between a binary and a decimal representation on an inexact ring. This forces a loss of information.
@saraedum So the issue is that when you use string methods, you convert between a binary and a decimal representation on an inexact ring. This forces a loss of information.
It doesn't force a loss of information, though. It is true that you can't exactly represent one with the other, but with the additional context of the precision you can still keep all the same information. In particular, with ARB_STR_MORE the information in the midpoint is already adequately preserved; preserving the radius is what's at issue here.
Here is an example showing one of the obstacles here. arb_set_str doesn't set the correctly rounded result, so (by necessity) it has to inflate the error bounds when setting. It is usually impossible to set a number with an exact bound with this interface.
#include <stdio.h>
#include <stdlib.h>
#include "flint/arb.h"
int main() {
arb_t x, x2;
double y;
arb_init(x);
arb_init(x2);
arb_set_str(x, "1.1", 53);
arb_set_str(x2, "1.1", 100);
y = strtod("1.1", NULL);
printf("Value: ");
arb_printn(x, 20, ARB_STR_MORE);
printf("\nTrue precision: %g\nDouble value: %.20g\nArb conversion error: ", mag_get_d(arb_radref(x)), y);
mag_set_ui(arb_radref(x), 0);
arb_sub(x, x2, x, 100);
arb_printn(x, 7, ARB_STR_NO_RADIUS);
printf("\nDouble conversion error: ");
arb_set_d(x, y);
arb_sub(x, x2, x, 100);
arb_printn(x, 7, ARB_STR_NO_RADIUS);
printf("\n");
arb_clear(x);
arb_clear(x2);
return 0;
}
Value: [1.0999999999999998668 +/- 2.23e-16]
True precision: 2.22045e-16
Double value: 1.1000000000000000888
Arb conversion error: 1.332268e-16
Double conversion error: -8.881784e-17
It's impossible to support all of these simultaneously:
- The internal representation is binary
- The printed representation is decimal
- Decimal to binary conversion preserves intervals
- Binary to decimal conversion preserves intervals
- Binary to decimal conversion has bounded output precision
- Roundtrip is exact
If you want exact roundtrips, you have to give up one of the others. For example, you can make decimal to binary conversion not preserve intervals so that arb("2.22045e-16") gives something like $4503607640442835 / 2^{104}$ (or 2.2204499999999999214662505469145610272812811527946841305691805246169678866863250732421875e-16). But this is going to bite users who want to do interval arithmetic.
The best way to get exact roundtrips currently is to use arb_load_str / arb_dump_str.
We ought to add hex support to arb_get_str and arb_set_str as a more accessible option.
Yes, it is impossible to completely square the circle. There are some improvements that can be made generically, but IMO the best thing is to provide more choice, particularly to avoid breaking existing users. A proposal I was thinking of working on was the flag ARB_STR_EXACT (possibly a combination of ARB_STR_MIDPOINT_EXACT and ARB_STR_RADIUS_EXACT, also better naming suggestions welcome) that outputs the exact radius (to the number of digits required to reconstruct it) and also raises the limit on the number of midpoint digits so that it also has enough precision to reconstruct. This would be similar to printd, except it would be usable in more contexts (sometimes you want a string, not fileio). I think that would still preserve intervals when round-tripping through set_str, at least when done at the same precision, but the inherent decimal value would not be interval preserving.