
Support for arbitrary precision

Open jblang opened this issue 10 years ago • 28 comments

From [email protected] on May 05, 2009 09:56:28

XaoS currently can't support very deep zooms because the 80-bit floating point arithmetic it uses runs out of precision before too long. It should be possible to add arbitrary precision to XaoS using the MPFR library (https://www.mpfr.org/). I am not sure whether it would be fast enough to support real-time zooming. There's probably no way to know without implementing it and running some benchmarks.

Original issue: http://code.google.com/p/gnuxaos/issues/detail?id=25

jblang avatar Nov 02 '13 19:11 jblang

It's not arbitrary precision, but gcc's builtin __float128 type will allow XaoS to zoom in 10^32 instead of 10^16 times. All that has to be done is change FPOINT_TYPE to __float128 in config.h. I've tested this, and XaoS works correctly. Unfortunately it's considerably slower than long double and causes the frame rendering to fall behind, so it's not something we would want to do all the time. We could detect when we've zoomed past the precision of long double and automatically switch to __float128. This would require compiling a second version of all the functions that use long double under a different name, then calling the appropriate functions based on the zoom depth. An easier short-term option is to compile two XaoS binaries and distribute both of them. Then the user can use the 128-bit version if they want to do deep zooms and don't mind the slowdown.
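For anyone who wants to try it, the change is a one-liner along these lines (sketch only; how FPOINT_TYPE is actually declared in config.h may differ):

/* config.h (sketch): swap the base floating point type */
#define FPOINT_TYPE __float128  /* was: long double (80-bit x87 extended precision) */
typedef FPOINT_TYPE number_t;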

Given that __float128 is too slow to do real-time zooming comfortably on a modern computer, I am sure that MPFR would be even slower, so it would not be useful at all. However, one approach that might make arbitrary precision viable is Perturbation Theory. It only requires calculating the orbit of a single point in arbitrary precision; the rest of the points in the image can then be calculated in standard hardware floating point precision, relative to this reference point. This is used in several fractal programs such as Kalles Fraktaler 2 and SuperFractalThing. I'm not sure how readily adaptable this approach would be for XaoS's zooming algorithm, but it's worthy of some investigation.
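The core recurrence is roughly the following (a minimal sketch, not XaoS code):

#include <complex>
#include <vector>

// Sketch of the perturbation recurrence. The reference orbit Z_n is computed
// once in high precision (e.g. MPFR); each nearby pixel then needs only
// hardware doubles for its delta: dz_{n+1} = 2*Z_n*dz_n + dz_n^2 + dc.
// Glitch detection and reference rebasing, which real implementations need,
// are omitted here.
int iterate_delta(const std::vector<std::complex<double>> &ref_orbit,  // Z_0, Z_1, ...
                  std::complex<double> dc, int maxiter)
{
    std::complex<double> dz = 0.0;
    for (int n = 0; n + 1 < (int)ref_orbit.size() && n < maxiter; n++) {
        dz = 2.0 * ref_orbit[n] * dz + dz * dz + dc;    // dz_{n+1}
        std::complex<double> z = ref_orbit[n + 1] + dz; // z_{n+1}, low precision
        if (std::norm(z) > 4.0)
            return n + 1;                               // escaped
    }
    return maxiter;
}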

jblang avatar Apr 14 '19 04:04 jblang

Now that XaoS code has been converted to C++, it should be easy to implement arbitrary precision arithmetic by utilizing operator overloading on the MPFR library. We may want to implement staged increases in precision so we use long double by default, then switch to GCC's __float128 type automatically when we exceed 80 bits of precision, and then again to MPFR when we exceed 128 bits of precision.
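The runtime selection could key off the width of a pixel in fractal coordinates, something like this (sketch; the thresholds and names are only illustrative):

enum class precision_tier { LongDouble, Float128, Mpfr };

// Pick the narrowest type whose machine epsilon is still much smaller than one
// pixel; the safety factor of 256 is arbitrary and would need tuning.
precision_tier choose_tier(long double pixel_width)
{
    if (pixel_width > 256.0L * 1.1e-19L)    // ~LDBL_EPSILON (2^-63) on x86
        return precision_tier::LongDouble;
    if (pixel_width > 256.0L * 1.9e-34L)    // ~FLT128_EPSILON (2^-112)
        return precision_tier::Float128;
    return precision_tier::Mpfr;
}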

jblang avatar Jan 06 '20 03:01 jblang

Who needs binaries? I built 3.6 with __float128 and lemme tell ya, the Mandelbrot set just keeps getting weirder down there!

Sadly, it seems like the simple hack to use __float128, which was fine in 3.6, no longer works ... could this be a C++ porting artifact? :/

src/engine/formulas.cpp:254:45: error: call of overloaded ‘sin(number_t)’ is ambiguous
  254 |             w = (int)((sin((double)zim / zre)) * 127);
... and later ...
src/engine/formulas.cpp:1096:19: error: call of overloaded ‘fabs(number_t&)’ is ambiguous
 1096 |     pim = fabs(pim);                                                           \

trathborne avatar Jan 06 '20 12:01 trathborne

Oh, and don't forget to format __float128 with %Q ... I posted some deep zooms to fractalforums and we found that %ld wasn't printing enough precision!

trathborne avatar Jan 06 '20 12:01 trathborne

Hmm, I tried __float128 some time back but haven't tried it since converting the files to C++. I will try that and see if I can find the problem. Also, thanks for the pointer on formatting the output. I will change that when I make __float128 an official option.

You seem technically minded. I would love to have some more people working on XaoS. Pull requests are welcome! :)

jblang avatar Jan 06 '20 15:01 jblang

I fixed the compilation with __float128. The problem is that C++ is stricter about type checking, and I was using the old C library headers (especially math.h) instead of the C++ equivalents. After switching to those, and adding one ifdef to fix isnan for quadmath, it compiles cleanly. I also updated the Qt project file to automatically add -lquadmath when enabled. All you need to do to enable 128-bit precision now is run qmake DEFINES+=USE_FLOAT128.
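The isnan workaround amounts to something like this (sketch; the helper name is made up and the actual ifdef in the tree may differ):

#include <cmath>        // the C++ header provides proper overloads of sin, fabs, isnan, ...
#ifdef USE_FLOAT128
#include <quadmath.h>   // __float128 math: sinq, fabsq, isnanq, ...
// std::isnan has no __float128 overload, so route it through isnanq instead.
static inline int my_isnan(__float128 x) { return isnanq(x); }
#else
static inline int my_isnan(long double x) { return std::isnan(x); }
#endif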

Note that I have not (yet) fixed the printf formatting for __float128, and I also believe there is some other precision issue where zooming in past a certain level causes the center point of the fractal to change: it will no longer zoom into where you're pointing, and the viewport will instead start sliding toward the lower left. I will try to fix these later, but official __float128 support is not a priority for 4.0.

jblang avatar Jan 07 '20 02:01 jblang

@trathborne __float128 I/O support is now committed. I am not getting the weird zoom issues where the center changes on Linux; I only saw them on Windows.

jblang avatar Jan 07 '20 05:01 jblang

Hmm, now I can't reproduce the zooming issues on Windows either... not sure what was going on earlier. Anyway, I guess 128-bit floating point can at least be considered an official beta feature now.

Unfortunately there's not an easy way to compile a binary to support both, but maybe we can distribute two binaries.

jblang avatar Jan 07 '20 05:01 jblang

You could say I am technically minded. I have a day job and a family but will help out when I can! ☺

Thanks for sorting out __float128! Quad can apparently have up to ~34 decimals of precision so %20Q won't actually preserve the location precisely. It compiled nicely for me but I haven't tested it yet.

This idea of gradually increasing precision is cool. Seems to me like there could be one binary, with all of the single/double/quad/GSL (ooh, and OpenCL/CUDA right?) worlds behind a bunch of function pointers which could be switched at runtime as the user zooms in further... instead of #ifdef. A bit of overhead for a lot of flexibility!

trathborne avatar Jan 07 '20 12:01 trathborne

Thanks, I understand you don't have unlimited time. Even small contributions are welcome. I've gone years at a time without touching XaoS myself. Right now I'm on a roll though :)

It's not technically impossible for them to be in one binary; it would just take a lot of effort with the way the code is right now. number_t is used all over the place in lots of different functions, files, and data structures, and there's no easy way to compile multiple versions of those in the same executable without lots of name clashes. Now that everything is C++, I can eventually templatize everything so it will be possible to instantiate versions for different data types. That will take time and effort though. There is still a lot of refactoring to do.

That's a good point about the decimal precision. I didn't think that far ahead, just used the format string that was already used in the file saving code. I will change it.

jblang avatar Jan 07 '20 14:01 jblang

Saving quads has been fixed to use 34 digits of precision in the latest commit.
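For reference, the formatting boils down to something like this (sketch; quadmath_snprintf is libquadmath's printf-style helper, and %Qe is its conversion specifier for __float128):

#include <cstdio>
#include <quadmath.h>

// Print a __float128 with 34 digits after the decimal point; plain printf
// cannot format __float128 directly.
void print_center(__float128 re)
{
    char buf[64];
    quadmath_snprintf(buf, sizeof(buf), "%.34Qe", re);
    std::puts(buf);
}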

jblang avatar Jan 19 '20 17:01 jblang

Fantastic! Thanks!

trathborne avatar Jan 21 '20 10:01 trathborne

This is working great, but now I want to use single precision so I can know how far to zoom when collecting coordinates for another tool which only has single precision. Also single precision is ridiculously fast which is kinda fun.

Single precision in XaoS is broken in two ways:

  1. In XaoS.pro, if USE_FLOAT128 is not defined, we get USE_LONG_DOUBLE, so "DEFINES-=USE_LONG_DOUBLE" does not turn it off. I agree that USE_LONG_DOUBLE is a sensible default, and it's not clear to me how to enable the -= behaviour.
  2. In src/ui-hlp/save.cpp, lines 122 and 123 need snprintf, not sprintf.

Sorry for not making a pull request; I am not clear what to do in case 1 (for now I just commented out the whole block) and in case 2 it is just 2 insertions.

trathborne avatar Jan 24 '20 09:01 trathborne

Just commenting it out in the project file is probably your best bet for now.

By the way, when you comment out USE_LONG_DOUBLE, it uses double precision, not single. Single precision would be a 32-bit float, not a 64-bit double. If you want to try single precision, you can change the number_t typedef in config.h, but I don't think it's something we'd generally want to include in XaoS.
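i.e. something along these lines (sketch; the surrounding code in config.h differs):

/* config.h (sketch): experiment only, not a supported configuration */
typedef float number_t;  /* instead of long double */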

For Intel processors, I don't think 32-bit float (single) or 64-bit double would be any faster than 80-bit long double, because the x87 instructions always use 80 bits internally and the result is just truncated to the desired precision. The only way it might be faster is if GCC is auto-vectorizing the code to use SSE instructions for doubles instead of x87 instructions, but I doubt that is the case. And on ARM, I think long double and double are equivalent (64-bit).

jblang avatar Feb 01 '20 16:02 jblang

Oh, that's a good point ... I wasn't even getting the single precision I was looking for. I need it because MathMap is hardcoded 'float' everywhere and I'd like to use XaoS to preview how far I can zoom. Of course ... I can just check the zoom level for this, too.

Anyways I agree that it doesn't make sense to push lower precision into XaoS.

trathborne avatar Mar 25 '20 10:03 trathborne

These days we have 256-bit multimedia registers, unlike the old 80-bit restrictions. Zooming could go a lot deeper with assembler code. Anyone familiar with the 256-bit instruction set?

hwpfeil avatar Mar 25 '20 21:03 hwpfeil

That won't work, unfortunately. SIMD registers are not general-purpose floating point registers. True, they are 256-bit or even 512-bit, but they pack multiple 64-bit or 32-bit numbers into a single register so that they can be operated on simultaneously; they cannot store a single number with precision above 64 bits.
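For example (illustrative only; compiled with -mavx), one AVX multiply works on four separate doubles at once:

#include <immintrin.h>

// A 256-bit AVX register holds four independent 64-bit doubles. One
// instruction works on all four lanes at once, but neither a lane nor the
// register as a whole can hold a single value wider than 64 bits.
static __m256d square_four(__m256d x)
{
    return _mm256_mul_pd(x, x);  // four double-precision squarings in parallel
}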

jblang avatar Mar 25 '20 22:03 jblang

Indeed, it seems that there is no SIMD that supports anything beyond 64-bit float. Even the gcc "__float128" type is apparently implemented in software rather than in hardware.

For more speed, but still only 64 bits of precision, we could delegate the calculations to the GPU: (and come to think of it, also store all the results on the GPU, and port the drawing algorithm to a shader so it never hits the CPU...)

  • https://www.khronos.org/registry/OpenCL/sdk/1.0/docs/man/xhtml/scalarDataTypes.html
  • https://docs.nvidia.com/cuda/floating-point/ ← notes about "FMA" are interesting here

For more precision, we could use the Fractint trick with SIMD integer instructions. Since abs(everything) < 4 in typical Mandelbrot calculations, the point could be truly fixed.

  • https://en.wikipedia.org/wiki/Fixed-point_arithmetic
  • https://www.felixcloutier.com/x86/pmuludq
  • https://stackoverflow.com/questions/41403718/can-i-use-the-avx-fma-units-to-do-bit-exact-52-bit-integer-multiplications
  • https://locklessinc.com/articles/256bit_arithmetic/
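To make the fixed-point idea concrete, a rough scalar sketch (the SIMD version would pack several points per register; the Q7.56 format and helper names here are assumptions, not Fractint's or XaoS's actual layout):

#include <cstdint>

// Numbers are int64_t in Q7.56 format: 1 sign bit, 7 integer bits, 56 fraction
// bits. Assuming |c| <= 2, as in typical Mandelbrot views, every intermediate
// stays below 128 in magnitude, so nothing overflows before the bailout test.
// Relies on GCC's __int128 and arithmetic right shift.
static const int FRAC_BITS = 56;
typedef int64_t fixed_t;

static inline fixed_t fmul(fixed_t a, fixed_t b)
{
    return (fixed_t)(((__int128)a * b) >> FRAC_BITS);  // 128-bit intermediate
}

static int mandel_fixed(fixed_t cr, fixed_t ci, int maxiter)
{
    const fixed_t four = (fixed_t)4 << FRAC_BITS;
    fixed_t zr = 0, zi = 0;
    for (int n = 0; n < maxiter; n++) {
        fixed_t zr2 = fmul(zr, zr), zi2 = fmul(zi, zi);
        if (zr2 + zi2 > four)
            return n;                   // escaped
        zi = 2 * fmul(zr, zi) + ci;     // 2*zr*zi + ci
        zr = zr2 - zi2 + cr;            // zr^2 - zi^2 + cr
    }
    return maxiter;
}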

Of course ... all of this would ruin the lovely model of XaoS where you just define number_t and recompile. ☺

trathborne avatar Mar 26 '20 08:03 trathborne

GPU support would require a near-total rewrite, I think. Also, with a GPU you don't really need the XaoS zooming algorithm to do real-time zooming; you could just brute force it. And since the GPU still can't do more than 64-bit precision, it doesn't gain much over using the CPU, since a modern CPU is plenty fast enough to zoom a full screen 1440p or even 4K image with the XaoS zooming algorithm.

Fixed point integer math might be fast enough but it would not be general and would require a lot of manual modification to every fractal formula, I think. And it's still ultimately limited to some fixed precision. I'm convinced the correct approach is to use an arbitrary precision library + perturbation theory, but I haven't had time to figure out how to do it yet.

jblang avatar Mar 26 '20 14:03 jblang

Proper GPU support would indeed require a near-total rewrite, to the point of it being a new project. In fact I can't imagine XaoS going much further, since all advancements (uh, nebulous ideas in my head) are rewrite-level changes.

My almost-modern CPU with 4 cores (Intel(R) Xeon(R) CPU E5-1620 v3 @ 3.50GHz) is definitely not enough to sustain a 4k zoom without visible pixellation. That's with antialiasing so it's actually an 8k image. So GPU at 64-bit would still be worth it for me. It would be nice to see XaoS zoom renderer working on the GPU, though its heavy use of malloc() might make it hard to port.

Reading the XaoS docs, I notice that any out-of-view data is immediately discarded, which was a valid design choice back when RAM was limited, but I have 64GB of RAM and I'd love to be able to zoom in and out without recalculating. Maybe that discard threshold could be a user-tunable parameter? I'll look into that next time I have to stare at code.

Maybe the __float128 magic or an arbitrary precision system could work on the GPU ... but that should be a separate, foundational system, rather than a XaoS feature. Maybe there is an arbitrary precision library for CPUs which would let all of the existing code compile through the magic of C++.

Inigo Quilez has a good demo of perturbation theory at https://www.shadertoy.com/view/ttVSDW but its results are not perfect.

trathborne avatar Mar 26 '20 15:03 trathborne

I started working with @kovzol on this issue in the hope of supporting both long double and __float128 precision without recompiling the code (see this branch for the code changes: https://github.com/kanurag94/XaoS/tree/float128). The concept is to change the number_t datatype to a struct that has two entries, one for long double and one for __float128, so the compiler can handle all of this in advance. Since the codebase has been converted to C++, we were able to take advantage of operator overloading to avoid rewriting many formulas, especially inside formulas.cpp. Have a look at the file https://github.com/kanurag94/XaoS/blob/float128/src/include/number_t.h which implements this concept. The file https://github.com/kanurag94/XaoS/blob/float128/src/include/number_t.cpp switches the currently used data type with a 0/1 flag (0 for __float128, 1 for long double). The menu entry in the fork (Calculations->10^32 support) doesn't work as of now, so to test this we have to change the 0/1 flag manually.
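Simplified, the idea is something like this (a sketch, not the exact code in the branch):

// Both representations travel together, and a runtime flag picks which one
// the overloaded operators actually use.
extern bool use_float128;   // runtime switch between the two representations

struct number_t {
    long double ld;
    __float128 q;

    number_t(long double v = 0.0L) : ld(v), q(v) {}

    number_t operator*(const number_t &o) const
    {
        number_t r;
        if (use_float128) {
            r.q = q * o.q;
            r.ld = (long double)r.q;   // keep both views consistent
        } else {
            r.ld = ld * o.ld;
            r.q = r.ld;
        }
        return r;
    }
    // ... same pattern for +, -, /, comparisons, and functions like fabs() ...
};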

All of this is, however, a bit slower than using the simple number_t data type that was either long double or __float128 (it works with float and double too, although float was buggy for some reason before GCC-9). I guess formulas.cpp is the biggest effort for the compiler: it takes quite a long time to compile. (However, GCC-10 is significantly faster than the earlier versions.)

I understand operator overloading is a bit costlier and could be optimized by using inline definitions, but the issue remains the same even with those changes.

Maybe there is a better solution here, or something that can be done to make this work efficiently. Even a solution based on a completely different idea would be really helpful.

kanurag94 avatar Jul 09 '20 12:07 kanurag94

Ideally the functions that use number_t should be templatized using C++ templates ( https://en.cppreference.com/w/cpp/language/templates ) so that they can be instantiated at compile time to use whatever type we want, then just have the outermost calling function choose the type at runtime. That way there is very little runtime overhead.
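Something along these lines (sketch with made-up function names):

// The per-pixel math is a template, the compiler instantiates it for every
// type we care about, and only the outermost call decides at runtime which
// instantiation to use.
template <typename T>
int mandel_iter(T cr, T ci, int maxiter)
{
    T zr = 0, zi = 0;
    for (int n = 0; n < maxiter; n++) {
        T zr2 = zr * zr, zi2 = zi * zi;
        if (zr2 + zi2 > T(4))
            return n;
        zi = T(2) * zr * zi + ci;
        zr = zr2 - zi2 + cr;
    }
    return maxiter;
}

int calculate_pixel(long double cr, long double ci, int maxiter, bool deep)
{
    return deep ? mandel_iter<__float128>(cr, ci, maxiter)
                : mandel_iter<long double>(cr, ci, maxiter);
}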

jblang avatar Jul 09 '20 12:07 jblang

Thanks. I tried to implement this; there are many name clashes for now. I'll update after resolving them.

kanurag94 avatar Jul 10 '20 16:07 kanurag94

I tried to template the code for the number_t occurrences. This is implemented in the forked branch here: https://github.com/kanurag94/XaoS/tree/templatize-xaos This replaces nearly 200 instances of number_t; however, there are still another 200 instances left to be replaced, and these are a little more difficult. Templates require their definitions to be available at compile time. This can be done in several ways, as mentioned here: https://isocpp.org/wiki/faq/templates#separate-template-fn-defn-from-decl - it says that to make them work we have to provide the function and struct member function definitions in the header files, which makes the code ugly and unmaintainable.

The other way is to create separate cpp files for them. This still fails in a multi-file project where, for example, src/ui/menu.cpp uses src/ui-hlp/ui_helper.cpp. I think there must be a better and smarter way to implement this, but I am stuck here for now.

kanurag94 avatar Jul 27 '20 08:07 kanurag94

I would convert the fractal_context struct to a class and make all the functions that operate on that struct members of the class. Then you can instantiate multiple instances of the class with different types for the template parameters. There will be two or more known types at compile time and it will create code for both of them. Then a wrapper function can decide which instance to call at runtime depending on user configuration. I quickly looked at the uses of number_t in ui_helper and I think we could just change most of those to long double. The only place we need extended precision is in the calculation of the fractal itself and it looks like most of the code in ui_helper is just holding configuration or post-processing the image. If there is something where the precision is important, I would try to move that code into fractal_context and call it from ui_helper instead.
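Schematically (names simplified, not actual XaoS code):

template <typename T>
class fractal_context {
public:
    T center_re, center_im, range;
    void calculate() { /* all the number_t math lives here, templated on T */ }
};

// Wrapper that picks the instantiation at runtime; both versions exist in the
// binary because the compiler instantiated the template for each type.
void calculate_fractal(fractal_context<long double> *ld_ctx,
                       fractal_context<__float128> *q_ctx, bool deep)
{
    if (deep)
        q_ctx->calculate();
    else
        ld_ctx->calculate();
}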

jblang avatar Jul 27 '20 13:07 jblang

Thanks. I started implementing this. In my initial attempt, I have one doubt regarding the implementation you suggested. uih_context and uih_savedcontext are two structs using fractal_context. Did you mean to create two members inside them, like this:

struct uih_savedcontext {
  fractal_context<type1> *ftype1context;
  fractal_context<type2> *ftype2context;
};

and use a wrapper function to select between them at runtime? In that case, for example, if we shift from type1 to type2, we will have to copy the data from one to the other. Is this correct?

kanurag94 avatar Aug 05 '20 20:08 kanurag94

Yes, I think so. I am not aware of any way to avoid copying the data.

jblang avatar Aug 05 '20 20:08 jblang

Thanks! There are many instances (~470) of uih_context's fcontext. I did try to figure out all the possible ways we could write a wrapper function to choose which instance to call, but unfortunately there seems to be no easier way than to put if/else around all the call sites. Maybe I am missing something here.

kanurag94 avatar Aug 14 '20 10:08 kanurag94