stdlib icon indicating copy to clipboard operation
stdlib copied to clipboard

Kahan and Klein summation

Open zerothi opened this issue 2 years ago • 15 comments

Motivation

Precision summations are necessary for certain operations (also suggested for cumsum)

Would you accept a type(Kahan(dp)) and type(Klein(dp)), and precision variants for inclusion in the stdlib?

Prior Art

No response

Additional Information

No response

zerothi avatar Feb 21 '23 11:02 zerothi

Instead of having the type in the name, I would suggest we name them:

type(Kahan_real(dp)) :: rsum
type(Kahan_cmplx(dp)) :: csum

to assign data-type?

zerothi avatar Feb 21 '23 12:02 zerothi

Thanks for this proposal. I think it's in scope and needed. I know of Kahan sum but not Klein--can you provide some reference for each here?

Regarding the API, can you explain the need for using derived types here? Could we not implement this as generic functions, i.e function sum_kahan(x)?

milancurcic avatar Feb 22 '23 12:02 milancurcic

  1. The Klein summation is a 2nd order improvement of the Kahan summation. In fact there is a simple extension of the Kahan summation that fixes cases where the input value is larger than the currently summed quantity (e.g. single values are much much larger than the rest) or similar. Perhaps we can start with the regular Kahan, and then decide on how to add more advanced schemes, see for instance https://en.wikipedia.org/wiki/Kahan_summation_algorithm, I just call it Klein to not have excessively long names
  2. The point is that Kahan summation might be needed on non-array summations, consider complex arithmetic on various array combinations a(:) * b(:) * sqrt(c(:) + a2(:,1) * a2(:,2), in this case I see a great value in enabling a user-defined type that sums arrays.
  3. I do plan (and have a code) that adds the functions for regular arrays, as your sum_kahan. But I think both are necessary.

zerothi avatar Feb 22 '23 12:02 zerothi

A pitfall of this algorithm is that compilers may decide that the parentheses are superfluous and thereby destroy the effect. How do you guard against that? The WIkipedia article (which also describes the Klein version - it was a new one for me as well) suggests using a volatile variable. Have you considered other summation tecniques, such as pariwise summation (not as accurate perhaps but more accurate than the naïve algorithm and faster than Kahan)? There is also a rather nice article by Nicholas Higham about summation techniques. It could be an inspiration.

Op wo 22 feb. 2023 om 13:35 schreef Nick Papior @.***>:

  1. The Klein summation is a 2nd order improvement of the Kahan summation. In fact there is a simple extension of the Kahan summation that fixes cases where the input value is larger than the currently summed quantity (e.g. single values are much much larger than the rest) or similar. Perhaps we can start with the regular Kahan, and then decide on how to add more advanced schemes
  2. The point is that Kahan summation might be needed on non-array summations, consider complex arithmetic on various array combinations a(:)
  • b(:) * sqrt(c(:) + a2(:,1) * a2(:,2), in this case I see a great value in enabling a user-defined type that sums arrays.
  1. I do plan (and have a code) that adds the functions for regular arrays, as your sum_kahan. But I think both are necessary.

— Reply to this email directly, view it on GitHub https://github.com/fortran-lang/stdlib/issues/695#issuecomment-1439947051, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAN6YR3EHMKVIFJ43V6CRV3WYYBYTANCNFSM6AAAAAAVC6LQPY . You are receiving this because you are subscribed to this thread.Message ID: @.***>

arjenmarkus avatar Feb 22 '23 12:02 arjenmarkus

I agree that compilation problems are vital.
However, we could add a test when compiling it to see if optimizations turn them off. And/or change the default flags for GNU/Intel compilers to obey code layout for this code only. Again, since incremental summations might be useful I think the type is necessary. As for the array summation, agreed that could be split into pairwise, etc. algorithms, but that isn't the point here, and those should probably be put in another module (stdlib_reduce for instance which would also accommodate norms etc.) :)

zerothi avatar Feb 22 '23 12:02 zerothi

I do not mean to expand the work :), the main issue is the possible interference of compilers thinking they know better. I was inspired by the Wikipedia article wrt the other possibiliities.

Op wo 22 feb. 2023 om 13:56 schreef Nick Papior @.***>:

I agree that compilation problems are vital. However, we could add a test when compiling it to see if optimizations turn them off. And/or change the default flags for GNU/Intel compilers to obey arithmetic for this code only. Again, since incremental summations might be useful I think the type is necessary. As for the array summation, agreed that could be split into pairwise, etc. algorithms, but that isn't the point here, and those should probably be put in another module ( stdlib_reduce for instance which would also accommodate norms etc.) :)

— Reply to this email directly, view it on GitHub https://github.com/fortran-lang/stdlib/issues/695#issuecomment-1439972787, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAN6YR4NAWEZSFRBROCYTFTWYYEGBANCNFSM6AAAAAAVC6LQPY . You are receiving this because you commented.Message ID: @.***>

arjenmarkus avatar Feb 22 '23 14:02 arjenmarkus

I agree that the compiler problem might be a blocker, but at least having the algorithm in-place and warning users about possible problems might be the only way to have the possibility in stdlib.. hmm...

zerothi avatar Feb 22 '23 14:02 zerothi

Well, Fortran does allow variables to be volatile. That might definitely be a solution.

Op wo 22 feb. 2023 om 15:13 schreef Nick Papior @.***>:

I agree that the compiler problem might be a blocker, but at least having the algorithm in-place and warning users about possible problems might be the only way to have the possibility in stdlib.. hmm...

— Reply to this email directly, view it on GitHub https://github.com/fortran-lang/stdlib/issues/695#issuecomment-1440085009, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAN6YR33DBNDZ7P76DKLMVTWYYNKNANCNFSM6AAAAAAVC6LQPY . You are receiving this because you commented.Message ID: @.***>

arjenmarkus avatar Feb 22 '23 14:02 arjenmarkus

Ah yes, good point. So bottom line:

  1. Implement kahan summation (original version)
  2. use volatile attributes
  3. use types for custom summation paths
  4. an array summation with kahan for easy solution

Anything else?

zerothi avatar Feb 22 '23 14:02 zerothi

Is this issue similar (or a duplicate) to #402?

ivan-pi avatar Mar 01 '23 16:03 ivan-pi

A pitfall of this algorithm is that compilers may decide that the parentheses are superfluous and thereby destroy the effect. How do you guard against that?

In principle you could parse the contents of compiler_options() intrinsic function and issue a warning to the error_unit, say if -ffast-math was detected with gfortran.

But in general I'd just go with a warning in the documentation.

ivan-pi avatar Mar 01 '23 16:03 ivan-pi

Yeah, it is. So is this something I should do, or?

zerothi avatar Mar 02 '23 07:03 zerothi

If your reply is aimed at the compiler_options() comment, personally I wouldn't do it. Extra complexity for little benefit.

ivan-pi avatar Mar 02 '23 08:03 ivan-pi

No, I am questioning whether I should make a PR, the details of how to fix things is something we could take in the draft (IMHO).

zerothi avatar Mar 02 '23 08:03 zerothi

If of interest for this discussion, I have an implementation of the kahan summation on top of a chunks-based sum here: https://github.com/jalvesz/fast_math/blob/main/src/fast_sum.f90 (if aggressive optimizations are activated, the chunks approach would still give an accuracy that should be at least 1 order of magnitude better than intrinsic)

jalvesz avatar Nov 14 '23 22:11 jalvesz