Peter Ukkonen
Not at all, it was quite hidden in the pull request topics. Would be great if more people could test the fix! It sounds like Robert will put it in...
I think it's a good idea, at least as a temporary solution, so people can run the code in single precision. I also tested the fix with ecRAD and it...
Robin Hogan and I spent some time looking into this issue of inaccuracies in shortwave reflectance-transmittance computations in single precision. I don't remember all the ins and outs of it...
Hello! I would like to use the mixed-precision procedure `bli_gemm` in my Fortran code. Is this not possible yet?
`gfortran -ffree-line-length-none -std=f2008 -pg -march=native -O3 -ffast-math`
Elapsed time on output: 0.939000010
Elapsed time on output_opt_sig: 0.680999994
Elapsed time on output_flatmodel_opt_sig: 0.669000030
`gfortran -ffree-line-length-none -m64 -std=f2008 -march=native -O3 -ffast-math`...
On ifort the differences are much bigger:
`ifort -O3 -mkl`
Elapsed time on output: 0.8096000
Elapsed time on output_opt_sig: 0.1968000
Elapsed time on output_flatmodel_opt_sig: 0.1787000
Elapsed time on output_sgemm_sig:...
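For anyone reproducing these numbers: the benchmark harness isn't shown here, but a minimal wall-clock timing sketch looks something like the following, assuming `system_clock` is used for timing (the commented-out `output` call is a placeholder for whichever inference routine is being measured):

```fortran
program bench
  implicit none
  integer :: t_start, t_end, count_rate
  real    :: elapsed
  ! ... set up the network weights and the input batch here ...
  call system_clock(count_rate=count_rate)
  call system_clock(t_start)
  ! call output(...)   ! placeholder: the inference routine being timed
  call system_clock(t_end)
  elapsed = real(t_end - t_start) / real(count_rate)
  print *, 'Elapsed time on output:', elapsed
end program bench
```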
`output_opt_sig` and `output_sgemm_sig` do not assume equal-sized hidden layers (I tested that they work). As you can see, the former is almost as fast as the flatmodel one. Processing inputs...
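As a rough sketch of the sgemm-based pattern (this is not the actual `output_sgemm_sig` code; the names and shapes below are illustrative, and it links against any BLAS, e.g. MKL via `-mkl`):

```fortran
! One dense layer of a batched forward pass using single-precision BLAS.
! w(nneur, nin): weights, b(nneur): biases,
! x(nin, nsamp): input batch, y(nneur, nsamp): layer output.
subroutine dense_sgemm(w, b, x, y, nin, nneur, nsamp)
  implicit none
  integer, intent(in) :: nin, nneur, nsamp
  real, intent(in)    :: w(nneur, nin), b(nneur), x(nin, nsamp)
  real, intent(out)   :: y(nneur, nsamp)
  integer :: j
  ! y = w * x for the whole batch in one BLAS call
  call sgemm('N', 'N', nneur, nsamp, nin, 1.0, w, nneur, x, nin, 0.0, y, nneur)
  ! add bias and apply sigmoid, column by column
  do j = 1, nsamp
    y(:, j) = 1.0 / (1.0 + exp(-(y(:, j) + b)))
  end do
end subroutine dense_sgemm
```

The point of this pattern is that processing the whole batch with one `sgemm` call replaces many small `matmul` calls, which is where the ifort+MKL speedup above comes from.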
> The only opportunity I see is to allocate the activation arrays once rather than re-allocate on assignment in every iteration, in which case the subroutine to calculate activations in-place...
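A minimal sketch of that idea, with hypothetical names (`sigmoid_inplace`, `w1`, `b1` are mine, not from the actual code): the activation array is allocated once, outside the loop, and the activation is computed in place rather than through a function result that must be copied on assignment:

```fortran
module nn_act
  implicit none
contains
  ! Sigmoid applied in place: no function-result temporary,
  ! no reallocation on assignment.
  subroutine sigmoid_inplace(a)
    real, intent(inout) :: a(:)
    a = 1.0 / (1.0 + exp(-a))
  end subroutine sigmoid_inplace
end module nn_act

program demo
  use nn_act
  implicit none
  real, allocatable :: a1(:)
  real    :: w1(4,3), b1(4), x(3)
  integer :: iter
  w1 = 0.1; b1 = 0.0; x = 1.0
  allocate(a1(4))            ! allocated once, outside the loop
  do iter = 1, 1000
    a1 = matmul(w1, x) + b1  ! same shape every time, so no reallocation
    call sigmoid_inplace(a1)
  end do
  print *, a1
end program demo
```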
You're welcome! I have now implemented these changes in a more general manner here: https://github.com/peterukk/rte-rrtmgp/blob/master/neural/
Most notably, activation functions have been replaced with subroutines, and these procedures have a 2D...
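Roughly, the subroutine-based design looks like this (a sketch with names of my own choosing, not the actual code in the linked repo): each activation is a subroutine operating on a 2D array in place, and a procedure pointer with an abstract interface selects which one to apply at run time:

```fortran
module activations
  implicit none
  abstract interface
    subroutine activation_2d(a)
      real, intent(inout) :: a(:,:)
    end subroutine activation_2d
  end interface
contains
  subroutine sigmoid2d(a)
    real, intent(inout) :: a(:,:)
    a = 1.0 / (1.0 + exp(-a))
  end subroutine sigmoid2d

  subroutine relu2d(a)
    real, intent(inout) :: a(:,:)
    a = max(0.0, a)
  end subroutine relu2d
end module activations

program demo
  use activations
  implicit none
  procedure(activation_2d), pointer :: activate => null()
  real :: a(2,2)
  a = reshape([-1.0, 0.5, 2.0, -0.3], [2,2])
  activate => sigmoid2d   ! chosen at run time, e.g. from a model config
  call activate(a)
  print *, a
end program demo
```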
FYI, I have now tested using elemental subroutines as activation functions. This is considerably faster (and works with both 1D and 2D arrays), but unfortunately, pointers do not work with elemental...
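For illustration, an elemental activation subroutine might look like this (a minimal sketch; the limitation, as I understand it, is that the Fortran standard does not allow a nonintrinsic elemental procedure as the target of a procedure pointer, so the run-time selection shown above no longer works):

```fortran
module act_elem
  implicit none
contains
  ! Elemental: applies element-wise to scalars and to 1D, 2D, ... arrays alike.
  elemental subroutine sigmoid(a)
    real, intent(inout) :: a
    a = 1.0 / (1.0 + exp(-a))
  end subroutine sigmoid
end module act_elem

program demo
  use act_elem
  implicit none
  real :: v(3), m(2,2)
  v = [-1.0, 0.0, 1.0]
  m = 0.5
  call sigmoid(v)   ! works on a 1D array
  call sigmoid(m)   ! and on a 2D array, with the same code
  print *, v
  print *, m
  ! Note: "procedure(...), pointer :: p => sigmoid" would be rejected here,
  ! since an elemental procedure cannot be a procedure-pointer target.
end program demo
```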