RRTMGP.jl
Try using BitArray for cloud masks
I'm curious if this works on GPUs
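For context, here is a minimal CPU-only sketch of what swapping an `Array{Bool}` mask for a `BitArray` changes. The mask shape and the variable names are assumptions for illustration and are not taken from this branch; `ncol` simply reuses the `ncols` value from the benchmark. It says nothing about GPU behavior, which is the open question here.

```julia
# Hypothetical mask shape: ncol matches the benchmark's ncols, nlay is assumed.
nlay, ncol = 64, 131658

mask_bool = zeros(Bool, nlay, ncol)  # Array{Bool,2}: one byte per element
mask_bits = falses(nlay, ncol)       # BitMatrix: one bit per element, packed into chunks

# Mark the same (assumed) cloudy cell in both masks.
mask_bool[3, 10] = true
mask_bits[3, 10] = true

# Both containers index identically and hold the same logical values.
same_values = mask_bool == mask_bits

# Approximate memory footprints, including container overhead.
bytes_bool = Base.summarysize(mask_bool)
bytes_bits = Base.summarysize(mask_bits)
println("Array{Bool}: ", bytes_bool, " bytes; BitArray: ", bytes_bits, " bytes")
```

The packed representation should take roughly one eighth of the memory, but every read or write needs a shift and a mask on a 64-bit chunk, so a smaller footprint does not by itself imply faster kernels.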
Does this improve performance?
🤷🏻
Current main:
julia --project=gpuenv test/all_sky_tuning.jl
device = ClimaComms.CUDADevice(); FT = Float64, ncols = 131658; size per field = 0.04119899868965149 GB
"timing longwave solver" = "timing longwave solver"
1.159210 seconds (66 CPU allocations: 14.969 KiB)
1.158549 seconds (65 CPU allocations: 14.891 KiB)
1.158072 seconds (66 CPU allocations: 15.000 KiB)
1.157427 seconds (45 CPU allocations: 13.094 KiB)
1.157513 seconds (45 CPU allocations: 13.094 KiB)
"timing shortwave solver" = "timing shortwave solver"
0.863498 seconds (51 CPU allocations: 13.469 KiB)
0.862782 seconds (51 CPU allocations: 13.469 KiB)
0.863073 seconds (51 CPU allocations: 13.469 KiB)
0.862254 seconds (51 CPU allocations: 13.469 KiB)
0.864140 seconds (51 CPU allocations: 13.469 KiB)
39.751985 seconds (97.94 M allocations: 5.623 GiB, 4.70% gc time, 54.58% compilation time: 1% of which was recompilation)
This branch:
julia --project=gpuenv test/all_sky_tuning.jl
device = ClimaComms.CUDADevice(); FT = Float64, ncols = 131658; size per field = 0.04119899868965149 GB
"timing longwave solver" = "timing longwave solver"
1.160132 seconds (66 CPU allocations: 14.969 KiB)
1.157305 seconds (65 CPU allocations: 14.891 KiB)
1.156251 seconds (66 CPU allocations: 15.000 KiB)
1.156213 seconds (45 CPU allocations: 13.094 KiB)
1.157813 seconds (45 CPU allocations: 13.094 KiB)
"timing shortwave solver" = "timing shortwave solver"
0.863435 seconds (51 CPU allocations: 13.469 KiB)
0.861900 seconds (51 CPU allocations: 13.469 KiB)
0.863441 seconds (51 CPU allocations: 13.469 KiB)
0.861509 seconds (51 CPU allocations: 13.469 KiB)
0.863258 seconds (51 CPU allocations: 13.469 KiB)
39.466558 seconds (97.92 M allocations: 5.624 GiB, 4.58% gc time, 53.51% compilation time: 1% of which was recompilation)
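To put a number on the comparison, the per-solver timings above can be averaged and compared. This is only a sketch: the values are copied from the two logs, while the helper `rel_change` and the variable names are made up for illustration.

```julia
using Statistics

# Solver timings in seconds, copied from the logs above.
main_lw   = [1.159210, 1.158549, 1.158072, 1.157427, 1.157513]
branch_lw = [1.160132, 1.157305, 1.156251, 1.156213, 1.157813]
main_sw   = [0.863498, 0.862782, 0.863073, 0.862254, 0.864140]
branch_sw = [0.863435, 0.861900, 0.863441, 0.861509, 0.863258]

# Relative change of the branch mean with respect to the main mean
# (hypothetical helper; negative means the branch is faster).
rel_change(branch, main) = (mean(branch) - mean(main)) / mean(main)

lw_change = rel_change(branch_lw, main_lw)
sw_change = rel_change(branch_sw, main_sw)

println("longwave:  ", round(100 * lw_change; digits = 3), " %")
println("shortwave: ", round(100 * sw_change; digits = 3), " %")
```

With only five samples per solver and run-to-run spread of about the same size as the difference between the means, a change this small is hard to distinguish from noise.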