Oceananigans.jl
Float32 output as default for jld2 and netcdf
In the output writers, this PR suggests replacing array_type = Array{Float64} with type = Float32 in order to
1) make Float32 the default
2) always output Array
1) is so that we save memory in big simulations. I also plan to add bitrounding so that files get roughly 10x smaller. By default this would be done in a very conservative way so that we don't accidentally throw away information. Once I've understood where best to do the bitrounding, I'll add it to this PR and rename the PR accordingly.
2) is because I cannot think of a situation where you would want to output something different than Array? But please correct me if I'm wrong. I haven't found another example in this repository.
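To illustrate the bitrounding mentioned under 1), here is a minimal sketch of the general idea in plain Julia. It is not the implementation planned for this PR: the function name `round_mantissa` and the `keepbits` argument are hypothetical, ties are rounded away from zero for simplicity (a production implementation would typically round ties to even), and NaN/Inf are not treated specially.

```julia
# Sketch only: keep `keepbits` of the 23 explicit Float32 mantissa bits
# and set the remaining bits to zero after rounding to nearest.
# `round_mantissa` is a hypothetical helper, not an Oceananigans function.
function round_mantissa(x::Float32, keepbits::Integer)
    0 <= keepbits <= 23 || throw(ArgumentError("keepbits must be in 0:23"))
    keepbits == 23 && return x                 # nothing to round

    drop = 23 - keepbits                       # number of trailing bits to discard
    mask = (~UInt32(0)) << drop                # ones on sign, exponent and kept bits
    half = UInt32(1) << (drop - 1)             # half of the last kept bit

    ui = reinterpret(UInt32, x)
    ui += half                                 # round to nearest, ties away from zero
    ui &= mask                                 # zero the discarded bits
    return reinterpret(Float32, ui)
end

x = Float32(pi)
rounded = round_mantissa(x, 7)                 # keeps 7 mantissa bits
err = abs(rounded - x)                         # bounded by half of the last kept bit
```

The trailing zeros are what make the rounded data compress well; the achievable file-size reduction depends on the data and the compressor.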
Does this matter if you're outputting an average in all 3 dimensions? Because that's important, but I'm not sure if changing the type would affect this situation.
We chose array_type to permit the flexibility of other array types. I don't know enough to say that we would never want another array type. Better to be defensive than to aggressively constrain user action?
Float32 used to be the default. However, this produced a lot of pain in some testing situations where we wanted to show bitwise reproducibility / accuracy in saving. I can't remember all the details, but after a few user issues (in addition to our own pain), we decided to switch to Float64. I agree that Float32 is better, but it could be regarded as "premature optimization" in this case. Definitely open to discussing it, though.
I would need to know the details, but technically bitwise reproducibility is easier with Float32 if you compute in Float64 because you throw away 29 bits that could be different. In practice, however, I don't see a difference between Float32/64 regarding this or accuracy. Both are way too precise for what we're doing anyway?
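As a small illustration of the "29 bits" argument, the sketch below takes two Float64 values that differ only in their last mantissa bit and shows that they become identical after conversion to Float32. The values are chosen purely for illustration; two Float64 values that straddle a Float32 rounding boundary could still end up different.

```julia
# Two Float64 values that differ only in the last mantissa bit, as can
# happen when the same sum is evaluated in a different order.
a = 0.1 + 0.2
b = 0.3

same64 = a == b                         # false: the Float64 bit patterns differ
same32 = Float32(a) == Float32(b)       # true: the difference is rounded away

# precision() counts significand bits including the implicit leading bit:
# 53 for Float64 and 24 for Float32.
dropped_bits = precision(Float64) - precision(Float32)
```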
It's not actually a question of easy vs. hard; it's a question of whether you need output in Float64 for a == b to return true. If you forget that the default is Float32, then a won't exactly equal b for this sort of test. But you might think there's a bug in the output writers or in your own code before you realize that it's just because you need Float64 for this particular kind of check.
I think one example was computing an average on the fly, versus computing it in post-processing by averaging the output?
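A minimal sketch of that kind of check, using made-up data rather than an actual Oceananigans field: the average computed "on the fly" in Float64 is compared with the average of the same data after it has been saved as Float64 or as Float32. All variable names here are hypothetical.

```julia
using Statistics

# Made-up data standing in for a field computed in Float64.
field = [0.1, 0.2, 0.3, 0.4]

# Average computed during the simulation.
online_mean = mean(field)

# Averages computed in post-processing from saved output.
saved64 = Float64.(field)               # output written as Float64
saved32 = Float32.(field)               # output written as Float32
offline_mean64 = mean(saved64)
offline_mean32 = mean(Float64.(saved32))   # promote before averaging

exact64 = online_mean == offline_mean64    # true: identical values, same operation
exact32 = online_mean == offline_mean32    # false: Float32 rounding shifted the inputs

# A tolerance-based comparison still passes for the Float32 output.
close32 = isapprox(online_mean, offline_mean32; rtol = eps(Float32))
```

With Float32 output, a test written as a == b therefore has to become a tolerance-based comparison, or request Float64 output explicitly.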
Might've been this one:
https://github.com/CliMA/Oceananigans.jl/blob/fa5e280115f619d01a460f012328bd7e6d253b38/test/test_netcdf_output_writer.jl#L552
But there were also some user issues which led us to believe it wasn't just about making sure the tests were good, as I recall.
We can still take an opinionated stance that this is an important enough issue that it's worth some temporary user confusion.
I think we should do this...
Thanks for the ping. I want to tackle this this week, after I've submitted the JOSS paper!