
Specify precision (number of digits) for the output files

Open · JoostBuitink opened this issue 1 year ago • 1 comment

Feature type

Adding new functionality

Improvement Description

Output files are currently written with many digits. This level of precision is often unnecessary and can take up more storage space than required. Limiting the number of digits when writing the data should make the files smaller and potentially also easier to compress (for the gridded output data).

Implementation Description

A global setting that controls the number of digits for all output files. This global setting could optionally be overridden per output entry (e.g. in each [[csv.column]] and [[netcdf.variable]] section) if a different precision is required for specific output values. I would say the latter is more of a nice-to-have; the global setting would probably be sufficient for the majority of users.
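To make the proposed lookup order concrete, here is a minimal Python sketch of "per-entry value overrides the global value". The key name `digits`, the `output` table, and the parameter names are all hypothetical placeholders chosen for illustration; none of them are existing Wflow.jl options.

```python
from typing import Optional

# Hypothetical config layout, mirroring a parsed TOML file as nested dicts.
# "digits" and "output" are illustrative names, not existing Wflow.jl settings.
config = {
    "output": {"digits": 4},  # assumed global default
    "csv": {
        "column": [
            {"header": "Q", "parameter": "some.discharge.parameter"},
            # Assumed per-entry override of the global setting:
            {"header": "P", "parameter": "some.precipitation.parameter", "digits": 2},
        ]
    },
}


def resolve_digits(entry: dict, config: dict) -> Optional[int]:
    """Return the entry's own setting if present, else the global one.

    None means "no limit requested", i.e. write full precision.
    """
    if "digits" in entry:
        return entry["digits"]
    return config.get("output", {}).get("digits")


resolved = {col["header"]: resolve_digits(col, config) for col in config["csv"]["column"]}
print(resolved)
```

Returning `None` when neither level sets a value keeps the current full-precision behaviour as the default, so existing configuration files would not change their output.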

Additional Context

No response

JoostBuitink · Dec 07 '23 10:12

(Mostly a drive-by comment, saw this pop up in my notifications...)

This makes some sense for CSV, where every digit is a byte, but the gains for netCDF seem pretty marginal. It's actually nice having full 64-bit output: you can compute things afterwards and check your model.

Decimal digits also aren't meaningful for binary floating-point numbers: e.g. 0.1 cannot be represented exactly in binary. You might think of reducing the number of bytes instead (e.g. 32-bit instead of 64-bit). But note that there's generally no hardware support in our CPUs for float16 and the like, so anything smaller than 32-bit will be slow. NetCDF has better options, such as quantization: https://docs.unidata.ucar.edu/netcdf-c/current/md__media_psf_Home_Desktop_netcdf_releases_v4_9_2_release_netcdf_c_docs_quantize.html Zarr and Blosc are probably nicer.
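A small Python sketch of both points, using only the standard library: the value stored for 0.1 is not exactly one tenth, and going from 64-bit to 32-bit halves the bytes per value at the cost of a slightly different stored number. This only illustrates the floating-point behaviour; it is not how Wflow.jl or netCDF write data.

```python
import struct
from decimal import Decimal

# Decimal(float) shows the exact value of the nearest 64-bit float to 0.1.
exact_value = Decimal(0.1)
is_exact = exact_value == Decimal("0.1")
print(exact_value)
print("0.1 stored exactly:", is_exact)

# Pack the same number as a 64-bit and as a 32-bit IEEE 754 float.
x = 0.1
as_f64 = struct.pack("<d", x)
as_f32 = struct.pack("<f", x)
print(len(as_f64), "bytes vs", len(as_f32), "bytes")

# Reading the 32-bit value back gives a nearby, but different, number.
roundtrip_f32 = struct.unpack("<f", as_f32)[0]
print(roundtrip_f32, "differs from", x, "by", abs(roundtrip_f32 - x))
```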

For CSV writing, you'd want to use a formatting option rather than explicit rounding (because of numbers like 0.1). You'll also have to choose a format (scientific, fixed, "general"). I'd recommend keeping it simple and only enabling scientific. See also the Python docs section on formatting float and decimal types: https://docs.python.org/3/library/string.html#format-specification-mini-language
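As a rough illustration in Python (using the format mini-language linked above), the sketch below formats values at write time instead of rounding them first. The helper `format_scientific` and its `significant_digits` parameter are hypothetical names made up for this example.

```python
import csv
import io

values = [1234.5678, 0.000123456, 0.1]


def format_scientific(value: float, significant_digits: int = 4) -> str:
    """Format a float in scientific notation with the given significant digits.

    The ".Ne" format keeps N digits after the decimal point, so N is
    significant_digits - 1.
    """
    return f"{value:.{significant_digits - 1}e}"


# Write to an in-memory buffer; the stored floats themselves are never rounded.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["value"])
for v in values:
    writer.writerow([format_scientific(v)])
csv_text = buffer.getvalue()
print(csv_text)

# For comparison: fixed notation with 3 decimals loses small values entirely.
fixed_small = f"{0.000123456:.3f}"
print(fixed_small)
```

Scientific notation keeps the same number of significant digits regardless of magnitude, whereas a fixed number of decimals can turn small values into zero, which is one reason to offer only the scientific format.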

Anyway, I'd strongly recommend treating binary output very separately from text output.

Huite · Dec 07 '23 16:12