Likely redundant information in state files
At some point it would be good to reduce the number of variables stored in the state files. There are currently a large number, particularly lake state variables, and I suspect several of them contain redundant information or could be derived from the others (i.e., they are not independent of each other). Eliminating some of these would save space and make the code easier to maintain.
Some good candidates to investigate include:
- Lake snow-related variables - the lake has its own snow_data structure that stores information about the snowpack on top of the lake ice, but the lake_var structure also contains some snow-related information, for example surf_temp, surf_water, pack_temp, pack_water, coldcontent, SAlbedo, and sdepth. Since the lake's snow_data structure contains a complete description of the snowpack state, it seems that the lake_var versions of these variables could be eliminated and simply recomputed after reading the state file.
- Lake vertical average temperature - could be derived from individual lake layer temperatures.
- Lake surface, sarea, and volume - could be derived from lake depth and lake parameters.
- Soil temperature profile information - in the classic driver, the dz_node and node_depth arrays are stored once per grid cell, even though they are constant across the entire domain.
- Nveg, nbands - in the classic driver, these are stored but are redundant with their counterparts in the veg_param and snow_bands files. They were (presumably) included so that the ascii state file could be parsed by a script without having to read the veg_param and snow_bands files. Omitting them would not only save space but also remove the possibility of discrepancies with the veg_param and snow_bands versions. On the other hand, one could argue that keeping them allows VIC to raise an error if they differ from the veg_param and snow_bands versions.
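To make the lake average temperature and lake surface/sarea/volume items above concrete, here is a minimal C sketch of how those quantities could be recomputed from lake depth and a depth-area profile after reading a state file, instead of being stored. It assumes a hypothetical piecewise-linear bathymetry (`lake_profile_t`, with heights measured upward from the lake bottom), equal-thickness layers with the top layer first, and a volume-weighted average temperature. VIC's actual lake parameters and averaging may differ, and every name here is illustrative.

```c
/* Hypothetical number of points in the depth-area profile. */
#define NPROF 4

/* Hypothetical bathymetry: height above the lake bottom (m, ascending,
   with h[0] == 0) and horizontal lake area at that height (m^2). */
typedef struct {
    double h[NPROF];
    double area[NPROF];
} lake_profile_t;

/* Horizontal area at height z, by linear interpolation in the profile;
   clamped to the end values outside the profile. */
static double area_at(const lake_profile_t *p, double z)
{
    if (z <= p->h[0]) return p->area[0];
    for (int i = 1; i < NPROF; i++) {
        if (z <= p->h[i]) {
            double f = (z - p->h[i - 1]) / (p->h[i] - p->h[i - 1]);
            return p->area[i - 1] + f * (p->area[i] - p->area[i - 1]);
        }
    }
    return p->area[NPROF - 1];
}

/* Volume below height z (m^3). The area is piecewise linear in z, so the
   trapezoid rule on each profile segment integrates it exactly. */
static double volume_below(const lake_profile_t *p, double z)
{
    double vol = 0.0;
    for (int i = 1; i < NPROF; i++) {
        double lo = p->h[i - 1];
        double hi = z < p->h[i] ? z : p->h[i];
        if (hi <= lo) return vol;
        vol += 0.5 * (area_at(p, lo) + area_at(p, hi)) * (hi - lo);
    }
    /* Above the top of the profile, assume vertical walls. */
    if (z > p->h[NPROF - 1])
        vol += p->area[NPROF - 1] * (z - p->h[NPROF - 1]);
    return vol;
}

/* Volume-weighted mean temperature of nlayers equal-thickness layers
   filling a lake of the given depth; temp[0] is the top layer. */
static double lake_avg_temp(const lake_profile_t *p, double depth,
                            const double *temp, int nlayers)
{
    double dz = depth / nlayers;
    double sum = 0.0, vtot = 0.0;
    for (int i = 0; i < nlayers; i++) {
        double top = depth - i * dz;
        double bot = depth - (i + 1) * dz;
        double v = volume_below(p, top) - volume_below(p, bot);
        sum += v * temp[i];
        vtot += v;
    }
    return vtot > 0.0 ? sum / vtot : temp[0];
}
```

With this approach the surface area is `area_at(p, depth)` and the volume is `volume_below(p, depth)`, so only the depth and the layer temperatures would need to be written to the state file.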
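The trade-off in the Nveg/nbands item is between dropping the redundant counts and keeping them as a consistency check. If they are kept, the check could look roughly like the C sketch below. The `cell_counts_t` type and `check_state_counts` function are hypothetical and do not correspond to existing VIC code.

```c
#include <stdio.h>

/* Hypothetical per-cell counts: as read from the state file, or as
   implied by the veg_param and snow_bands files. */
typedef struct {
    int nveg;
    int nbands;
} cell_counts_t;

/* Compare the counts stored in the state file against the parameter
   files. Returns 0 when they agree; otherwise reports each mismatch on
   stderr and returns the number of mismatches. */
static int check_state_counts(int cell_id, cell_counts_t state,
                              cell_counts_t params)
{
    int mismatches = 0;
    if (state.nveg != params.nveg) {
        fprintf(stderr, "cell %d: state file has Nveg = %d, veg_param has %d\n",
                cell_id, state.nveg, params.nveg);
        mismatches++;
    }
    if (state.nbands != params.nbands) {
        fprintf(stderr, "cell %d: state file has nbands = %d, snow_bands has %d\n",
                cell_id, state.nbands, params.nbands);
        mismatches++;
    }
    return mismatches;
}
```

If Nveg and nbands are dropped from the state file, this check goes away with them: the state reader would take both counts from the parameter files, so there would be nothing to disagree with.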
Other issues:
- Highly heterogeneous structure:
  a. The line containing bare soil tile information has no Wdew field, since there is no vegetation. However, one cannot tell just from looking at the line whether the tile is in fact bare; information from the veg_param file is needed to determine this (and hence whether to expect a Wdew entry).
  b. The lake line begins with a space, followed by soil moisture, whereas non-lake lines begin with the veg tile index and the snow band index, followed by soil moisture.
  c. The lake variables that are arrays with one element per lake layer (e.g., lake_temp, lake_surface) have lengths that vary in time and space. This is because there is a minimum allowable lake layer thickness: as lake depth varies, layer thicknesses vary proportionately, so when lake depth is less than Nlayers * dz_min, layers are merged. The state file only stores values for these active layers, so the number of fields devoted to these arrays varies in time and space.
- Unnecessary information:
  a. The variable lake_surface actually contains one extra element, corresponding to the area at the bottom of the lowest layer, which is always 0 by definition. A reader can either treat it as an extra 0 following the lake_surface array and skip it, or read it in while keeping in mind that lake_surface has one more element than there are active layers. Addressing point 3 of the redundant information above (deriving lake surface, sarea, and volume from lake depth and lake parameters) would resolve this, but it is worth being aware of until then.
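The variable-length lake arrays and the trailing lake_surface element mean a reader has to know the number of active layers before it can parse a lake line. The C sketch below shows one way to do that. It assumes a hypothetical rule that fits as many layers of at least dz_min as possible (capped at the maximum and never fewer than one); VIC's exact merging rule may differ, and `active_lake_layers` and `parse_lake_surface` are illustrative names, not VIC functions.

```c
#include <stdlib.h>

/* Number of active lake layers for a given depth, under the assumed rule:
   all nlayers_max layers if the lake is deep enough, otherwise as many
   layers of thickness >= dz_min as fit, but at least one. */
static int active_lake_layers(double depth, double dz_min, int nlayers_max)
{
    if (depth >= nlayers_max * dz_min) return nlayers_max;
    int n = (int)(depth / dz_min);
    return n < 1 ? 1 : n;
}

/* Parse nactive + 1 whitespace-separated lake_surface values from text:
   one per active layer plus the trailing bottom-area element. The first
   nactive values go into surface[]. Returns the number of values parsed;
   *bottom_ok is set to 1 only if all values were present and the
   trailing element is exactly 0. */
static int parse_lake_surface(const char *text, int nactive,
                              double *surface, int *bottom_ok)
{
    const char *p = text;
    char *end;
    int n = 0;
    double bottom = -1.0;
    for (int i = 0; i <= nactive; i++) {
        double v = strtod(p, &end);
        if (end == p) break;              /* no more numbers on the line */
        if (i < nactive) surface[i] = v;
        else bottom = v;
        n++;
        p = end;
    }
    *bottom_ok = (n == nactive + 1 && bottom == 0.0);
    return n;
}
```

Deriving lake_surface from depth and the lake parameters, as suggested above, would remove the need for this bookkeeping in the state reader entirely.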
Note that I'm not suggesting that this is a high priority for VIC 5.0. I'm just documenting it.
We should be careful to sort this out at some point. We definitely do not want redundant information, not just because of file size, but because we should not carry copies of the state internally either. Duplicates make it too easy for bugs to creep in; for example, someone could update one copy and forget that another exists.
Regarding the redundant lake state variables: it may be a good idea to remove some or all of them from VIC altogether, not just from the state file. This applies particularly to the variables describing the snowpack on top of the lake ice. There are versions in the lake_var.snow structure, which I would recommend keeping, and in the lake_var structure itself, which I would recommend removing (and replacing throughout the lake code with the lake_var.snow versions).
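As a rough illustration of the bug class described above and of the proposed fix, the C sketch below contrasts a structure that keeps a duplicate snow depth with one where the snow structure is the only copy and callers go through an accessor. The types, field names, and `lake_snow_depth` are hypothetical, heavily simplified stand-ins; in particular, the correspondence between sdepth and a `depth` field in the snow structure is an assumption for illustration, not VIC's actual layout.

```c
/* Hypothetical simplified snowpack state. */
typedef struct {
    double depth;           /* snow depth (m) */
} snow_t;

/* Before: the depth is stored twice, so the copies can drift apart. */
typedef struct {
    snow_t snow;
    double sdepth;          /* duplicate of snow.depth */
} lake_dup_t;

/* After: lake.snow is the only copy of the snowpack state. */
typedef struct {
    snow_t snow;
} lake_t;

/* Callers that used the duplicate field read through the snow structure
   instead, so there is nothing to keep in sync. */
static double lake_snow_depth(const lake_t *lake)
{
    return lake->snow.depth;
}
```

Updating `snow.depth` in a `lake_dup_t` silently leaves `sdepth` stale, whereas with `lake_t` every reader sees the updated value.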