cf-python
Support for updating CF aggregation files
Currently with CFA aggregation, if new fragment files are added to a directory, it is necessary to read all the fragments to (re)build a new aggregation file. That is an expensive operation if there are a lot of fragments. It is also possible that we might want to add local files to an aggregation which has remote fragments.
So: if we add new files to a directory, how do we want to update the aggregation without rerunning it over every fragment?
Use Case 1:
I ran a model which wrote output every month. After every year of simulation, I run an aggregation and write the data to tape. Then the next year of data arrives, and I still want to have one aggregated field variable.
- Hopefully that's relatively straightforward: I run an aggregation on the existing aggregation and the new files (the old files are now on tape), and create a new consolidated aggregation file, with no further need for the old aggregation file.
Use Case 2:
As above, but the data is updating on disk, so I keep getting more data in the directory. We want the new aggregation to touch only the new data and to reuse the old aggregation for everything else.
- I think that would be enabled by storing a "last aggregated" time in the aggregation file, and using it to avoid touching any fragment in the aggregation which precedes that time?
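To make the "last aggregated" idea concrete, here is a minimal sketch of how the new fragments might be selected. It uses only the Python standard library and no cf-python API. The function name `fragments_needing_aggregation`, the `*.nc` pattern, and the choice to compare against each file's modification time are all assumptions made for illustration; how the timestamp would actually be stored in, and read from, the aggregation file is left open.

```python
import os
from pathlib import Path


def fragments_needing_aggregation(directory, last_aggregated, pattern="*.nc"):
    """Return the fragment files modified after `last_aggregated`.

    `last_aggregated` is a POSIX timestamp in seconds. It is assumed
    (hypothetically) to have been recorded in the aggregation file when
    the previous aggregation was written.
    """
    return sorted(
        path
        for path in Path(directory).glob(pattern)
        # Assumption: a fragment is "new" if its modification time is
        # later than the recorded aggregation time.
        if path.stat().st_mtime > last_aggregated
    )
```

Only the files returned here would need to be opened. Fragments already described by the old aggregation would be carried over unread. One caveat of relying on modification times is that copying or restoring an old file can make it look new, so a comparison on file names or on the fragment locations already listed in the aggregation might be more robust.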