dea-cogger
Grab bag of outstanding issues
Work Generation
- [ ] Incremental. Need a way to compare what's on S3 to what's not.
- Our current unit of work for COG conversion is a NetCDF file, either stacked or unstacked. It will be awkward to compare stacked NetCDF files to data existing on S3, since a stacked file holds many datasets while each object on S3 represents one.
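One possible way to approach the incremental comparison is to expand each NetCDF file into the set of S3 keys its datasets would produce, then take a set difference against the keys already in the bucket. The sketch below assumes a hypothetical key layout (`prefix/dataset_id.tif`) and takes the existing keys as a plain iterable; in practice they might come from an S3 listing, which is not shown here.

```python
from typing import Iterable, Set


def expected_cog_keys(dataset_ids: Iterable[str], prefix: str) -> Set[str]:
    """Map each dataset in a NetCDF file to the S3 key its COG would get.

    The 'prefix/dataset_id.tif' layout is an assumption for illustration.
    """
    return {f"{prefix}/{dataset_id}.tif" for dataset_id in dataset_ids}


def missing_from_s3(expected: Iterable[str], existing: Iterable[str]) -> Set[str]:
    """Return the keys that still need to be converted and uploaded."""
    return set(expected) - set(existing)


# Hypothetical example: one stacked file holding three time slices,
# of which only the first has already been uploaded.
expected = expected_cog_keys(
    ["2018-01-01", "2018-01-17", "2018-02-02"], "product/x_10/y_-40"
)
already_uploaded = ["product/x_10/y_-40/2018-01-01.tif"]
todo = missing_from_s3(expected, already_uploaded)
```

Working at the level of individual datasets rather than files sidesteps the one-versus-many mismatch, at the cost of having to open each stacked file to enumerate its time slices.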
COG Converter
- [x] Configurable COG parameters when generating overviews.
- [x] Resampling method for overviews for different products
- [x] Number of overview levels
- [x] Compression/chunk size (maybe, deflate/512 is good, but...)
- [x] Is it faster/easier/more configurable to use rio cogeo than raw GDAL?
- [ ] Review/test the parameters used
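The configurable parameters listed above could be grouped into a single settings object per product. The sketch below is not the project's implementation: `CogSettings` and both helper functions are hypothetical names, and it only builds argument lists for the standard `gdaladdo` and `gdal_translate` command-line tools without running them.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CogSettings:
    """Hypothetical per-product COG settings (names are illustrative)."""
    resampling: str = "average"   # overview resampling method
    overview_levels: int = 5      # number of overview levels
    compress: str = "DEFLATE"     # compression
    blocksize: int = 512          # tile size in pixels


def overview_command(path: str, settings: CogSettings) -> List[str]:
    """Build a gdaladdo call that adds power-of-two overviews to 'path'."""
    factors = [str(2 ** level) for level in range(1, settings.overview_levels + 1)]
    return ["gdaladdo", "-r", settings.resampling, path, *factors]


def translate_command(src: str, dst: str, settings: CogSettings) -> List[str]:
    """Build a gdal_translate call that writes a tiled, compressed GeoTIFF."""
    options = {
        "TILED": "YES",
        "COPY_SRC_OVERVIEWS": "YES",
        "COMPRESS": settings.compress,
        "BLOCKXSIZE": settings.blocksize,
        "BLOCKYSIZE": settings.blocksize,
    }
    command = ["gdal_translate"]
    for key, value in options.items():
        command += ["-co", f"{key}={value}"]
    return command + [src, dst]


# Example: a categorical product that should use nearest-neighbour overviews.
settings = CogSettings(resampling="nearest", overview_levels=3)
addo = overview_command("tmp.tif", settings)
translate = translate_command("tmp.tif", "out.tif", settings)
```

The resulting lists could be passed to `subprocess.run(..., check=True)`. Keeping command construction separate from execution makes the chosen parameters easy to review and test.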
Uploader
- [x] Specify `bucket` instead of having `COG-Conversion` define it.
- [x] Give uploader an option to move files to a `COMPLETE` directory instead of deleting them. Will let us test upload to a `dev` bucket, and then run again against the `prod` bucket.
- [ ] SPEED How fast can we upload in a single thread, do we need parallel upload processes?
- [ ] MAYBE Ability to watch multiple directories?
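The "move to `COMPLETE` instead of deleting" behaviour might look roughly like the sketch below. It is an illustration only: `upload_and_archive` is a hypothetical name, the actual upload is injected as a callable (for example a thin wrapper around an S3 client), and the `COMPLETE` directory is assumed to live outside the watched directory.

```python
import shutil
from pathlib import Path
from typing import Callable


def upload_and_archive(
    src_dir: Path,
    complete_dir: Path,
    bucket: str,
    upload: Callable[[Path, str, str], None],
) -> int:
    """Upload every file under src_dir, then move it into complete_dir.

    'upload' receives (local_path, bucket, key). Files keep their relative
    path both as the S3 key and inside complete_dir, so the same tree can
    later be replayed against a different bucket.
    """
    complete_dir.mkdir(parents=True, exist_ok=True)
    # Collect the file list up front so moves do not affect the iteration.
    files = sorted(p for p in src_dir.rglob("*") if p.is_file())
    for path in files:
        relative = path.relative_to(src_dir)
        upload(path, bucket, relative.as_posix())
        target = complete_dir / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(path), str(target))
    return len(files)
```

To replay against `prod` after a `dev` run, the same function can be pointed at the `COMPLETE` directory as its new source. Because a file is moved only after its upload call returns, a failed upload leaves the file in place for the next run.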
A very quick test of rio cogeo against raw GDAL indicates no significant performance difference.
- [ ] Easier submission to large parallel PBS jobs
- [ ] Useful progress logging when run in PBS or as a background process
- We currently use `tqdm` for an interactive progress bar; it might be possible to use it for background progress logs.
- [ ] Finer grained progress when converting stacked NetCDF files
- [ ] Progress and speed metrics from the Uploader
- [ ] Further configurability of upload directory structure. For example, some products want a flat directory structure in AWS when only yearly time makes sense, i.e. without `months` and `days`.
- [ ] What do we do when the time indicated in the file name, rather than the `timestamp of dataset`, is the one that makes sense?
- [ ] Make the `src_template` more flexible and simplify `class COGProductConfiguration` by using either `parse` or accepting regexes.
- [ ] Look into using MPIPoolExecutor for distributing work inside PBS jobs
For the `COG-conversion`
* [x] Validation of `COG` datasets - On which side do we do this?
  On NCI, done!