
Grab bag of outstanding issues

Open omad opened this issue 7 years ago • 6 comments

Work Generation

  • [ ] Incremental. Need a way to compare what's on S3 to what's not.
    • Our current unit of work for COG conversion is a NetCDF file, either stacked or unstacked. Comparing stacked NetCDF files to data already on S3 will be awkward, since a stacked file holds many datasets while each item on S3 represents one.
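
One way to frame the incremental check is to expand each NetCDF file into the S3 keys it should produce, then take a set difference against a bucket listing. This is only a sketch: the function names and the `<stem>/<label>.tif` key layout are hypothetical, and the bucket listing is assumed to be obtained elsewhere.

```python
from typing import Iterable, Set


def expected_keys(netcdf_name: str, dataset_labels: Iterable[str]) -> Set[str]:
    """Expand one NetCDF file into the S3 keys it should produce.

    A stacked file yields one key per dataset; an unstacked file yields one.
    The key layout used here is an assumption for illustration only.
    """
    stem = netcdf_name.rsplit(".", 1)[0]
    return {f"{stem}/{label}.tif" for label in dataset_labels}


def missing_keys(expected: Set[str], existing: Iterable[str]) -> Set[str]:
    """Return the expected keys that are absent from the bucket listing."""
    return expected - set(existing)


# Hypothetical stacked file with two time slices, one already uploaded.
expected = expected_keys("LS8_2018.nc", ["2018-01-01", "2018-01-17"])
todo = missing_keys(expected, ["LS8_2018/2018-01-01.tif"])
```

Working at the level of expected keys sidesteps the one-versus-many mismatch, because both sides of the comparison are then per-dataset.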

COG Converter

  • [x] Configurable COG parameters when generating overviews.
    • [x] Resampling method for overviews for different products
    • [x] Number of overview levels
    • [x] Compression/chunk size (maybe, deflate/512 is good, but...)
  • [x] Is it faster/easier/more configurable to use rio cogeo than raw GDAL?
  • [ ] Review/test the parameters used
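
To make the parameters under review concrete, here is a rough sketch of how they map onto the raw GDAL command-line route. The helper names and defaults are hypothetical and are not taken from the converter itself; the `gdaladdo` and `gdal_translate` options are standard GDAL ones.

```python
from typing import List


def overview_command(path: str, resampling: str = "average", levels: int = 5) -> List[str]:
    """Build a gdaladdo call that adds power-of-two overviews to ``path``."""
    factors = [str(2 ** i) for i in range(1, levels + 1)]
    return ["gdaladdo", "-r", resampling, path, *factors]


def translate_command(src: str, dst: str, compress: str = "DEFLATE",
                      blocksize: int = 512) -> List[str]:
    """Build a gdal_translate call that writes a tiled, compressed GeoTIFF
    and copies the overviews already present in ``src``."""
    return [
        "gdal_translate",
        "-co", "TILED=YES",
        "-co", f"COMPRESS={compress}",
        "-co", f"BLOCKXSIZE={blocksize}",
        "-co", f"BLOCKYSIZE={blocksize}",
        "-co", "COPY_SRC_OVERVIEWS=YES",
        src, dst,
    ]


# A categorical product would likely want nearest-neighbour overviews.
addo = overview_command("tmp.tif", resampling="nearest", levels=3)
translate = translate_command("tmp.tif", "out.tif")
```

Each list can be handed to `subprocess.run(..., check=True)`; keeping them as plain lists makes per-product settings easy to inspect and test.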

Uploader

  • [x] Specify bucket instead of having COG-Conversion define it.
  • [x] Give uploader an option to move files to a COMPLETE directory instead of deleting them. Will let us test upload to a dev bucket, and then run again against the prod bucket.
  • [ ] SPEED How fast can we upload in a single thread, do we need parallel upload processes?
  • [ ] MAYBE Ability to watch multiple directories?
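
The move-to-COMPLETE behaviour could look roughly like the sketch below. The function name is hypothetical, and `upload` stands in for whatever actually pushes a file to the chosen bucket; it is assumed to raise on failure.

```python
import shutil
from pathlib import Path
from typing import Callable


def upload_and_archive(path: Path, complete_dir: Path,
                       upload: Callable[[Path], None]) -> Path:
    """Upload one file, then move it into ``complete_dir`` instead of deleting it."""
    upload(path)  # if this raises, the file stays where it is for a retry
    complete_dir.mkdir(parents=True, exist_ok=True)
    destination = complete_dir / path.name
    shutil.move(str(path), str(destination))
    return destination
```

Because the bucket-specific work lives in the `upload` callable, the same files can be replayed from the COMPLETE directory against a different bucket. If a single thread proves too slow, the same function can be mapped over files with `concurrent.futures.ThreadPoolExecutor`.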

omad avatar Oct 10 '18 05:10 omad

For the COG-conversion

  • [x] Validation of COG datasets - On which side do we do this?

ashoka1234 avatar Oct 10 '18 06:10 ashoka1234

A very quick test of rio cogeo indicates no significant performance difference.

omad avatar Oct 12 '18 05:10 omad

  • [ ] Easier submission to large parallel PBS jobs
  • [ ] Useful progress logging when run in PBS or as a background process
    • We currently use tqdm for an interactive progress bar; it might be possible to use it for background progress logs as well
  • [ ] Finer grained progress when converting stacked NetCDF files
  • [ ] Progress and speed metrics from the Uploader
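
For background or PBS runs, periodic log lines may be more useful than a redrawn bar. The sketch below uses only the standard `logging` module rather than tqdm; the logger name and the `every` interval are assumptions.

```python
import logging
import time
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")
log = logging.getLogger("cog_conversion")  # hypothetical logger name


def log_progress(items: Iterable[T], total: int, every: int = 100) -> Iterator[T]:
    """Yield items unchanged, logging count and rate every ``every`` items."""
    start = time.monotonic()
    for done, item in enumerate(items, start=1):
        yield item
        # Runs once the consumer has finished with ``item`` and asks for the next.
        if done % every == 0 or done == total:
            elapsed = time.monotonic() - start
            rate = done / elapsed if elapsed > 0 else float("inf")
            log.info("%d/%d done (%.1f items/s)", done, total, rate)
```

Wrapping the work loop as `for f in log_progress(files, total=len(files)):` gives one line per interval in the job's log file, and the same wrapper would work for the Uploader.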

omad avatar Oct 12 '18 05:10 omad

For the COG-conversion

  • [ ] Further configurability of the upload directory structure. For example, some products want a flat directory structure in AWS where only a yearly time makes sense, i.e. without months and days
  • [ ] What do we do when the time indicated in the file name makes more sense than the timestamp of the dataset?
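
One possible shape for a configurable directory structure is a per-product format template, so a yearly product simply omits month and day. The function, template syntax, and product name below are hypothetical; the caller decides whether `when` comes from the file name or from the dataset timestamp.

```python
from datetime import datetime


def upload_prefix(template: str, product: str, when: datetime) -> str:
    """Render an S3 key prefix from a per-product template.

    ``{time:...}`` accepts any strftime pattern, so the depth of the
    directory structure is controlled entirely by the template.
    """
    return template.format(product=product, time=when)


stamp = datetime(2018, 10, 12)
daily = upload_prefix("{product}/{time:%Y/%m/%d}", "example_product", stamp)
yearly = upload_prefix("{product}/{time:%Y}", "example_product", stamp)
```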

ashoka1234 avatar Oct 12 '18 05:10 ashoka1234

  • [ ] Make the src_template more flexible and simplify class COGProductConfiguration by using either parse or accepting regexes.
  • [ ] Look into using MPIPoolExecutor for distributing work inside PBS jobs
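
As a sketch of the regex option for `src_template`, named groups can pull the fields out of a source file name in one step. The pattern and file-name layout here are invented for illustration and do not reflect the real product naming.

```python
import re
from typing import Dict, Optional

# Hypothetical layout: <product>_<x>_<y>_<year>.nc
SRC_PATTERN = re.compile(
    r"(?P<product>[A-Za-z0-9]+)_(?P<x>-?\d+)_(?P<y>-?\d+)_(?P<year>\d{4})\.nc"
)


def parse_source_name(name: str) -> Optional[Dict[str, str]]:
    """Return the named fields from a source file name, or None if it does not match."""
    match = SRC_PATTERN.fullmatch(name)
    return match.groupdict() if match else None


fields = parse_source_name("LS8_15_-40_2018.nc")
```

With this approach a product configuration would only need to carry its own pattern string, which may reduce how much logic has to live in `COGProductConfiguration`.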

omad avatar Oct 12 '18 05:10 omad

For the COG-conversion

  • [x] Validation of COG datasets - On which side do we do this?

On NCI, done!

emmaai avatar Nov 15 '18 01:11 emmaai