django-raster

Multi-task processing creates the same tmp file multiple times and processes the same raster multiple times

Open justRishi opened this issue 3 years ago • 0 comments

Problem

If RASTER_USE_CELERY = True and RASTER_PARSE_SINGLE_TASK is False or not set, a temp file is created multiple times in open_raster_file (class RasterLayerParser, parser.py). Also, when the raster is not in the right projection, the reprojection is done multiple times.
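The duplication is consistent with how tempfile.mkdtemp behaves in the standard library: every call returns a new, unique directory, even when the same parent directory is passed. The snippet below is only an illustration of that behaviour, not code from django-raster; the four calls stand in for four parallel tasks, and base is a stand-in for the raster work directory.

```python
import shutil
import tempfile

# Stand-in for the raster work directory (hypothetical, for illustration only).
base = tempfile.mkdtemp()
try:
    # Each simulated task asks for its own temporary directory under the same parent.
    task_dirs = [tempfile.mkdtemp(dir=base) for _ in range(4)]
    # mkdtemp never returns the same path twice, so four tasks get four directories.
    print(len(set(task_dirs)))  # 4
finally:
    shutil.rmtree(base)
```

If each task then copies the raster into its own directory, the file ends up on disk once per task.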

Why this is a problem

In my case, big raster files are copied 4 times, processed by GDAL 4 times, and sometimes (when not in the right projection) reprojected 4 times.

How tested

By adding self.log calls to print out each tmp file creation (the resulting log output was attached as an image).

How to mitigate

Set RASTER_PARSE_SINGLE_TASK = True in settings. The drawback is that the raster file is then no longer processed concurrently.
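As a settings fragment, the workaround would look roughly like this. Both setting names are the ones mentioned in this issue; the rest of the Django settings file is omitted.

```python
# settings.py (fragment)

# Keep handing raster parsing to Celery.
RASTER_USE_CELERY = True

# Parse each raster in one task, so the file is copied and reprojected only once.
# Trade-off: the parsing of a single raster is no longer spread over parallel tasks.
RASTER_PARSE_SINGLE_TASK = True
```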

Possible solution to process parallel and not duplicate work

  1. Make sure only one tmp folder is created per raster file. This line in parser.py should change, because mkdtemp always creates a unique directory: self.tmpdir = tempfile.mkdtemp(dir=raster_workdir)
  2. self.dataset in parser.py (class RasterLayerParser) should be shared by all parallel tasks for the same raster file.
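One way the two points above could be approached is sketched below. It is not a patch for django-raster: shared_workdir and copy_once are hypothetical helpers, layer_id is an assumed identifier for the raster layer, and the sketch assumes all workers see the same filesystem. Since Celery tasks usually run in separate processes, an in-memory dataset object cannot simply be shared, so the sketch shares the prepared file on disk instead.

```python
import os
import shutil
import tempfile


def shared_workdir(base_dir, layer_id):
    """Return a directory that is the same for every task of one raster layer."""
    # A deterministic name replaces the always-unique mkdtemp directory.
    path = os.path.join(base_dir, "raster_layer_{}".format(layer_id))
    os.makedirs(path, exist_ok=True)  # safe if another task created it first
    return path


def copy_once(source_path, workdir):
    """Copy the raster into workdir unless an earlier task already did.

    Returns (target_path, copied), where copied tells whether this call did the copy.
    """
    target = os.path.join(workdir, os.path.basename(source_path))
    if os.path.exists(target):
        return target, False
    # Write under a unique temporary name, then rename atomically, so that
    # other tasks never see a half-written file under the final name.
    fd, partial = tempfile.mkstemp(dir=workdir, suffix=".partial")
    os.close(fd)
    shutil.copyfile(source_path, partial)
    os.replace(partial, target)
    return target, True
```

Two tasks that start at exactly the same moment can still both perform the copy; the atomic rename only guarantees that the final file is never seen half-written. Removing that remaining duplication would need a lock, or a single preparation task that runs before the parallel ones. The same pattern would apply to the reprojected file.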

justRishi, Jan 18 '22 14:01