Add COG Reading and Writing Support
Cloud Optimized GeoTIFFs (COGs) are GeoTIFFs formatted so that they can be stored and accessed efficiently in the cloud. Having the ability to read and write TiledRasterLayers as COGs would be very beneficial. The logic already exists in GeoTrellis, so getting this functionality into GPS should just be a matter of creating wrappers around the existing Readers and Writers.
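For context, one possible shape for such wrappers (purely hypothetical; read_cog and write_cog do not exist in geopyspark today) would be to mirror the existing catalog query/write functions:

    import geopyspark as gps

    # Hypothetical API sketch only: read_cog and write_cog are proposed wrappers
    # around the GeoTrellis COG layer readers/writers, not real geopyspark calls.
    layer = gps.query(uri='s3://my-bucket/catalog', layer_name='ned', layer_zoom=13)

    # Write the tiled layer out as a COG-backed layer (hypothetical)
    gps.write_cog(uri='s3://my-bucket/cog-catalog', layer_name='ned',
                  tiled_raster_layer=layer)

    # Read it back at a given zoom (hypothetical)
    cog_layer = gps.read_cog(uri='s3://my-bucket/cog-catalog', layer_name='ned',
                             layer_zoom=13)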
Not a solution, but a stop-gap:
from math import log
from urllib import parse as urlparse

import boto3
import numpy as np
import rasterio as rstr
import rasterio.io
import rasterio.shutil as rshutil
import rasterio.warp as rwarp
from affine import Affine


def write_cog(layout, location, data, filename_template):
    """
    Write a GPS tile out as a Cloud-Optimized GeoTIFF

    Uses GDAL to write a COG out to disk, utilizing the given layout and location
    to generate the GeoTIFF header.

    Args:
        layout (``gps.LayoutDefinition``): The layout for the source layer
        location (``gps.SpatialKey`` or ``gps.SpaceTimeKey``): The cell identifier
            in the layer corresponding to the image data
        data (``gps.Tile``): The image data
        filename_template (str): A pattern giving the destination for the target
            image.  Contains two '{}' tokens which will be ``.format``ed with the
            column and row of the ``location``, in that order.  May be an S3 uri or
            local path.
    """
    bands, h, w = data.cells.shape
    nodata = data.no_data_value
    dtype = data.cells.dtype
    # cell_size and extent_for_cell are assumed helpers that derive the pixel
    # resolution and the spatial extent of the keyed cell from the layout.
    cw, ch = cell_size(layout)
    ex = extent_for_cell(layout, location)
    # Number of halvings needed to bring the tile down to a 256-pixel base
    overview_level = int(log(w) / log(2) - 8)

    with rstr.io.MemoryFile() as memfile:
        with memfile.open(driver='GTiff',
                          count=bands,
                          width=w,
                          height=h,
                          transform=Affine(cw, 0.0, ex.xmin,
                                           0.0, -ch, ex.ymax),
                          crs=rstr.crs.CRS.from_epsg(4326),
                          nodata=nodata,
                          dtype=dtype,
                          compress='lzw',
                          tiled=True) as mem:
            # Write each internal block, plus a mask marking non-nodata pixels
            windows = list(mem.block_windows(1))
            for _, win in windows:
                segment = data.cells[:,
                                     win.row_off:(win.row_off + win.height),
                                     win.col_off:(win.col_off + win.width)]
                mem.write(segment, window=win)
                mask_value = np.all(segment != nodata, axis=0).astype(np.uint8) * 255
                mem.write_mask(mask_value, window=win)

            # Build internal overviews down to the 256-pixel base
            overviews = [2 ** j for j in range(1, overview_level + 1)]
            mem.build_overviews(overviews, rwarp.Resampling.nearest)
            mem.update_tags(ns='rio_overview', resampling=rwarp.Resampling.nearest.value)

            uri = urlparse.urlparse(filename_template)
            if uri.scheme == 's3':
                boto3.resource('s3').Bucket(uri.netloc).upload_fileobj(
                    memfile, uri.path.format(location.col, location.row)[1:])
            else:
                rshutil.copy(mem, filename_template.format(location.col, location.row),
                             copy_src_overviews=True)
Use tile_to_layout() to build a coarse tiling (say, 2048 or 4096), then use to_numpy_rdd() and then foreach to write the individual COGs.
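A rough driver for that workflow might look like the sketch below, assuming layer is an existing TiledRasterLayer, write_cog (and its helpers) are importable on the Spark executors, and the tile size and output template are purely illustrative:

    import geopyspark as gps

    # Retile to a coarse layout so each key becomes one reasonably sized COG
    tiled = layer.tile_to_layout(gps.LocalLayout(2048))
    layout = tiled.layer_metadata.layout_definition

    # '{}' tokens are filled with the key's column and row, in that order
    template = 's3://my-bucket/cogs/{}_{}.tif'  # hypothetical destination

    # Each RDD element is a (key, Tile) pair; write one COG per key
    tiled.to_numpy_rdd() \
         .foreach(lambda kv: write_cog(layout, kv[0], kv[1], template))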