Better doc for copy operations and performance implications

Open chacha21 opened this issue 4 years ago • 3 comments

GDAL 3.3.2

Currently, copying one dataset to another is not very well documented with regard to the following points:

- GDALDriver::Copy() does not copy metadata by default, and there seems to be no option to do that automatically (I observed that with ENVI metadata describing the bands, for instance).

- There seems to be no raw copy from one GDALDatasetH to another GDALDatasetH. That seems legitimate if we really want to ensure that the destination dataset is compatible, but in my use case I very often want to reuse datasets that held various intermediate calculation results and that would fit exactly.

- If I want to raw-copy between two GDALDatasetH myself, it is very easy with GDALRasterIO and an intermediate memory buffer, but that raises a performance issue: depending on the internal data layout, it can be more efficient to copy by band (for each band, copy all (x, y)) or by row (for each y, copy all (x, band)). This is typically what the BIL, BIP and BSQ data layouts of ENVI address. But there seems to be no GDAL-standard description of internal interleaving, so I must rely on the ENVI Interleave metadata myself.

chacha21 avatar Oct 23 '21 10:10 chacha21

GDALDriver::Copy() does not copy metadata by default, and there seems to be no option to do that automatically (I observed that with ENVI metadata describing the bands, for instance).

There's no Copy() method: did you mean CreateCopy()? This method is typically implemented by output drivers, which should take responsibility for copying source metadata when appropriate. This is not a general GDAL mechanism, except for drivers that implement only the Create() method and not CreateCopy(): for those, the DefaultCreateCopy() method is used, which will copy metadata. So drivers might behave differently regarding preservation of source metadata. It also depends on the capabilities of the driver/format (some formats might not accept arbitrary metadata, in which case it goes to the .aux.xml sidecar file). What are your source and target drivers, is this dataset-level or band-level metadata, and which metadata items are affected?

-There seems to be no raw copy from a GDALDatasetH to another GDALDatasetH.

I guess what you're looking for is GDALDatasetCopyWholeRaster() : https://gdal.org/api/raster_c_api.html?highlight=gdaldatasetcopywholeraster#_CPPv426GDALDatasetCopyWholeRaster12GDALDatasetH12GDALDatasetH12CSLConstList16GDALProgressFuncPv

This is typically what the BIL, BIP and BSQ data layouts of ENVI address. But there seems to be no GDAL-standard description of internal interleaving

The IMAGE_STRUCTURE metadata domain has an INTERLEAVE=PIXEL/LINE/BAND metadata item, for BIP/BIL/BSQ respectively: https://gdal.org/user/raster_data_model.html#image-structure-domain . The ENVI driver publishes it, and is probably one of the few to publish LINE (most other GDAL drivers and algorithms only know PIXEL and BAND, so LINE might be treated as PIXEL or BAND depending on how the tests on the value of INTERLEAVE are written).
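As an illustration of the point about how tests on INTERLEAVE are written, here is a minimal sketch of a parser that recognises all three values explicitly, so that LINE is never silently folded into PIXEL or BAND. The names `Interleave` and `parseInterleave` are hypothetical and not part of GDAL; the sketch assumes the string was obtained elsewhere, for example with `GDALGetMetadataItem(hDS, "INTERLEAVE", "IMAGE_STRUCTURE")`, which may return a null pointer when the driver does not publish the item.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical enum (not a GDAL type): one entry per documented INTERLEAVE value.
enum class Interleave { Pixel, Line, Band, Unknown };

// Case-insensitive comparison of two ASCII strings.
static bool equalsIgnoreCase(const std::string &a, const std::string &b)
{
    if (a.size() != b.size())
        return false;
    for (std::size_t i = 0; i < a.size(); ++i)
    {
        if (std::tolower(static_cast<unsigned char>(a[i])) !=
            std::tolower(static_cast<unsigned char>(b[i])))
            return false;
    }
    return true;
}

// Maps the INTERLEAVE metadata value to the enum.
// A null pointer or an unrecognised string yields Interleave::Unknown,
// leaving the fallback policy to the caller.
Interleave parseInterleave(const char *value)
{
    if (value == nullptr)
        return Interleave::Unknown;
    const std::string v(value);
    if (equalsIgnoreCase(v, "PIXEL"))
        return Interleave::Pixel; // BIP
    if (equalsIgnoreCase(v, "LINE"))
        return Interleave::Line;  // BIL
    if (equalsIgnoreCase(v, "BAND"))
        return Interleave::Band;  // BSQ
    return Interleave::Unknown;
}
```

Returning `Unknown` instead of defaulting to PIXEL or BAND is a design choice of this sketch: it forces the calling code to decide what to do when a driver reports nothing or reports an unexpected value.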

Your contributions to improve the docs are also welcome

rouault avatar Oct 23 '21 11:10 rouault

GDALDriver::Copy() does not copy metadata by default, and there seems to be no option to do that automatically (I observed that with ENVI metadata describing the bands, for instance).

There's no Copy() method: did you mean CreateCopy() ? This method is typically implemented by output drivers, which should take the responsibility of copying source metadata when appropriate. [...]. So drivers might behave differently [...] What are your source and target drivers, dataset or band level metadata, which metadata items... ?

Interesting. I will double-check, but I observed the non-copy of a "wavelength" metadata item from the ENVI metadata domain.

-There seems to be no raw copy from a GDALDatasetH to another GDALDatasetH.

I guess what you're looking for is GDALDatasetCopyWholeRaster() : https://gdal.org/api/raster_c_api.html?highlight=gdaldatasetcopywholeraster#_CPPv426GDALDatasetCopyWholeRaster12GDALDatasetH12GDALDatasetH12CSLConstList16GDALProgressFuncPv

Oops, I did not find this one; there is no C++ equivalent, right ?

This is typically what the BIL, BIP and BSQ data layouts of ENVI address. But there seems to be no GDAL-standard description of internal interleaving

The IMAGE_STRUCTURE metadata domain has an INTERLEAVE=PIXEL/LINE/BAND metadata item, for BIP/BIL/BSQ respectively: https://gdal.org/user/raster_data_model.html#image-structure-domain . The ENVI driver publishes it, and is probably one of the few to publish LINE (most other GDAL drivers and algorithms only know PIXEL and BAND, so LINE might be treated as PIXEL or BAND depending on how the tests on the value of INTERLEAVE are written).

Oh, I thought that INTERLEAVE was ENVI-specific. If that is not the case, this is less of an issue. My request could be changed to asking for a handy utility function that would report the best GSpacing values for performance before a RasterIO call. This is currently rather cumbersome.

I might submit updates only once I am sure that I did not miss anything! Your answers show that I was clearly not enlightened enough.
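To make the request concrete, here is a rough sketch of what such a helper could compute: the pixel, line and band spacings (in bytes) of a memory buffer laid out as BIP, BIL or BSQ, which are the kind of values passed as nPixelSpace, nLineSpace and nBandSpace to the dataset-level RasterIO. The names `Layout`, `BufferSpacing` and `spacingFor` are hypothetical, `std::int64_t` stands in for GDAL's GSpacing type, and whether matching the buffer layout to the file layout is really the fastest choice for a given driver is an assumption of this sketch, not something established in this thread.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical enum (not a GDAL type) naming the three ENVI-style layouts.
enum class Layout { BIP, BIL, BSQ };

// Byte offsets between consecutive pixels, lines and bands in a buffer.
struct BufferSpacing
{
    std::int64_t pixel;
    std::int64_t line;
    std::int64_t band;
};

// Computes the spacings of a tightly packed buffer with the given layout.
// bytesPerSample is the size of one sample (e.g. 2 for a 16-bit type).
BufferSpacing spacingFor(Layout layout, int width, int height, int bands,
                         int bytesPerSample)
{
    assert(width > 0 && height > 0 && bands > 0 && bytesPerSample > 0);
    const std::int64_t s = bytesPerSample;
    switch (layout)
    {
        case Layout::BIP:
            // All bands of one pixel are adjacent, then the next pixel.
            return {s * bands, s * bands * width, s};
        case Layout::BIL:
            // One row of band 0, then the same row of band 1, and so on.
            return {s, s * width * bands, s * width};
        case Layout::BSQ:
        default:
            // One full band after another.
            return {s, s * width, s * width * height};
    }
}
```

For a 4 x 3 raster with 2 bands of 16-bit samples, this gives a pixel/line/band spacing of 4/16/2 bytes for BIP, 2/16/8 for BIL and 2/8/24 for BSQ.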

chacha21 avatar Oct 23 '21 13:10 chacha21

I observed the non-copy of a "wavelength" metadata item from the ENVI metadata domain.

As far as I can see, the ENVI driver only reports it on the read side, but doesn't write it.

there is no C++ equivalent, right ?

no

rouault avatar Oct 23 '21 13:10 rouault