[WIP] GDAL_GCP C++'ification: appropriate for 3.9 ?
Up to now, GDAL_GCP was only a C-only structure. The typical/recommended pattern to create GCPs from C is:
GDAL_GCP* gcps = (GDAL_GCP*)CPLMalloc( count * sizeof(GDAL_GCP) );
GDALInitGCPs( count, gcps ); // zero-initialize pixel,line,x,y,z double members of each gcps[], and affects a CPLStrdup("") string to pszId and pszInfo members of each gcps[]
CPLFree(gcps[0].pszId); // free the initial empty string
gcps[0].pszId = CPLStrdup("foo"); // for example
GDALDeinitGCPs( count, gcps ); // free memory of gcps[i].pszId and gcps[i].pszInfo
CPLFree(gcps); // free the arra itself
This is "fine" for a C user, but quite tedious in a C++ context. Big users of creating GCPs are GDAL drivers themselves.
So this pull request makes GDAL_GCP is a C++ struct, when gdal.h included from C++ code, with a default allocator, a destructor and move constructor / assignment operators.
This mostly works except a few details:
-
If a GDAL_GCP is allocated as a C++ object ( ie something like
GDAL_GCP gcp, orGDAL_GCP gcps[5]or withstd::vector<GDAL_GCP>, or anything that is notCPLMalloc()/VSIMalloc()), and then passed toGDALInitGCPs(), a memory leak would occur becausepszIdwill be initialized twice. Here the fix consists in the GDAL_GCP constructor to assign it to the return of aVSIGetStaticEmptyString()function, which returns a special static instance of an empty string, thatVSIFree()explicitly recognizes and does not free. Hence no leak occurs. -
For a GDAL_GCP created as a C++ object, one could run into issues with double-free, when explicitly calling
GDALDeinitGCPs(), and then when the GDAL_GCP destructor automatically runs. This was easily fixed by makingGDALDeinitGCPs()to nullifypszIdandpszInfomembers after CPLFree()'ing them.
All above are just implementation details. More annoyingly, here are the few backward incompatibilities I found with a few instances of C++ code:
-
In gdalpamdataset.cpp, we kind of abused the recommended workflow mentionned at the beginning, by assigning a static empty string to pszId and pszInfo on a GDAL_GCP[] object, and not use GDALInitGCPs() / GDALDeinitGCPs(). The issue now is that when
~GDAL_GCP()is called it will try toCPLFree()that. No other solution here than removing that initialization, that is no longer needed -
In
GDALLoadTabFile()andGDALLoadOziMapFile(), a GDAL_GCP[] array was allocated, and then memcpy()'d onto a dynamically array allocated withCPLMalloc( count * sizeof(GDAL_GCP) )returned to the user. This code has to be changed to remove thememcpy()part and docpl_malloc_gcps[i] = std::move(asGCPs[i]) -
The Sentinel2 driver used
memset(gcps, 0, sizeof(gcps))to initialize a GDAL_GCP[] array. Some compilers complained about memset()'ing a non plain-old-object instance. The fix is just to remove that now useless memset().
Except those details, all other drivers compile and pass their regression tests, which makes me believe, that while being backward incompatible, this is still reasonable. The compiler should complain in most problematic situations where the code has to be changed (this was for example for the memcpy() in GDALLoadTabile/GDALLoadOziMapFile). The only undetected change at compile time was the "clever trick" of gdalpamdataset.cpp which was only detected at runtime. Hopefully such a pattern should be exceptional.
In the worse case, users can define the new GDAL_SUPPRESS_GCP_CPLUSPLUS macro before including gdal.h, which will make their code see the POD C structure.
And to re-iterate, this has no impact on C-only users
The first commit introduces the minimum changes to get things working, and the second one modernizes the HFA driver to use a std::vector<GDAL_GCP> to show the benefits of the proposed change (less magic numbers, automatic construction/destruction) . More drivers could be modernized in a similar way
This <const_cast>(special empty string) dance doesn't look healthy...
I assume VSIFree is used much more often than GDAL_GCP. Thus the extra test is quite expensive. Would it help to make the special empty string a non-static private member of GDAL_GCP instead of a global static? A little bit more memory for very few cases instead of a little bit more runtime for many cases.
Could pszId and pszInfo be changed to const char*? This would fit well to the signatures of SetId() and SetInfo().
Thus the extra test is quite expensive
I haven't measured the perf impact, but the only significant impact I would anticipate would only be in loops where you would do tons of malloc/free patterns, and that's obviously something that should be avoid be avoided upfront. I'd be very surprised that this shows up in realistic usage scenarios.
Would it help to make the special empty string a non-static private member of
GDAL_GCPinstead of a global static?
This special static empty string that VSIFree() doesn't free is necessary if we want backward compatibility for scenarios like:
GDAL_GCP x; GDALInitGCPs(1, &x);. Without the special empty string trick, the GDAL_GCP constructor would have to dynamically allocate one empty string, and GDALInitGCPs() would override it with another one, without freeing (since GDALInitGCPs() is aimed at being used on uninitialized data in its nominal usage), thus causing a memory leak.
We could fix all GDAL code to avoid doing GDALInitGCPs() on GDAL_GCP objects C++ constructed, but I've no idea of the impact on outside code.
Could pszId and pszInfo be changed to const char*?
Definitely not for the C version of the structure. And for the C++ one, that would break all code that currently sets them directly (also hard to quantify how frequent this is outside of the GDAL codebase)
I see. I still feel a little bit worried by the mix of C, C++, and const_cast.
Closing, as superseded per https://github.com/OSGeo/gdal/pull/9285