
Adding parameters causes pyramid efficiency to decrease

Open xiaoda0801 opened this issue 1 year ago • 8 comments

Hi. When creating a pyramid, adding GDAL_NUM_THREADS=ALL_CPUS and GDAL_TIFF_OVR_BLOCKSIZE may cause a 20% to 30% performance drop on a mechanical disk.

GDAL: 3.7.2

xiaoda0801 avatar Jan 10 '24 02:01 xiaoda0801

I can confirm that adding ALL_CPUS (i.e. using multithreading) reduces efficiency.


xiaoda0801 avatar Jan 10 '24 03:01 xiaoda0801

It may well be so. There is no guarantee that using more CPUs is always beneficial. Your approach of running tests in your own environment and selecting the settings that suit you is fine.

jratike80 avatar Jan 10 '24 07:01 jratike80

I don't think so. I've tested it on several different computers and all have the same result.

xiaoda0801 avatar Jan 10 '24 07:01 xiaoda0801

I get repeatable results with a PowerShell test on Windows. For me all_cpus is faster. Not much, but the result is stable. Naturally I have deleted the external .ovr file before each run.

PS C:\data\orto> Measure-Command {gdaladdo -ro delete.tif}
Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 1
Milliseconds      : 157

PS C:\data\orto> Measure-Command {gdaladdo -ro delete.tif --config GDAL_NUM_THREADS ALL_CPUS}
Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 1
Milliseconds      : 137
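
For anyone who wants to repeat this kind of comparison in a more automated way, here is a small Python sketch (not from the thread; the `delete.tif` file name is the hypothetical one used above, and `gdaladdo` must be on your PATH) that times a command and removes the external .ovr file before each run, keeping the best of several runs:

```python
import subprocess
import time
from pathlib import Path


def time_command(cmd, ovr_to_delete=None, runs=3):
    """Run `cmd` several times, deleting the external overview file
    before each run, and return the best wall-clock time in seconds."""
    best = float("inf")
    for _ in range(runs):
        if ovr_to_delete is not None:
            Path(ovr_to_delete).unlink(missing_ok=True)
        start = time.perf_counter()
        subprocess.run(cmd, check=True)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    # Hypothetical dataset name; replace with your own file.
    base = time_command(["gdaladdo", "-ro", "delete.tif"],
                        ovr_to_delete="delete.tif.ovr")
    mt = time_command(["gdaladdo", "-ro", "delete.tif",
                       "--config", "GDAL_NUM_THREADS", "ALL_CPUS"],
                      ovr_to_delete="delete.tif.ovr")
    print(f"single-threaded: {base:.3f}s, ALL_CPUS: {mt:.3f}s")
```

Taking the best of several runs helps filter out OS cache and scheduling noise, which easily dwarfs a 20 ms difference like the one above.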

jratike80 avatar Jan 10 '24 07:01 jratike80

  • How many CPUs/virtual CPUs does your system have?
  • Please include the output of gdalinfo on the GeoTIFF file

rouault avatar Jan 10 '24 10:01 rouault

OK. My system has 6 cores, 12 logical processors, and 16 GB of RAM. gdalinfo:

```
Driver: GTiff/GeoTIFF
Files: E:/data/CB04A/fusion/CB04A_WPM_E123.2_N41.7_20230616_L1A0000473266-PAN-Ortho-fus.tiff
       E:/data/CB04A/fusion/CB04A_WPM_E123.2_N41.7_20230616_L1A0000473266-PAN-Ortho-fus.tiff.aux.xml
Size is 65606, 49443
Coordinate System is:
GEOGCRS["WGS 84",
    ENSEMBLE["World Geodetic System 1984 ensemble",
        MEMBER["World Geodetic System 1984 (Transit)"],
        MEMBER["World Geodetic System 1984 (G730)"],
        MEMBER["World Geodetic System 1984 (G873)"],
        MEMBER["World Geodetic System 1984 (G1150)"],
        MEMBER["World Geodetic System 1984 (G1674)"],
        MEMBER["World Geodetic System 1984 (G1762)"],
        MEMBER["World Geodetic System 1984 (G2139)"],
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]],
        ENSEMBLEACCURACY[2.0]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    USAGE[
        SCOPE["Horizontal component of 3D system."],
        AREA["World."],
        BBOX[-90,-180,90,180]],
    ID["EPSG",4326]]
Data axis to CRS axis mapping: 2,1
Origin = (122.535508747779431,42.219513022077628)
Pixel Size = (0.000020860301472,-0.000020860301472)
Metadata:
  AREA_OR_POINT=Area
Image Structure Metadata:
  INTERLEAVE=PIXEL
Corner Coordinates:
Upper Left  ( 122.5355087,  42.2195130) (122d32' 7.83"E, 42d13'10.25"N)
Lower Left  ( 122.5355087,  41.1881171) (122d32' 7.83"E, 41d11'17.22"N)
Upper Right ( 123.9040697,  42.2195130) (123d54'14.65"E, 42d13'10.25"N)
Lower Right ( 123.9040697,  41.1881171) (123d54'14.65"E, 41d11'17.22"N)
Center      ( 123.2197892,  41.7038151) (123d13'11.24"E, 41d42'13.73"N)
Band 1 Block=256x256 Type=UInt16, ColorInterp=Gray
  Min=197.000 Max=831.000
  Minimum=197.000, Maximum=831.000, Mean=324.914, StdDev=40.716
  NoData Value=0
  Metadata:
    STATISTICS_APPROXIMATE=YES
    STATISTICS_MAXIMUM=831
    STATISTICS_MEAN=324.91394936491
    STATISTICS_MINIMUM=197
    STATISTICS_STDDEV=40.71567095048
    STATISTICS_VALID_PERCENT=68.91
Band 2 Block=256x256 Type=UInt16, ColorInterp=Undefined
  Min=151.000 Max=938.000
  Minimum=151.000, Maximum=938.000, Mean=323.158, StdDev=54.129
  NoData Value=0
  Metadata:
    STATISTICS_APPROXIMATE=YES
    STATISTICS_MAXIMUM=938
    STATISTICS_MEAN=323.15773895484
    STATISTICS_MINIMUM=151
    STATISTICS_STDDEV=54.129489607214
    STATISTICS_VALID_PERCENT=68.91
Band 3 Block=256x256 Type=UInt16, ColorInterp=Undefined
  Min=47.000 Max=1187.000
  Minimum=47.000, Maximum=1187.000, Mean=321.103, StdDev=90.495
  NoData Value=0
  Metadata:
    STATISTICS_APPROXIMATE=YES
    STATISTICS_MAXIMUM=1187
    STATISTICS_MEAN=321.10253848467
    STATISTICS_MINIMUM=47
    STATISTICS_STDDEV=90.495081418363
    STATISTICS_VALID_PERCENT=68.91
Band 4 Block=256x256 Type=UInt16, ColorInterp=Undefined
  NoData Value=0
```

xiaoda0801 avatar Jan 12 '24 03:01 xiaoda0801

OK, so you are definitely in one of the use cases where multithreading is going to be less efficient, because you use nearest neighbour resampling, which isn't compute intensive at all, and because your dataset is uncompressed. On my Intel i7-10750H laptop (6 cores hyper-threaded, hence 12 virtual CPUs), on Linux, I do however get close to a 2x speed-up by using multithreading (wall clock; although if you look at the 'user' time, a lot more CPU is burnt):

$ gdal_create in.tif -outsize 20000 20000 -bands 4 -ot uint16 -co tiled=yes
$ time gdaladdo -ro in.tif --config GDAL_NUM_THREADS ALL_CPUS
0...10...20...30...40...50...60...70...80...90...100 - done.

real	0m5,887s
user	0m35,110s
sys	0m11,790s
$ rm in.tif.ovr 
$ time gdaladdo -ro in.tif 
0...10...20...30...40...50...60...70...80...90...100 - done.

real	0m9,890s
user	0m7,505s
sys	0m2,373s

Perhaps multithreading on Windows is less efficient. In any case, the multithreading here only applies to the computational part of overview generation. It shouldn't affect I/O patterns at all, hence I can't make any sense of the "may cause a 20% to 30% reduction in mechanical disk performance" statement.

rouault avatar Jan 12 '24 17:01 rouault

It would be funny if it did. Actually, uncompressed data is pretty bad for a mechanical drive.

HDDs are slower than SSDs, which means uncompressed data will take more time to read than it would on an SSD. Multithreading also means that several threads can request data from the HDD at the same time.

The HDD will read the data, but if two threads request reads at the same time, the disk has to fetch data from two different places, so the head bounces between them: one thread finishes first, the other later. By the time the first thread has its data, the disk has likely also partially served the second one, so the first thread's read time is higher than without multithreading and the second one's is lower. But since there is much more reading and writing going on than just these two requests, multithreading is more likely to increase total read time than to reduce it (and remember that, at the same time, each thread is also writing to the disk...).
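
The seek argument can be illustrated with a toy model (not from the thread; the costs are made-up units, not real disk timings): sequential blocks are cheap, but every switch between two readers costs a seek.

```python
# Toy model: reading one block sequentially costs 1 unit,
# and moving the head to a different stream costs a seek penalty.
SEQ_COST = 1
SEEK_COST = 10


def read_time(schedule):
    """Total time for a sequence of stream ids, e.g. [0,0,1,1] or [0,1,0,1].
    A seek is paid whenever the stream id changes."""
    total = 0
    last = None
    for stream in schedule:
        if stream != last:
            total += SEEK_COST
        total += SEQ_COST
        last = stream
    return total


blocks = 100  # blocks per thread
one_after_other = [0] * blocks + [1] * blocks  # threads read sequentially
interleaved = [0, 1] * blocks                  # threads compete for the head

print(read_time(one_after_other))  # 2 seeks:   220
print(read_time(interleaved))      # 200 seeks: 2200
```

With these made-up numbers the interleaved schedule is ten times slower, purely from head movement; an SSD (near-zero seek cost) would see almost no difference.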

Requesting non-contiguous, large reads from an HDD is a pretty bad and expensive idea; try to avoid it. Compressing the data, as recommended above, is a great idea and better for HDDs. COG is a great set of TIFF options for this kind of optimization; you can try it.

Windows also usually means NTFS, which means fragmentation. If the data is fragmented, imagine what the disk has to do on top of everything I wrote above.

Also, as people already said here, multithreading has a CPU cost. In very simple terms: if processing one sample takes time v, then processing n samples with t threads, where creating a thread costs s, finishes in roughly v*n/t + t*s. So adding threads does not necessarily make everything faster; there is a point where the overhead can exceed the single-thread time, or, more usually, the total time stabilizes and adding threads brings no extra performance.
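
That simple cost model can be written down directly (a sketch with made-up numbers, not measurements): v*n/t for the divided work plus t*s for spawning threads, which has a minimum at some finite thread count.

```python
def total_time(n, v, s, t):
    """Toy cost model: n samples taking time v each, split over t threads,
    plus a fixed cost s per thread created."""
    return v * n / t + t * s


# Assumed numbers: 1M samples, 1 microsecond each, 50 ms per thread spawn.
n, v, s = 1_000_000, 1e-6, 0.05
times = {t: total_time(n, v, s, t) for t in (1, 2, 4, 8, 16, 64)}
best = min(times, key=times.get)
print(times)
print("best thread count:", best)  # best thread count: 4
```

With these numbers, 4 threads beat both 1 thread (overhead too small to matter, work undivided) and 64 threads (spawn cost dominates), matching the "sweet spot" idea: the analytic optimum is t = sqrt(v*n/s) ≈ 4.5.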

People already said it, but multithreading is not always beneficial: more threads mean a lot of extra costs, sharing or moving data, thread creation, device requests, writes! Find your sweet spot :)

latot avatar Jan 24 '24 13:01 latot