Adding configuration options makes pyramid (overview) building slower
Hi. When creating a pyramid, adding ALL_CPUS and GDAL_TIFF_OVR_BLOCKSIZE may cause a 20% to 30% performance drop on a mechanical disk.
GDAL: 3.7.2
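For reference, the kind of command I am running is roughly this (the file name and block size here are illustrative, not my exact values):
gdaladdo -ro image.tif --config GDAL_NUM_THREADS ALL_CPUS --config GDAL_TIFF_OVR_BLOCKSIZE 512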
I can confirm that enabling multithreading with ALL_CPUS reduces efficiency.
It may well be so. There is no guarantee that using more CPUs is always beneficial. Your approach of running tests in your own environment and selecting the settings that suit you is sound.
I don't think it's environment-specific. I've tested it on several different computers and they all show the same result.
I get repeatable results with a PowerShell test on Windows. For me, ALL_CPUS is faster; not by much, but the result is stable. Naturally, I deleted the external .ovr file before each run.
PS C:\data\orto> Measure-Command {gdaladdo -ro delete.tif}
Days : 0
Hours : 0
Minutes : 0
Seconds : 1
Milliseconds : 157
PS C:\data\orto> Measure-Command {gdaladdo -ro delete.tif --config GDAL_NUM_THREADS ALL_CPUS}
Days : 0
Hours : 0
Minutes : 0
Seconds : 1
Milliseconds : 137
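For anyone reproducing this, my per-run pattern was essentially the following (the path is just my test folder; Remove-Item clears the external overview between runs):
PS C:\data\orto> Remove-Item delete.tif.ovr
PS C:\data\orto> Measure-Command {gdaladdo -ro delete.tif --config GDAL_NUM_THREADS ALL_CPUS}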
- How many CPUs/virtual CPUs does your system have?
- Please include the output of gdalinfo on the GeoTIFF file.
OK. My system has 6 cores and 12 logical processors, with 16 GB of RAM. gdalinfo output:
Driver: GTiff/GeoTIFF
Files: E:/data/CB04A/fusion/CB04A_WPM_E123.2_N41.7_20230616_L1A0000473266-PAN-Ortho-fus.tiff
       E:/data/CB04A/fusion/CB04A_WPM_E123.2_N41.7_20230616_L1A0000473266-PAN-Ortho-fus.tiff.aux.xml
Size is 65606, 49443
Coordinate System is:
GEOGCRS["WGS 84",
    ENSEMBLE["World Geodetic System 1984 ensemble",
        MEMBER["World Geodetic System 1984 (Transit)"],
        MEMBER["World Geodetic System 1984 (G730)"],
        MEMBER["World Geodetic System 1984 (G873)"],
        MEMBER["World Geodetic System 1984 (G1150)"],
        MEMBER["World Geodetic System 1984 (G1674)"],
        MEMBER["World Geodetic System 1984 (G1762)"],
        MEMBER["World Geodetic System 1984 (G2139)"],
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]],
        ENSEMBLEACCURACY[2.0]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    USAGE[
        SCOPE["Horizontal component of 3D system."],
        AREA["World."],
        BBOX[-90,-180,90,180]],
    ID["EPSG",4326]]
Data axis to CRS axis mapping: 2,1
Origin = (122.535508747779431,42.219513022077628)
Pixel Size = (0.000020860301472,-0.000020860301472)
Metadata:
AREA_OR_POINT=Area
Image Structure Metadata:
INTERLEAVE=PIXEL
Corner Coordinates:
Upper Left ( 122.5355087, 42.2195130) (122d32' 7.83"E, 42d13'10.25"N)
Lower Left ( 122.5355087, 41.1881171) (122d32' 7.83"E, 41d11'17.22"N)
Upper Right ( 123.9040697, 42.2195130) (123d54'14.65"E, 42d13'10.25"N)
Lower Right ( 123.9040697, 41.1881171) (123d54'14.65"E, 41d11'17.22"N)
Center ( 123.2197892, 41.7038151) (123d13'11.24"E, 41d42'13.73"N)
Band 1 Block=256x256 Type=UInt16, ColorInterp=Gray
Min=197.000 Max=831.000
Minimum=197.000, Maximum=831.000, Mean=324.914, StdDev=40.716
NoData Value=0
Metadata:
    STATISTICS_APPROXIMATE=YES
    STATISTICS_MAXIMUM=831
    STATISTICS_MEAN=324.91394936491
    STATISTICS_MINIMUM=197
    STATISTICS_STDDEV=40.71567095048
    STATISTICS_VALID_PERCENT=68.91
Band 2 Block=256x256 Type=UInt16, ColorInterp=Undefined
Min=151.000 Max=938.000
Minimum=151.000, Maximum=938.000, Mean=323.158, StdDev=54.129
NoData Value=0
Metadata:
    STATISTICS_APPROXIMATE=YES
    STATISTICS_MAXIMUM=938
    STATISTICS_MEAN=323.15773895484
    STATISTICS_MINIMUM=151
    STATISTICS_STDDEV=54.129489607214
    STATISTICS_VALID_PERCENT=68.91
Band 3 Block=256x256 Type=UInt16, ColorInterp=Undefined
Min=47.000 Max=1187.000
Minimum=47.000, Maximum=1187.000, Mean=321.103, StdDev=90.495
NoData Value=0
Metadata:
    STATISTICS_APPROXIMATE=YES
    STATISTICS_MAXIMUM=1187
    STATISTICS_MEAN=321.10253848467
    STATISTICS_MINIMUM=47
    STATISTICS_STDDEV=90.495081418363
    STATISTICS_VALID_PERCENT=68.91
Band 4 Block=256x256 Type=UInt16, ColorInterp=Undefined
NoData Value=0
OK, so you are definitely in one of the use cases where multithreading is going to be the least efficient, because you use nearest-neighbour resampling, which is not computationally intensive at all, and because your dataset is uncompressed. On my Intel i7-10750H laptop (6 cores, hyper-threaded, hence 12 virtual CPUs) running Linux, I do however get close to a 2x speed-up by using multithreading (in wall-clock time; if you look at the 'user' time, a lot more CPU is burnt):
$ gdal_create in.tif -outsize 20000 20000 -bands 4 -ot uint16 -co tiled=yes
$ time gdaladdo -ro in.tif --config GDAL_NUM_THREADS ALL_CPUS
0...10...20...30...40...50...60...70...80...90...100 - done.
real 0m5,887s
user 0m35,110s
sys 0m11,790s
$ rm in.tif.ovr
$ time gdaladdo -ro in.tif
0...10...20...30...40...50...60...70...80...90...100 - done.
real 0m9,890s
user 0m7,505s
sys 0m2,373s
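Conversely, if the per-pixel work were heavier, the speed-up should be larger. An untested sketch of such a setup, using average resampling and DEFLATE-compressed external overviews (both are standard gdaladdo/config options):
gdaladdo -ro -r average in.tif --config GDAL_NUM_THREADS ALL_CPUS --config COMPRESS_OVERVIEW DEFLATE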
Perhaps multithreading on Windows is less efficient. In any case, the multithreading here only applies to the computational part of overview building. It shouldn't affect I/O patterns at all, hence I can't make any sense of the statement that it "may cause a 20% to 30% performance drop on a mechanical disk".
It would be funny if it did. Actually, uncompressed data is pretty bad for a mechanical drive.
HDDs are slower than SSDs, so uncompressed (hence larger) data takes more time to read than it would from an SSD. Multithreading also means several threads can request data from the HDD at the same time.
The HDD will read the data, but if two threads request it at the same time, the disk tries to read from two places at once. It will try to serve the whole workload in minimal time, so one thread finishes first and the other later. And here is the point: by the time the first thread has read its data, the disk has likely also partially read the second thread's data, so in theory the first thread's read takes longer than it would without multithreading, while the second thread's takes less. But since this process involves far more reading and writing than that, multithreading is more likely to increase total read time than to reduce it. (And consider that, at the same time, each thread may also be writing to the disk...)
Requesting non-contiguous, large stretches of data from an HDD is a bad and expensive idea; try to avoid it. Compressing the data, as recommended above, is a great idea and much kinder to HDDs. COG is a great set of TIFF options for this kind of optimization; you can try it, as sketched below.
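A minimal sketch of such a conversion (the file names are placeholders, and DEFLATE/ALL_CPUS are just reasonable starting points, not tested recommendations):
gdal_translate in.tif out_cog.tif -of COG -co COMPRESS=DEFLATE -co NUM_THREADS=ALL_CPUS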
Windows also usually means NTFS, which means fragmentation; if the file is fragmented, imagine what the disk has to do on top of everything I wrote above.
Also, as people already said here, multithreading has a CPU cost. In simple terms: if processing one sample takes time v, you have n samples and t threads, and creating a thread takes time s, then a very simple model of the total time is v*n/t + t*s. So more threads will not necessarily finish everything faster: past some point the overhead can exceed the single-thread time, or, as usual, the total time stabilizes at some level and adding threads brings no further gain.
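In this toy model you can even read off the sweet spot: minimizing v*n/t + t*s over t gives t* = sqrt(v*n/s). With made-up numbers, say v*n = 12 s of single-threaded work and s = 0.5 s per thread: 4 threads finish in 12/4 + 4*0.5 = 5 s, while 16 threads need 12/16 + 16*0.5 = 8.75 s.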
People already said it, but multithreading is not always beneficial: more threads mean a lot of costs, for sharing or moving data, creating threads, requesting data from devices, and writing it back. Find your sweet spot :)