gdal2tiles is too slow when the transparency parameter (-a) is added

Open sendreams opened this issue 1 year ago • 13 comments

What is the bug?

During the use of gdal2tiles, I encountered a few issues:

  1. After adding the -a parameter (to set the transparency color), the process runs very slowly.
     gdal2tiles D:\gis\data\hbsd\jsx\2996-UTM50.vrt D:\gis\tile\vrt -z 12 --xyz -w none -a 0
  2. If I use nested VRTs, the process is also very slow even without adding the -a parameter.

[Image: snapshot of the source image]

[Image: after setting the nodata value]

The VRT file info:

Driver: VRT/Virtual Raster
Files: D:\gis\data\hbsd\jsx\2996-UTM50.vrt
       D:\gis\data\hbsd\jsx\2996-UTM50.tif
Size is 147808, 50153
Coordinate System is:
GEOGCRS["WGS 84",
    DATUM["World Geodetic System 1984",
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    ID["EPSG",4326]]
Data axis to CRS axis mapping: 2,1
Origin = (115.485182239101078,30.076076451982868)
Pixel Size = (0.000001022178749,-0.000001022178749)
Image Structure Metadata:
  INTERLEAVE=PIXEL
Corner Coordinates:
Upper Left  ( 115.4851822,  30.0760765) (115d29' 6.66"E, 30d 4'33.88"N)
Lower Left  ( 115.4851822,  30.0248111) (115d29' 6.66"E, 30d 1'29.32"N)
Upper Right ( 115.6362684,  30.0760765) (115d38'10.57"E, 30d 4'33.88"N)
Lower Right ( 115.6362684,  30.0248111) (115d38'10.57"E, 30d 1'29.32"N)
Center      ( 115.5607253,  30.0504438) (115d33'38.61"E, 30d 3' 1.60"N)
Band 1 Block=512x128 Type=Byte, ColorInterp=Red
  Min=0.000 Max=255.000
  Minimum=0.000, Maximum=255.000, Mean=3.642, StdDev=23.192
  Overviews: 73904x25077, 36952x12538, 18476x6269, 9239x3135, 4620x1568, 2310x784, 1155x392, 578x196, 289x98
  Metadata:
    STATISTICS_COVARIANCES=537.8738799377072,472.54864919093,367.2624707721893
    STATISTICS_MAXIMUM=255
    STATISTICS_MEAN=3.6419380563056
    STATISTICS_MINIMUM=0
    STATISTICS_SKIPFACTORX=1
    STATISTICS_SKIPFACTORY=1
    STATISTICS_STDDEV=23.192108139143
Band 2 Block=512x128 Type=Byte, ColorInterp=Green
  Min=0.000 Max=255.000
  Minimum=0.000, Maximum=255.000, Mean=3.219, StdDev=20.444
  Overviews: 73904x25077, 36952x12538, 18476x6269, 9239x3135, 4620x1568, 2310x784, 1155x392, 578x196, 289x98
  Metadata:
    STATISTICS_COVARIANCES=472.54864919093,417.9571549986692,323.5913342639855
    STATISTICS_MAXIMUM=255
    STATISTICS_MEAN=3.2194881108267
    STATISTICS_MINIMUM=0
    STATISTICS_SKIPFACTORX=1
    STATISTICS_SKIPFACTORY=1
    STATISTICS_STDDEV=20.444000464651
Band 3 Block=512x128 Type=Byte, ColorInterp=Blue
  Min=0.000 Max=255.000
  Minimum=0.000, Maximum=255.000, Mean=2.483, StdDev=15.952
  Overviews: 73904x25077, 36952x12538, 18476x6269, 9239x3135, 4620x1568, 2310x784, 1155x392, 578x196, 289x98
  Metadata:
    STATISTICS_COVARIANCES=367.2624707721893,323.5913342639855,254.459050195018
    STATISTICS_MAXIMUM=255
    STATISTICS_MEAN=2.4828870904195
    STATISTICS_MINIMUM=0
    STATISTICS_SKIPFACTORX=1
    STATISTICS_SKIPFACTORY=1
    STATISTICS_STDDEV=15.951772634883

Steps to reproduce the issue

Just add -a 0 to the gdal2tiles parameters.

Versions and provenance

gdalversion: 3.10.0

Additional context

No response

sendreams avatar Nov 19 '24 16:11 sendreams

Without the -a parameter, it takes 10 seconds, but after adding it, 30 minutes have passed and it's still not finished.

sendreams avatar Nov 19 '24 16:11 sendreams

I suppose that your source data is somehow special. My test with a 12000x12000 RGB image took about 210 seconds to run both with and without -a 0. Without test data I cannot say what is special. Is it that the raster contains mostly zero pixels, or is the reason in the VRT that you use in between?

jratike80 avatar Nov 19 '24 19:11 jratike80

@jratike80 I have clipped a tiny dataset for testing and sent an email to: [email protected]

sendreams avatar Nov 20 '24 00:11 sendreams

@jratike80 did you receive the test data?

sendreams avatar Nov 21 '24 08:11 sendreams

I do not see such mail on the list https://lists.osgeo.org/pipermail/gdal-dev/2024-November/date.html. The mailing list is not so good for attachments. If the test data is small you can attach it to this GitHub issue. Or put it into some download service and share a link. Or, as a last option, send a private mail to me.

jratike80 avatar Nov 21 '24 08:11 jratike80

I have received your test data and I can confirm that adding -a 0 makes gdal2tiles very much slower. Nothing more to say yet.

jratike80 avatar Nov 23 '24 16:11 jratike80

I believe that the reason for the slowness in this case is that gdal2tiles always reads the nodata from the full resolution data.

  • the source image has a size of 145000, 55000 with a pixel size of 0.1m
  • the source image has overviews 72500x27500, 36250x13750, 18125x6875, 9063x3438, 4532x1719, 2266x860, 1133x430, 567x215, 284x108
  • zoom level 12 is requested from gdal2tiles, which means a pixel size of 38m
  • gdal2tiles without the nodata switch hits the right overview level and is very fast
  • when gdal2tiles is run with the -a 0 switch it gets very much slower, and I believe it is because the full resolution data is read for finding the nodata.
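
The arithmetic behind the bullets above can be sketched in a few lines of Python. This is only an illustration, not gdal2tiles code: `zoom_resolution` and `overview_factor` are hypothetical helper names, the resolution is the usual Web Mercator value at the equator for 256-pixel tiles, and power-of-two overviews are assumed because that is what the overview sizes listed above suggest.

```python
import math

# Assumptions: Web Mercator (EPSG:3857), 256-pixel tiles, resolution at the equator.
EARTH_RADIUS_M = 6378137.0
TILE_SIZE = 256


def zoom_resolution(zoom):
    """Metres per pixel at the equator for a given XYZ zoom level."""
    return 2 * math.pi * EARTH_RADIUS_M / (TILE_SIZE * 2 ** zoom)


def overview_factor(native_res, target_res):
    """Largest power-of-two decimation whose pixel size does not exceed target_res."""
    factor = 1
    while native_res * factor * 2 <= target_res:
        factor *= 2
    return factor


z12 = zoom_resolution(12)            # about 38.2 m per pixel
factor = overview_factor(0.1, z12)   # 0.1 m native pixel size, as stated above

print(f"z12 resolution: {z12:.2f} m/pixel")
print(f"coarsest overview not coarser than z12: 1/{factor} of full resolution")
print(f"full-resolution pixels per z12 pixel: about {(z12 / 0.1) ** 2:,.0f}")
```

With these assumptions the matching overview is the 1/256 one (567x215 in the list above), while reading from full resolution means touching on the order of a hundred thousand source pixels for every output pixel.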

Here are some commands for testing if my theory is correct.

Create a 60000x60000 image filled with zeroes.

gdal_create -of GTiff -co tiled=yes  -bands 3 -burn 0 -burn 0 -burn 0 -a_srs epsg:3067 -a_ullr 494000.000 7014000.000 500000.000 7008000.000 -outsize 60000 60000 zero3067.tif   

Run gdal2tiles command for the image. This is slow because there are no overviews so gdal2tiles must read the full resolution data. This took about 20 minutes on my computer.

gdal2tiles zero3067.tif test1 -z 12 --xyz -w none

Add overviews

gdaladdo zero3067.tif

and gdal2tiles will be fast. Took about 6 seconds in my test. Then run again with -a 0 and gdal2tiles will spend again some 20 minutes.
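
For anyone who wants to repeat the comparison, the runs described above could be timed with a small Python wrapper. This is a sketch under assumptions: `gdal2tiles_command` and `timed_run` are hypothetical helpers, the executable is assumed to be on the PATH under the name `gdal2tiles` (as in the commands in this thread), and only the options already used here are passed.

```python
import subprocess
import time


def gdal2tiles_command(source, out_dir, zoom=None, nodata=None):
    """Build the gdal2tiles argument list with the options used in this thread."""
    cmd = ["gdal2tiles", source, out_dir]
    if zoom is not None:
        cmd += ["-z", str(zoom)]
    cmd += ["--xyz", "-w", "none"]
    if nodata is not None:
        cmd += ["-a", str(nodata)]
    return cmd


def timed_run(cmd):
    """Run a command and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start


# The two runs to compare; nothing is executed here, the commands are only printed.
without_nodata = gdal2tiles_command("zero3067.tif", "test1", zoom=12)
with_nodata = gdal2tiles_command("zero3067.tif", "test2", zoom=12, nodata=0)
print(" ".join(without_nodata))
print(" ".join(with_nodata))
```

Calling `timed_run(without_nodata)` and `timed_run(with_nodata)` on a machine with GDAL installed, before and after running gdaladdo, would give the four timings to compare.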

I did not test how gdal2tiles behaves without setting the fixed zoom level but I think that there should be no speed difference in creating the base tiles because the full resolution data is read in both cases. If the next zoom levels with bigger pixel size are constructed from the previous tiles there should be no speed difference either. So maybe the issue is restricted to the use case when gdal2tiles is asked to create a subsampled zoom level directly from the source data.

I am not sure if it would be safe to also read the nodata from the overviews. Overviews can be resampled with methods like average or cubic and compressed with lossy methods like JPEG, so the nodata mask may not be accurate any more. But if the user wants to get z12 directly from data with a native resolution that is close to z20, accuracy is probably not the main goal.
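
A toy example of why a nodata mask taken from an averaged overview can differ from the full resolution mask. This is a deliberately naive sketch in plain Python: `average_2x2` is a hypothetical helper that averages every 2x2 block without any nodata handling, and it does not reproduce what GDAL's own resampling does.

```python
NODATA = 0


def average_2x2(band):
    """Downsample a 2D list by plain averaging of each 2x2 block (nodata is not masked)."""
    rows, cols = len(band), len(band[0])
    return [
        [
            round((band[r][c] + band[r][c + 1] + band[r + 1][c] + band[r + 1][c + 1]) / 4)
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]


def nodata_fraction(band):
    """Share of pixels whose value equals NODATA."""
    flat = [value for row in band for value in row]
    return sum(value == NODATA for value in flat) / len(flat)


# A 4x4 band: valid pixels in the middle, a nodata border around them.
full = [
    [0, 0, 0, 0],
    [0, 200, 200, 0],
    [0, 200, 200, 0],
    [0, 0, 0, 0],
]
overview = average_2x2(full)

print("full resolution nodata share:", nodata_fraction(full))
print("overview values:", overview)
print("overview nodata share:", nodata_fraction(overview))
```

Every 2x2 block mixes three nodata pixels with one valid pixel, so the averaged overview contains no exact zeroes at all and the nodata border disappears from the mask.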

jratike80 avatar Nov 24 '24 13:11 jratike80

Two more tests. The source file has no overviews and no nodata set in the metadata. gdal2tiles is run without the -z option so it starts from the base tiles, in this case from zoom level 19.

gdal2tiles zero3067.tif test1 --xyz -w none
0...10...20...30...40...50...60...70...80...90...100 - done in 00:28:04.
0...10...20...30...40...50...60...70...80...90...100 - done in 00:03:04.

gdal2tiles zero3067.tif test2 --xyz -w none -a 0
0...10...20...30...40...50...60...70...80...90...100 - done in 00:41:52.
0...10...20...30...40...50...60...70...80...90...100 - done in 00:05:47.

It looks like gdal2tiles really needs to do 50% more work when it has to find out the nodata pixels. I cannot say if that is something that could be avoided with more clever coding. I fear I have reached my limits with analysing this issue now.
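
The slowdown can be checked directly from the reported times. A small sketch; `to_seconds` and `slowdown` are hypothetical helper names, and the labels simply follow the order of the two progress bars printed above.

```python
def to_seconds(hms):
    """Convert an HH:MM:SS string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def slowdown(without_a, with_a):
    """Ratio of the run time with -a 0 to the run time without it."""
    return to_seconds(with_a) / to_seconds(without_a)


# (time without -a, time with -a 0), copied from the output above.
runs = {
    "first progress bar": ("00:28:04", "00:41:52"),
    "second progress bar": ("00:03:04", "00:05:47"),
}

for label, (plain, with_nodata) in runs.items():
    print(f"{label}: {slowdown(plain, with_nodata):.2f}x")
```

The first pass comes out at roughly 1.49x, which matches the 50% estimate, while the second pass is closer to 1.89x.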

jratike80 avatar Nov 24 '24 20:11 jratike80

I encountered a phenomenon during my actual operations that I’d like to report: after setting the -a 0 for a single TIF file, gdal2tiles becomes extremely slow. Besides this issue, in practice, if a nested VRT is used, gdal2tiles is also very slow, even without adding the -a parameter.

A nested VRT looks like this:

parent.vrt
 --a.vrt
 --b.vrt
 --c.vrt
 --d.tif

sendreams avatar Nov 25 '24 02:11 sendreams

@jratike80 Hi, is there any progress on this issue?

sendreams avatar Mar 05 '25 10:03 sendreams

Hi @sendreams, I am a GDAL user who can't program anything, so there will certainly never be any progress if it depends on me.

jratike80 avatar Mar 05 '25 14:03 jratike80

@jratike80 Thanks. Do you know how to move this process forward?

sendreams avatar Mar 05 '25 14:03 sendreams

There seem to be 514 open GDAL issues today https://github.com/OSGeo/gdal/issues so it is easy for the developers to pick some tickets for their amusement, if they do not happen to have any contracted work in a queue or some great ideas of their own about coding something new and exciting. I can imagine 3 alternatives:

  1. Convince the developers that this ticket is more important to fix than the others, or more fun to fix than coding new ideas of their own.
  2. Fix the issue yourself, if you can, and create a pull request.
  3. Try to hire someone to fix the issue for you.

Option 2 does not suit me, but I have used options 1 and 3 with success. I think that I also have issues in the queue because they are not critical enough for the GDAL community, and not even critical enough for me that I would like to pay for the fix.

jratike80 avatar Mar 05 '25 15:03 jratike80

closing as superseded by gdal raster tile

rouault avatar Jul 10 '25 20:07 rouault