Problem downloading grids on the fly

Open snowman2 opened this issue 9 months ago • 5 comments

Discussed in https://github.com/pyproj4/pyproj/discussions/1454

Originally posted by j-carson October 25, 2024

The following code used to work with pyproj==3.6.1:

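    from pyproj import CRS, Transformer
    from pyproj.transformer import TransformerGroup

    WGS84 = "EPSG:4979"  # https://epsg.io/4979
    EGM96 = "EPSG:9707"  # https://epsg.io/9707

    # Download any grids that the WGS84 -> EGM96 operations need.
    tg = TransformerGroup(WGS84, EGM96)
    if tg.unavailable_operations:
        tg.download_grids(verbose=True)

    transformer = Transformer.from_crs(CRS(EGM96), CRS(WGS84), always_xy=True)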

But with pyproj==3.7 I'm getting a no-op transform - the grid I tried to download isn't actually being used.

This bug only manifested itself in my CI system, which starts with a completely clean environment. If I look in the .local/share/proj directory on the build system, I see that the grid, us_nga_egm96_15.tif, is there. But on my "messy" system, where the test wasn't failing, that directory also contains files.geojson.

I found the call to get_transform_grid_list() which puts the missing files.geojson there.

With that call added, the test still fails the first time I run it, but once both files are in place, the second time you run the program, the tests succeed.

How can I reliably download grids on the fly?

snowman2 avatar Feb 12 '25 03:02 snowman2

https://github.com/pyproj4/pyproj/discussions/1454#discussioncomment-12160165

After some debugging, it appears that it doesn't matter whether the files.geojson file is there. You just need to start a new Python session for it to work as expected.

The issue appears to be related to the change in https://github.com/pyproj4/pyproj/pull/1419.

To reproduce the issue, first clear out the contents of the user proj data directory.

Then, run this script:

    import concurrent.futures

    from pyproj.transformer import Transformer, TransformerGroup

    WGS84 = "EPSG:4979"  # https://epsg.io/4979
    EGM96 = "EPSG:9707"  # https://epsg.io/9707

    tg = TransformerGroup(WGS84, EGM96)
    if tg.unavailable_operations:
        tg.download_grids(verbose=True)

    transformer = Transformer.from_crs(EGM96, WGS84, always_xy=True)
    print(transformer)


    def transform_repr(idx):
        return str(Transformer.from_crs(EGM96, WGS84, always_xy=True))


    with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
        for result in executor.map(transform_repr, range(5)):
            print(result)

The output:

    proj=noop ellps=GRS80
    unavailable until proj_trans is called
    unavailable until proj_trans is called
    unavailable until proj_trans is called
    unavailable until proj_trans is called
    unavailable until proj_trans is called

The main thread is the only one with the issue. The other threads each get a fresh context, so whatever is stale in the main thread's context is not carried over. This tells me that PROJ is caching something in the context that needs to be cleared after the files are downloaded within the Python session.

snowman2 avatar Feb 12 '25 03:02 snowman2

It appears that the context caches the results of its file lookups. That cache gets reset when the search paths are set.
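To illustrate the behavior described above, here is a toy model in plain Python. It is not PROJ's implementation: `ToyContext`, `find`, `set_search_paths`, and the file name `grid.tif` are all made up for illustration, and it assumes that failed lookups are what get cached.

```python
import os
import tempfile


class ToyContext:
    """Toy stand-in for a PROJ context; NOT PROJ's real implementation."""

    def __init__(self, search_paths):
        self._search_paths = list(search_paths)
        # file name -> resolved path, or None for a remembered miss (assumption)
        self._lookup_cache = {}

    def set_search_paths(self, search_paths):
        # Setting the search paths also throws away every cached lookup.
        self._search_paths = list(search_paths)
        self._lookup_cache.clear()

    def find(self, name):
        # Only touch the file system the first time a name is requested.
        if name not in self._lookup_cache:
            found = None
            for directory in self._search_paths:
                candidate = os.path.join(directory, name)
                if os.path.exists(candidate):
                    found = candidate
                    break
            self._lookup_cache[name] = found
        return self._lookup_cache[name]


def demo():
    with tempfile.TemporaryDirectory() as data_dir:
        ctx = ToyContext([data_dir])
        before = ctx.find("grid.tif")  # file is absent; the miss is cached
        # Simulate a grid download into the data directory.
        open(os.path.join(data_dir, "grid.tif"), "w").close()
        stale = ctx.find("grid.tif")  # still answers from the cached miss
        ctx.set_search_paths([data_dir])  # same paths, but the cache is cleared
        fresh = ctx.find("grid.tif")  # now the file is found
        return before, stale, fresh is not None
```

In this model, the lookup made before the download keeps returning the cached miss until the search paths are set again, which matches the symptom seen in the main thread.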

I verified that calling set_data_dir resets it, and the Transformer is then populated correctly:

    from pyproj.transformer import Transformer, TransformerGroup
    from pyproj.datadir import get_data_dir, set_data_dir

    WGS84 = "EPSG:4979"  # https://epsg.io/4979
    EGM96 = "EPSG:9707"  # https://epsg.io/9707

    tg = TransformerGroup(WGS84, EGM96)
    if tg.unavailable_operations:
        tg.download_grids(verbose=True)

    set_data_dir(get_data_dir())
    transformer = Transformer.from_crs(EGM96, WGS84, always_xy=True)
    print(transformer)

The output:

    unavailable until proj_trans is called
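The workaround above could be wrapped in a small helper so that the reset is not forgotten. This is only a sketch: `transformer_with_grids` is a hypothetical name, not part of pyproj, and it assumes that re-setting the data directory clears the stale state, as observed in this thread.

```python
from pyproj.datadir import get_data_dir, set_data_dir
from pyproj.transformer import Transformer, TransformerGroup


def transformer_with_grids(crs_from, crs_to, **kwargs):
    """Hypothetical helper: download missing grids, then build a Transformer.

    Extra keyword arguments are passed through to Transformer.from_crs.
    """
    tg = TransformerGroup(crs_from, crs_to)
    if tg.unavailable_operations:
        tg.download_grids(verbose=True)
        # Workaround from this thread: re-setting the data directory makes
        # the current context notice the newly downloaded files.
        set_data_dir(get_data_dir())
    return Transformer.from_crs(crs_from, crs_to, **kwargs)
```

With the CRS codes from this issue, it would be called as `transformer_with_grids("EPSG:9707", "EPSG:4979", always_xy=True)`. The reset runs only when a download was attempted, so sessions that already have the grids are unaffected.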

snowman2 avatar Feb 12 '25 04:02 snowman2

@rouault, what are your thoughts on this behavior? Do you think it would be helpful to add a method that lets the user invalidate the cache on the context? Or, alternatively, have proj_download_file update the cache for the downloaded grid?

snowman2 avatar Feb 12 '25 04:02 snowman2

Or, alternatively, have proj_download_file update the cache for the downloaded grid?

Oh yes, proj_download_file should definitely invalidate lookupedFiles. Please file an OSGeo/PROJ issue about that.

rouault avatar Feb 12 '25 10:02 rouault

It appears that the context caches the results of its file lookups. That cache gets reset when the search paths are set.

I verified that calling set_data_dir resets it, and the Transformer is then populated correctly:

Thanks for the work-around!

j-carson avatar Feb 17 '25 00:02 j-carson