pyproj
Problem downloading grids on the fly
Discussed in https://github.com/pyproj4/pyproj/discussions/1454
Originally posted by j-carson October 25, 2024

The following code used to work with pyproj==3.6.1:
```python
from pyproj import CRS
from pyproj.transformer import Transformer, TransformerGroup

WGS84 = "EPSG:4979"  # https://epsg.io/4979
EGM96 = "EPSG:9707"  # https://epsg.io/9707

tg = TransformerGroup(WGS84, EGM96)
if tg.unavailable_operations:
    tg.download_grids(verbose=True)
transformer = Transformer.from_crs(CRS(EGM96), CRS(WGS84), always_xy=True)
```
But with pyproj==3.7 I'm getting a no-op transform: the grid I tried to download isn't actually being used.
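One way to detect this symptom programmatically is to push a single point through the transformer and check whether the height changes. This is a heuristic sketch, not a pyproj API: `looks_like_noop` is a hypothetical helper, and it assumes an `always_xy=True` transformer that takes `(lon, lat, height)`.

```python
import math


def looks_like_noop(transformer, lon=0.0, lat=0.0, height=0.0, tol=1e-6):
    """Heuristic check for a silently no-op vertical transform.

    EGM96 orthometric heights and WGS84 ellipsoidal heights differ by the
    geoid undulation, which is non-zero at most locations (including
    lon=0, lat=0), so a working EGM96 -> WGS84 transformer should change
    the height. An unchanged height suggests the geoid grid was not used.
    """
    _, _, new_height = transformer.transform(lon, lat, height)
    return math.isclose(new_height, height, abs_tol=tol)
```

For example, `looks_like_noop(Transformer.from_crs(EGM96, WGS84, always_xy=True))` would return `True` in the failing case described here.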
This bug only manifested itself in my CI system, which starts with a completely clean environment. If I look in the .local/share/proj directory on the build system, I see that the grid us_nga_egm96_15.tif is there, but in my "messy" system where it wasn't failing, files.geojson is there as well.
I found that calling get_transform_grid_list() puts the missing files.geojson there.
With that call added, the test still fails the first time I run it, but once both files are in place, it succeeds on the second run.
How can I reliably download grids on the fly?
https://github.com/pyproj4/pyproj/discussions/1454#discussioncomment-12160165
After some debugging, it appears that it doesn't matter whether the files.geojson file is there. You just need to start a new Python session for it to work as expected.
The issue appears to be related to the change in https://github.com/pyproj4/pyproj/pull/1419.
To reproduce the issue, first clear out the contents of the user PROJ data directory.
Then, run this script:
```python
import concurrent.futures

from pyproj.transformer import Transformer, TransformerGroup

WGS84 = "EPSG:4979"  # https://epsg.io/4979
EGM96 = "EPSG:9707"  # https://epsg.io/9707

tg = TransformerGroup(WGS84, EGM96)
if tg.unavailable_operations:
    tg.download_grids(verbose=True)

transformer = Transformer.from_crs(EGM96, WGS84, always_xy=True)
print(transformer)


def transform_repr(idx):
    return str(Transformer.from_crs(EGM96, WGS84, always_xy=True))


with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
    for result in executor.map(transform_repr, range(5)):
        print(result)
```
The output:
```
proj=noop ellps=GRS80
unavailable until proj_trans is called
unavailable until proj_trans is called
unavailable until proj_trans is called
unavailable until proj_trans is called
unavailable until proj_trans is called
```
The main thread is the only one with the issue; the other threads each get a fresh context without the stale state. This tells me that PROJ is caching something in the context that needs to be cleared after the files are downloaded in the Python session.
It appears that the context caches the files it has already looked up, and that this cache is reset when the search paths are set.
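To make the suspected mechanism concrete, here is a toy model in plain Python. It is not PROJ's implementation: `ToyContext`, `find_file`, `set_search_paths` and `demo` are hypothetical names. It assumes, as described above, that a context remembers "not found" lookups and forgets them when its search paths change.

```python
import tempfile
from pathlib import Path


class ToyContext:
    """Hypothetical stand-in for a PROJ context with a file-lookup cache."""

    def __init__(self, search_paths):
        self.search_paths = [Path(p) for p in search_paths]
        self._lookup_cache = {}  # file name -> Path, or None for "not found"

    def set_search_paths(self, search_paths):
        # Setting the search paths also clears the cache.
        self.search_paths = [Path(p) for p in search_paths]
        self._lookup_cache.clear()

    def find_file(self, name):
        if name in self._lookup_cache:
            return self._lookup_cache[name]  # may be a stale "not found"
        result = None
        for directory in self.search_paths:
            candidate = directory / name
            if candidate.exists():
                result = candidate
                break
        self._lookup_cache[name] = result
        return result


def demo(grid="us_nga_egm96_15.tif"):
    """Return (found_after_download, found_by_fresh_context, found_after_reset)."""
    with tempfile.TemporaryDirectory() as data_dir:
        main_ctx = ToyContext([data_dir])
        main_ctx.find_file(grid)  # looked up before the grid exists: cached as missing
        (Path(data_dir) / grid).write_bytes(b"")  # simulate download_grids()
        stale = main_ctx.find_file(grid) is not None
        fresh = ToyContext([data_dir]).find_file(grid) is not None  # like a new thread
        main_ctx.set_search_paths([data_dir])  # like set_data_dir(get_data_dir())
        after_reset = main_ctx.find_file(grid) is not None
    return stale, fresh, after_reset
```

Here `demo()` returns `(False, True, True)`: the original context keeps its stale miss, a fresh context sees the grid, and resetting the search paths fixes the original context.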
I verified that calling set_data_dir resets the cache, and the Transformer is then populated correctly:
```python
from pyproj.transformer import Transformer, TransformerGroup
from pyproj.datadir import get_data_dir, set_data_dir

WGS84 = "EPSG:4979"  # https://epsg.io/4979
EGM96 = "EPSG:9707"  # https://epsg.io/9707

tg = TransformerGroup(WGS84, EGM96)
if tg.unavailable_operations:
    tg.download_grids(verbose=True)
    set_data_dir(get_data_dir())

transformer = Transformer.from_crs(EGM96, WGS84, always_xy=True)
print(transformer)
```

Output:

```
unavailable until proj_trans is called
```
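The workaround verified above can be wrapped in a small helper for clean CI environments. This is a sketch, not a pyproj API: `transformer_with_grids` is a hypothetical name, the download needs network access, and it relies on `set_data_dir(get_data_dir())` resetting the context cache as observed above with pyproj 3.7.

```python
from pyproj.datadir import get_data_dir, set_data_dir
from pyproj.transformer import Transformer, TransformerGroup


def transformer_with_grids(source_crs, target_crs, verbose=False):
    """Download missing grids, then build a Transformer that can use them.

    Re-applying the current data directory resets the context's cached file
    lookups, so grids fetched in this session are visible without restarting
    Python (workaround observed in this discussion).
    """
    group = TransformerGroup(source_crs, target_crs)
    if group.unavailable_operations:
        group.download_grids(verbose=verbose)
        # Re-set the search paths to drop stale "file not found" lookups.
        set_data_dir(get_data_dir())
    return Transformer.from_crs(source_crs, target_crs, always_xy=True)
```

For example, `transformer_with_grids("EPSG:9707", "EPSG:4979")` would download us_nga_egm96_15.tif on a clean machine and return a transformer built after the cache reset.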
@rouault, what are your thoughts on this behavior? Would it be helpful to add a method for the user to invalidate the cache on the context? Or, alternatively, have proj_download_file update the cache for the downloaded grid?
> Or, alternatively, have proj_download_file update the cache for the downloaded grid?
Oh yes, proj_download_file should definitely invalidate lookupedFiles. Please file an OSGeo/PROJ issue about that.
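The behavior proposed in the reply, where a download invalidates the context's lookup cache (`lookupedFiles` in PROJ), can be sketched as a small toy model. This is hypothetical Python, not PROJ code: `CachingContext`, `download_file` and `demo` are illustrative names, and evicting only the downloaded file's entry (rather than clearing the whole cache) is a design choice of this sketch.

```python
import tempfile
from pathlib import Path


class CachingContext:
    """Hypothetical context that caches lookups and invalidates them on download."""

    def __init__(self, directory):
        self.directory = Path(directory)
        self._lookup_cache = {}  # file name -> Path, or None for "not found"

    def find_file(self, name):
        if name not in self._lookup_cache:
            candidate = self.directory / name
            self._lookup_cache[name] = candidate if candidate.exists() else None
        return self._lookup_cache[name]

    def download_file(self, name, payload):
        # Stand-in for a network download: just write the bytes locally.
        (self.directory / name).write_bytes(payload)
        # Proposed behavior: drop any earlier lookup of this file so the
        # next find_file() call re-checks the file system.
        self._lookup_cache.pop(name, None)


def demo(grid="us_nga_egm96_15.tif"):
    """Return (found_before_download, found_after_download)."""
    with tempfile.TemporaryDirectory() as data_dir:
        ctx = CachingContext(data_dir)
        before = ctx.find_file(grid) is not None
        ctx.download_file(grid, b"")
        after = ctx.find_file(grid) is not None
    return before, after
```

With the eviction in place, `demo()` returns `(False, True)` in a single session, without any manual reset of the search paths.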