Issue with `distance_to_anomaly_gdal` and `gdal_array`

Open nmaarnio opened this issue 1 year ago • 13 comments

At least one Windows user has reported that gdal_array was not available for them, so they could not run the optimized distance_to_anomaly tool. @okolekar , how did you set up your development environment / did you do anything specific to have gdal_array available for you?

Before we can be sure that Windows users are able to run distance_to_anomaly_gdal without issues, we can't make it the default/automatically selected version for EIS QGIS Plugins users that have Windows.

nmaarnio avatar Oct 23 '24 14:10 nmaarnio

There are issues associated with GDAL >= 3.9, Python >= 3.9 and NumPy 2.0.

It is recommended to use 'GDAL 3.6.2, released 2023/01/02' with Python 3.9.18.
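Before comparing against the versions mentioned above, it can help to see what is actually installed in the active environment. The following is a minimal sketch that uses only the standard library; the helper name `report_versions` is made up for illustration and is not part of EIS Toolkit or GDAL.

```python
import platform
from importlib import metadata


def report_versions(packages=("GDAL", "numpy")):
    """Return the interpreter version and the installed version of each package.

    A package that is not installed in the active environment maps to None.
    """
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


if __name__ == "__main__":
    for name, version in report_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Running this inside the venv or Conda environment in question shows which interpreter, GDAL, and NumPy versions that environment really uses.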

Also, in order to enable numpy-based raster support, libgdal and its development headers must be installed, as well as the Python packages numpy, setuptools, and wheel. To install the Python dependencies and build numpy-based raster support, pip users can run the following commands:

pip install "numpy>1.0.0" wheel "setuptools>=67"
pip install gdal[numpy]=="$(gdal-config --version).*"

For Conda users: GDAL can be quite complex to build and install, particularly on Windows and macOS, so pre-built binaries are provided for the conda system. It is recommended to use the following command:

conda install -c conda-forge gdal

If the problem still persists, then try the following command:

conda install -c conda-forge gdal numpy setuptools wheel

okolekar avatar Oct 24 '24 09:10 okolekar

In addition, if the issue still persists, I would like to know whether the error raised is the one below:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python3.12/dist-packages/osgeo/gdal_array.py", line 10, in <module>
    from . import _gdal_array
ImportError: cannot import name '_gdal_array' from 'osgeo' (/usr/local/lib/python3.12/dist-packages/osgeo/__init__.py)

If yes, then the issue is that pip/conda is reusing a cached GDAL installation. Use the following commands to make sure that the correct version is installed and used.

For Conda users:
conda install -c conda-forge gdal numpy

For pip users:
pip install --no-cache --force-reinstall gdal[numpy]=="$(gdal-config --version).*"
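To confirm whether a reinstall actually fixed the import, a quick check like the one below may be useful. It is only a sketch: the function name `gdal_array_available` is hypothetical, and it does nothing more than attempt the import of `osgeo.gdal_array` and report the error message if the import fails.

```python
import importlib


def gdal_array_available():
    """Try to import osgeo.gdal_array and return (ok, detail).

    ok is True only if the module imports. A missing GDAL installation
    (ModuleNotFoundError) and a missing _gdal_array extension (ImportError)
    are both reported as False, with the error text in detail.
    """
    try:
        importlib.import_module("osgeo.gdal_array")
    except ImportError as exc:
        return False, str(exc)
    return True, "osgeo.gdal_array imported successfully"


if __name__ == "__main__":
    ok, detail = gdal_array_available()
    print("gdal_array available:", ok)
    print(detail)
```

If the detail text mentions `_gdal_array`, the bindings were most likely built without NumPy support, which is the situation the reinstall commands above are meant to address.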

okolekar avatar Oct 24 '24 10:10 okolekar

Hi @okolekar , sorry I haven't had time to think about this for a while.

I don't have Windows myself, so it's difficult for me to test whether these methods work. However, ideally users should be able to install everything simply with pip install eis_toolkit, without additional configuration. Do you think that can be achieved even if we use gdal_array?

nmaarnio avatar Nov 05 '24 15:11 nmaarnio

Hi @nmaarnio, sorry for the late reply. Ideally, the conda platform provides everything and does not rely on the user to take any additional steps. With pip it is a bit different, as pip does not handle everything the way conda does, but I think a configuration file should be able to orchestrate everything. I have Windows 10 Pro, and on this PC it works without problems in a Conda environment. I will try to install the same with pip and let you know.

okolekar avatar Nov 13 '24 07:11 okolekar

I see that for EIS Toolkit we need to set up a Conda environment. So, in theory, this issue should not surface, because Conda takes care of all the requirements.

okolekar avatar Nov 13 '24 09:11 okolekar

Hi @okolekar , unfortunately right now conda environments are not as good as just using venv + pip. Not all users have or want to use conda, and recently we discovered a license issue related to tensorflow and the default channel of conda. So we should prioritize making everything work well with venv and pip.

nmaarnio avatar Nov 13 '24 10:11 nmaarnio

The good news is that @msorvoja managed to optimize distance_computation using Numba, at least to some extent. We might not reach the same speeds as with gdal, but this at least relieves the pressure to get this optimization up and running for everybody.

nmaarnio avatar Nov 13 '24 10:11 nmaarnio

I will try to work with venv and let you know as soon as I am done.

okolekar avatar Nov 13 '24 12:11 okolekar

Hi @nmaarnio , I tried to install the gdal library, but it seems to be a bit stubborn and is not available for venv + pip users. Unfortunately, conda seems to be the only option. I am checking if it works with OSGeo4W.

okolekar avatar Nov 13 '24 14:11 okolekar

you need conda for gdal in windows, especially for the average user you are targeting

RichardScottOZ avatar Nov 13 '24 17:11 RichardScottOZ

Yes, I tried a lot yesterday to find a way with pip, but it seems none is available: there is no wheel file or other precompiled file. A precompiled file used to be made available unofficially by Christoph Gohlke, but it simply does not exist any more.

okolekar avatar Nov 14 '24 09:11 okolekar

Okay, this is unfortunate. We can still offer the optimized tool that uses gdal_array as an option for users of EIS Toolkit, but cannot then use it in the CLI function and for plugin users.

The optimization of distance_computation is close to completion, so we should check how distance_to_anomaly, which uses the optimized distance_computation in the background, compares to the gdal_array version of distance_to_anomaly. If the performance is close enough, that's good; if not, let's explore other ways to optimize distance_to_anomaly. I believe a similar Numba-compatible version could be created.
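The comparison described above could be done with a small timing harness along these lines. This is only a sketch: `best_time` and `compare` are hypothetical helpers, and the two functions in the usage part are stand-ins for the real implementations, whose signatures are not shown in this thread.

```python
import time


def best_time(func, *args, repeats=3, **kwargs):
    """Return the shortest wall-clock time (seconds) over several runs of func."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args, **kwargs)
        best = min(best, time.perf_counter() - start)
    return best


def compare(implementations, *args, repeats=3, **kwargs):
    """Time each named implementation on the same arguments."""
    return {
        name: best_time(func, *args, repeats=repeats, **kwargs)
        for name, func in implementations.items()
    }


if __name__ == "__main__":
    # Placeholder workloads; replace with the actual tool functions and inputs.
    def placeholder_gdal_version(n):
        return sum(range(n))

    def placeholder_numba_version(n):
        return sum(i for i in range(n))

    timings = compare(
        {"gdal": placeholder_gdal_version, "numba": placeholder_numba_version},
        200_000,
    )
    for name, seconds in timings.items():
        print(f"{name}: {seconds:.4f} s")
```

Taking the best of several runs reduces noise from one-off costs; for a Numba-based version, the first call includes JIT compilation, so it is worth making one warm-up call before timing.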

nmaarnio avatar Nov 14 '24 09:11 nmaarnio

Hi @okolekar , I am working on this issue again, and after some testing I discovered that if NumPy is installed before GDAL, the version of distance_to_anomaly you implemented that uses GDAL seems to work. However, I also discovered that the original distance_to_anomaly_gdal that Nikolas implemented a long time ago, which calls gdal_proximity from osgeo_utils, seems to run without issues or any additional installation steps on Ubuntu and Windows for me, and it is as fast as your implementation, which is still a lot faster than the one that uses Numba right now.

I think we can still include a try-except structure to fall back to the slower version, just in case the gdal version does not work for everyone, but otherwise I'll proceed to make all the distance tools use this version to be as fast as possible. I can tag you as a reviewer when I'm done, if you have time to take a look and test on your machine.
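The try-except fallback mentioned above might look roughly like the sketch below. It is an illustration under assumptions, not the EIS Toolkit implementation: `with_fallback` is a hypothetical helper, the two placeholder functions stand in for the GDAL-based and Numba-based tools, and the choice of which exceptions trigger the fallback is a guess.

```python
import warnings


def with_fallback(fast, slow, errors=(ImportError, RuntimeError)):
    """Wrap two implementations so that slow runs only if fast raises one of errors."""

    def wrapper(*args, **kwargs):
        try:
            return fast(*args, **kwargs)
        except errors as exc:
            warnings.warn(
                f"Fast implementation failed ({exc!r}); using the slower version.",
                RuntimeWarning,
            )
            return slow(*args, **kwargs)

    return wrapper


if __name__ == "__main__":
    # Placeholders only: the real tools take rasters and thresholds as input.
    def placeholder_gdal_tool(values):
        raise ImportError("GDAL bindings are not usable in this environment")

    def placeholder_numba_tool(values):
        return [abs(v) for v in values]

    tool = with_fallback(placeholder_gdal_tool, placeholder_numba_tool)
    print(tool([-1, 2, -3]))
```

Emitting a warning instead of failing silently lets users know they are on the slower path, which makes performance reports easier to interpret.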

nmaarnio avatar Mar 19 '25 13:03 nmaarnio