384 optimize distance computation
Hello, I have added the function `distance_to_anomaly_gdal_ComputeProximity` to reduce the runtime of distance computation. The new function calls the C++-based GDAL API and is therefore faster than `distance_to_anomaly` (which used a point-based scheme) and `distance_to_anomaly_gdal` (which used the Python API for distance computation). All three produce exactly the same result, which is verified in the Python notebook distance_to_anomaly.ipynb.
Below I have attached some graphs to showcase the reduction in runtime achieved by the new function.
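For context on what the new function computes: `gdal.ComputeProximity` fills each cell with the distance to the nearest target (anomalous) cell. A minimal numpy sketch of those semantics follows; this is an illustration only, not the eis_toolkit implementation or its actual parameters.

```python
import numpy as np

def proximity_brute_force(data: np.ndarray, threshold: float) -> np.ndarray:
    """For each cell, distance (in cell units) to the nearest anomalous cell.

    Brute-force illustration only; gdal.ComputeProximity does the
    equivalent computation in optimized C++.
    """
    anomalous = np.argwhere(data >= threshold)  # (row, col) indices of anomalies
    out = np.empty(data.shape)
    for r in range(data.shape[0]):
        for c in range(data.shape[1]):
            out[r, c] = np.hypot(anomalous[:, 0] - r, anomalous[:, 1] - c).min()
    return out

raster = np.zeros((3, 3))
raster[1, 1] = 5.0  # single anomalous cell in the center
distances = proximity_brute_force(raster, threshold=1.0)
# Center cell is 0, edge neighbours 1, corners sqrt(2)
```

The GDAL call itself operates on raster bands and accepts options such as `DISTUNITS=GEO` for georeferenced distances instead of pixel distances.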
POINT TO NOTE: I have kept the pre-commit hooks off due to errors.
Thank you.
I will do a more in-depth review later, but first please add a test to check the implemented functions! See e.g. tests/raster_processing/test_distance_to_anomaly.py for inspiration on how to implement tests, or other files in the tests/ directory. The tests are run by pytest. The main purpose of the test(s) is to check that the function works with some pre-defined input and that the result is as expected (type is correct, value is sensible, etc.). I see you already added examples of use in the notebook, but those are not checked automatically on GitHub Actions, so they do not replace actual tests.
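A test in the requested style could look roughly like the sketch below. To keep it self-contained, the function under test is a hypothetical stand-in; a real test would instead import the implemented function from eis_toolkit, and could additionally compare the outputs of all three variants for equality.

```python
import numpy as np

def distance_to_anomaly_stand_in(data: np.ndarray, threshold: float) -> np.ndarray:
    """Hypothetical stand-in for the function under test."""
    anomalous = np.argwhere(data >= threshold)
    out = np.empty(data.shape)
    for r in range(data.shape[0]):
        for c in range(data.shape[1]):
            out[r, c] = np.hypot(anomalous[:, 0] - r, anomalous[:, 1] - c).min()
    return out

def test_distance_to_anomaly():
    data = np.zeros((4, 4))
    data[0, 0] = 10.0  # single anomaly in the corner
    result = distance_to_anomaly_stand_in(data, threshold=5.0)
    # Type and shape are as expected
    assert isinstance(result, np.ndarray)
    assert result.shape == data.shape
    # Values are sensible: zero at the anomaly, non-negative elsewhere
    assert result[0, 0] == 0.0
    assert (result >= 0.0).all()
    assert result[3, 3] == np.hypot(3.0, 3.0)

test_distance_to_anomaly()
```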
@nialov , do you have time to review this sometime this month? It'd be better if you can, but I can too if needed.
I will try to review next week @nmaarnio, though I cannot promise it. It might get delayed to 7.10. due to a work trip.
Based on a preliminary look, as this uses the GDAL API, I can foresee problems with Windows which will delay merging.
@okolekar The tests (All checks have failed) are not running successfully, see e.g.:
https://github.com/GispoCoding/eis_toolkit/actions/runs/10809648448/job/30251164407?pr=423
You can check the errors and try to see if you can get them fixed. I can help with this during review but as said before that might take time.
Hello to all, I checked the error message @nialov. It appears to me that the test fails due to `ModuleNotFoundError: No module named '_gdal_array'`, which is a problem with the GDAL library. The test passes at my end on my computer. What I can do is add the marker:

```python
@pytest.mark.xfail(
    sys.platform == "win32",
    reason="GDAL utilities are not available on Windows.",
    raises=ModuleNotFoundError,
)
```

which essentially skips the testing.
Regards! Omkar Kolekar
Do you use a Windows computer, and have you installed eis_toolkit with conda, pip or poetry (or some other method)? If the problem is that the Windows gdal library does not have the same bindings as Linux, then the optimization is not available for Windows users. The target audience probably mostly uses Windows.
I do use a Windows machine with a conda environment, but for development purposes I have not installed eis_toolkit in this environment. If this is still a problem, I have an idea: we could create a C++ library/function to calculate the distance to anomaly and call it from the Python program. The only issue would be understanding how gdal_ComputeProximity works. Warm Regards, Omkar
I think the primary question here is why the eis_toolkit environment does not have `_gdal_array` importable in the Linux (and probably Windows) environment. What version of gdal do you use? To check the version:
➜ gdalinfo --version
GDAL 3.9.2, released 2024/08/13
You should try to get your code to work in the environment defined in environment.yaml or in the eis_toolkit environment installed with conda. See instructions/dev_setup_without_docker_with_conda.md, and if it does not work in that environment, try to pinpoint why exactly, i.e. what the difference is between your working environment and the environments of eis_toolkit.
Also, I misspoke earlier about Windows difficulties; I was correlating the failing tests with issues on the Windows platform, but the checks here on GitHub currently run on Linux (Ubuntu) only. So I suppose the issue is with the Linux environment, which might be difficult for you to solve if you are on Windows. I can take a look, but it will sadly be delayed to 7.10. or later.
I checked the GDAL version I use:
gdalinfo --version
GDAL 3.6.2, released 2023/01/02
I do work on Windows OS. I need to check the possibility of working on Linux (Ubuntu).
This branch is based on a somewhat old version of eis_toolkit. Can you do a merge and push it? Run the following commands in the repository with your edits, but make sure everything is committed first (no unstaged changes):
# Check that upstream points to GispoCoding/eis_toolkit:
git remote -v
# If it does not or it does not exist you can report back here. Otherwise, continue:
git fetch upstream
git merge upstream/master
# origin should point to okolekar/eis_toolkit
git push origin
You should not run into any conflicts, but if the merge is not completed successfully and you get a dirty git status, you can do `git merge --abort` to abort and report back here.
The issue seems to be that gdal is not installed with numpy bindings when installed with poetry, disabling some raster functionality. I believe I have run into this before. Your tests run successfully on both Linux and Windows when using conda to install the environment.
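If graceful degradation is wanted, one hypothetical pattern is to guard the import and fall back to the pure-Python implementation when the numpy bindings are missing; this is a sketch, not code from the PR:

```python
# Hypothetical guard: check whether GDAL's numpy bindings are importable.
# In the poetry-installed environment the compiled `_gdal_array` module is
# missing, so importing `gdal_array` raises ModuleNotFoundError.
try:
    from osgeo import gdal_array  # noqa: F401

    HAS_GDAL_ARRAY = True
except ImportError:  # ModuleNotFoundError is a subclass of ImportError
    HAS_GDAL_ARRAY = False

# A caller could then pick the implementation accordingly, e.g.:
# func = (
#     distance_to_anomaly_gdal_ComputeProximity if HAS_GDAL_ARRAY
#     else distance_to_anomaly
# )
```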
My merge is completed without any conflicts.
I will continue this after 7.10. There might be ways to circumvent the poetry environment issue but I am going to guess that if the code is used with the current imports it cannot be run in the poetry environment but works on conda across platforms (and with pip).
The code in terms of functionality seems to be working well in the tests and in the notebook. My comments mostly relate to style and to minimizing code duplication, especially in the tests. If you have the interest, you could try to refactor the code that is identical in the functions `_distance_to_anomaly_gdal_ComputeProximity` and `_distance_to_anomaly` so that the same code is used only once.
To avoid code repetition, I have created a function named `_validate_threshold_criteria` that contains the code common to both functions (`_distance_to_anomaly_gdal_ComputeProximity` and `_distance_to_anomaly`); both functions now call `_validate_threshold_criteria` internally.
I suppose this is ready for review again? I will take a look in a week (or two).
Yes, it is ready for review again. Thank you for the support. I have also taken care of the pre-commit hooks this time, as I have solved the issues with them.
This PR needs to be merged this week if we want to include it in the release planned for Friday. If Nikolas is busy, I might jump in and do a review tomorrow so that testers will be able to run the new version before the progress meeting.
Sounds okay. If any changes are needed, I am available to make them as soon as possible to ensure timely delivery of the code. Also, a point to note here is that the changes were mainly cosmetic.
Hi @okolekar. I only now realized you have been optimizing distance_to_anomaly here, not distance_computation. The issue and PR name refer to distance_computation, the vector processing tool that the original implementation of distance_to_anomaly was using under the hood. Did you use any time to think about optimizing distance_computation, or only distance_to_anomaly so far?
Either way, it's of course good that you have been optimizing this tool since it's used a lot. However, we direly need distance_computation optimized as well, since it's a commonly used tool in MPM. Optimizing distance_computation would also optimize vector_density.
When it comes to the actual review of this PR, you should mark the failing test functions to fail if the platform is not Windows, like this:

```python
@pytest.mark.xfail(
    sys.platform != "win32",
    reason="gdal_array available only on Windows.",
    raises=ModuleNotFoundError,
)
```
Also, I think we could comment out the old distance_to_anomaly_gdal tool if it is not used, the module has now a lot of tools that might confuse users.
Hi @nmaarnio, I was asked to optimize distance_to_anomaly only. However, I can start on that tool as well, but it will take some time.
Regarding the PR, I will upload the suggested changes in an hour or two. Thanks!
Okay. I think there has been some miscommunication at some point. Anyway, I will rename this PR and relink the issue, as this does not concern distance_computation.
@nmaarnio I have made the requested changes. Now I will shift my focus to optimizing distance_computation in the vector processing tools.
And sorry @nialov for bypassing you in the review process. There is just some pressure to get this, among some other features/improvements, merged before the next progress meeting. You are of course free to review the implementation when you have time and make improvement suggestions even after this version has been merged.
Hey, no problem @nmaarnio. The pull request was quite fine already, with tests confirming the functionality, so a second review would not have been too important.
Also note that the gdal_array import does not work due to poetry, independent of platform. So it would work if the test environment were installed with pip on Linux. Of course, poetry is only used on Linux, so disabling the test on that platform works.