[POC] [STORY] Makefile
Context
As part of their work, the CNES and sub-contractor developers work from a basic Python environment. For ease of implementation, some tools use a Makefile. We therefore suggest adopting this approach in xdem.
This requires:
- [ ] organize the setup.cfg and setup.py files as in demcompare
- [ ] reflect on the use of requirements.txt
- [ ] check for possibly missing OpenCV dependencies
- [ ] Copy the demcompare makefile: https://gitlab.cnes.fr/3d/demcompare/-/blob/master/Makefile
- [ ] `PYTHON_VERSION_MIN` fixed at 3.9
- [ ] replace "demcompare" with "xdem"
- [ ] remove the dependencies related to notebooks, coverage, pylint and docs
- [ ] add xdem's pre-commit dependencies
Estimation: 3 days
Here are some remarks:
- The Makefile depends on several subtopics and is just a helpful wrapper for development:
  - Python, pip and packaging, if it is OK to use `pip install` as the baseline for developers (and not conda/mamba).
  - linting #550
  - testing with pytest (coverage is linked to this as well, and why remove it? it can be useful as information)
  - documentation generation (I would not remove it from the list, just use the current workflow to generate the docs)
  - OK to remove notebooks for now
  - pre-commit to be decided (Ruff on commit? a strategy has to be chosen)
- A clean target and a matching .gitignore are also very important: `make clean` must remove all dev artifacts completely so one can start over (very useful when in an unstable state, especially to support other developers and contributions).
Maybe add or link to the adequate issues for the subtopics? A first Makefile can be done without all the subtopics, and each subtopic can be added little by little to build a full dev framework.
I think that before the Makefile, the pip install and packaging setup must be converged on, especially the mix with mamba/conda. I find the link between requirements.txt and environment.yml unclear and have questions about its maintenance. OK for another issue on that (I have not found one, if it exists)? For a clean and "pure" Python project, it is also possible to simplify setup.py and setup.cfg and keep only a pyproject.toml now. To be discussed, to be sure of the dev ways to go.
I would separate the pure Python packaging with its dependencies (not pinned) from the conda/mamba dependencies (pinned if needed to keep this installation stable for users). Pinned dependencies can be really problematic in a mixed environment with other programs sharing the same dependencies. My 2 cents: this last part belongs more in a "packaging" issue... but for now, I put it here:
requirements.txt (generated from environment.yml; can we change that?):
- geopandas>=0.12.0 -> OK
- numba==0.* -> why == ?
- numpy==1.* -> can be complex with NumPy 2 now
- matplotlib==3.* -> idem
- pyproj>=3.4,<4 -> "less than 4" can be problematic in a complex mixed environment
- rasterio>=1.3,<2 -> idem for rasterio
- scipy>=1.0,<1.13 -> idem
- tqdm
- scikit-image==0.* -> idem
- scikit-gstat>=1.0,<1.1 -> idem
- geoutils==0.1.8 -> idem
- pip -> why pip as a dependency? not useful I think
Some additional points:
- I see you identified the delicate OpenCV dependency @adebardo. The good news here is that it should soon be removed altogether (the affine transformation is done in NumPy since #531, and ICP will be moved to NumPy soon too, #483; I'm working on this over the next weeks). OpenCV is a heavy dependency to have (even if optional), and it created a lot of hassle when solving environments in the past. We'll be glad to be rid of it. A generic illustration of an affine transformation in plain NumPy is shown below.
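To illustrate why this part of the functionality does not need OpenCV, here is a minimal, generic sketch of applying a 2D affine transformation to point coordinates with NumPy; this is not xdem's actual implementation from #531, just an illustration of the principle:

```python
# Generic illustration (not xdem's actual code) of applying a 2D affine
# transformation to point coordinates with plain NumPy instead of OpenCV.
import numpy as np

# Affine matrix in homogeneous form: rotation/scale block + translation column.
A = np.array([[1.0, 0.0, 10.0],   # x' = 1*x + 0*y + 10
              [0.0, 1.0, -5.0],   # y' = 0*x + 1*y - 5
              [0.0, 0.0, 1.0]])

points = np.array([[0.0, 0.0], [100.0, 50.0]])           # (N, 2) coordinates
homog = np.hstack([points, np.ones((len(points), 1))])   # (N, 3) homogeneous
transformed = (A @ homog.T).T[:, :2]                     # back to (N, 2)
print(transformed)  # [[ 10.  -5.], [110.  45.]]
```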
Regarding your questions on requirements.txt and the content of the environment file, here's the logic behind it:
For the generation of requirements.txt: it is inspired by the way Pandas deals with Pip+Conda: https://github.com/pandas-dev/pandas/blob/main/scripts/generate_pip_deps_from_conda.py. It allows keeping a consistent environment for both package managers based on a single file (environment.yml). We additionally have checks that verify that dev-environment.yml matches environment.yml; see https://github.com/GlacioHack/xdem/pull/429 for details. We did a lot of digging to ensure things were up-to-date (especially on the setuptools side), but we are not experts in this. Happy to modify this if you know of other ways to keep the environment management consistent between Pip and Conda! :slightly_smiling_face: A rough sketch of the conversion idea is shown just below.
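To make the single-source idea concrete, here is a minimal, hypothetical sketch of such an environment.yml to requirements.txt conversion; it is not the actual xdem or pandas script, and the EXCLUDE/RENAME mappings are purely illustrative:

```python
# Minimal sketch of deriving requirements.txt from environment.yml so that
# pip and conda installs stay consistent from a single source file.
# This is NOT the actual xdem/pandas script; names and mappings are illustrative.
import yaml

# Conda-only entries that have no meaning for pip.
EXCLUDE = {"python", "pip"}
# Packages named differently on conda-forge and PyPI (hypothetical example).
RENAME = {"opencv": "opencv-python"}


def conda_to_pip(dep):
    """Translate one conda dependency string into a pip requirement line."""
    # Split the package name from its version specifier (if any).
    name = dep
    for sep in ("=", "<", ">"):
        name = name.split(sep)[0]
    name = name.strip()
    if name in EXCLUDE:
        return None
    spec = dep[len(name):]
    # Conda pins exact versions with a single '=', pip expects '=='.
    if spec.startswith("=") and not spec.startswith("=="):
        spec = "=" + spec
    return RENAME.get(name, name) + spec


with open("environment.yml") as f:
    env = yaml.safe_load(f)

# Dict entries (the 'pip:' sub-section) are already pip syntax; keep conda strings only.
conda_deps = [dep for dep in env["dependencies"] if isinstance(dep, str)]
pip_deps = [req for req in map(conda_to_pip, conda_deps) if req is not None]

with open("requirements.txt", "w") as f:
    f.write("\n".join(pip_deps) + "\n")
```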
For the pinning of major versions: we pin the least amount possible (usually there is a reason, due to an incompatibility or a temporary bug fix going on in those packages: for example, Rasterio >= 1.3 because of an incompatibility, or SciPy < 1.13 because of a breaking bug fix in SciKit-GStat not yet released: https://github.com/GlacioHack/xdem/issues/509). We learned recently that it is good practice to always pin a max only at the next major version of the main dependencies, for backwards-compatibility. The reason behind this is that major versions (so, to be clear, only X.0.0, neither 0.X.0 nor 0.0.X) are synonymous with breaking changes for nearly all packages. Pinning them is not such a big constraint on the environment because major releases are not frequent (for instance, NumPy/Pandas majors happen every 5-10 years, other packages every couple of years), but it is crucial for the long-term working of older package versions. For example, the recent NumPy 2.0 release (and Pandas 2.0 a couple of years ago) broke a large part of the packages in the Python ecosystem exactly because of this (no pinning of the upper version in most packages). The first thought for most packages is "we can easily either update the code or pin the max version, and push a new release which supports it". That works for the latest release, but not for the older versions that all depended on NumPy 1.X too and break with NumPy 2.0. Their environment was not associated with any upper constraint, and that might make those older versions break forever, which would be a big problem for reproducibility (say a study used v1.5 of our package, but NumPy 2.0 came out and we updated for it only in v1.6: then all versions up to v1.5 will break for users if we did not pin NumPy).
We made this choice based in great part on this nice article explaining the pinning of max versions: https://iscinumpy.dev/post/bound-version-constraints/, and another blog post explaining this in relation to NumPy 2.0 (and how to upgrade code with Ruff): https://pythonspeed.com/articles/numpy-2/. The former especially insists on the negative effects of pinning max versions in general, but does make an exception for major releases, or in general for releases likely to trigger breaking changes: "Only add a cap if a dependency is known to be incompatible or there is a high (>75%) chance of it being incompatible in its next release".
We're not experts in this either, so we're curious about your experience and happy to adapt things! The main point of worry for us here is not only to have the latest release working, but to keep older versions working as long as possible (for older operational workflows, reproducing published studies, etc.).
And a final note to answer another question: we use the syntax ==X.* for several dependencies as it is equivalent to >=X.0,<X+1, which pins the next major version and sets the minimum major version (required for compatibility). So this syntax is used in the case where the min and max versions are just one major version apart. The snippet below illustrates the equivalence.
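For instance, the equivalence can be checked with the packaging library (the version numbers below are arbitrary examples, using matplotlib-style pins with X = 3):

```python
# Check that '==X.*' accepts the same versions as '>=X.0,<X+1' (here X = 3),
# using the 'packaging' library; the version numbers are arbitrary examples.
from packaging.specifiers import SpecifierSet
from packaging.version import Version

wildcard = SpecifierSet("==3.*")
explicit = SpecifierSet(">=3.0,<4")

for v in ["2.9", "3.0", "3.7.2", "4.0"]:
    in_wildcard = Version(v) in wildcard
    in_explicit = Version(v) in explicit
    assert in_wildcard == in_explicit
    print(v, in_wildcard)  # 2.9 -> False, 3.0 -> True, 3.7.2 -> True, 4.0 -> False
```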
@adebardo Can we close this one with https://github.com/GlacioHack/xdem/pull/630, or is there still stuff left to do?
We can close and improve it later