gempy icon indicating copy to clipboard operation
gempy copied to clipboard

Consider how to manage versions of dependencies

Open agzimmerman opened this issue 4 years ago • 32 comments

In the past few months, there have routinely been changes in dependencies which break our continuous integration process.

The same changes usually break releases in the same way, since we do not freeze the versions of most dependencies in releases.

In general, we want to continue using the latest versions of dependencies to benefit from their new features and bug fixes. This is also how we best contribute to the ecosystem as a whole. Therefore, during development and for the automated testing performed by Travis for continuous integration, it makes sense to keep testing against the latest versions of dependencies so that we are always aware of breaking changes and can address them in a timely fashion.

On the other hand, we do not want to spend our time fixing new bugs arising in releases from changes in dependencies, especially e.g. in the case of #502 where there does not appear to be any mistake on the GemPy side, but rather an internal bug in pandas.

So far, since we have not frozen the versions of many dependencies when we do pip releases, it is very likely that the releases will break.

@Leguark and I discussed this. For now, we think we should use pip freeze to require the versions of dependencies that we actually install and test when making releases.

A downside is that users are then essentially expected to have a Python environment specific for GemPy, which could be interpreted as a GemPy problem rather than a Python ecosystem problem.

In any case, I think @Leguark knows what to do for now and maybe we can think of a better long term solution later.

agzimmerman avatar Aug 12 '20 07:08 agzimmerman

This relates to #475

AlexanderJuestel avatar Aug 12 '20 08:08 AlexanderJuestel

This relates to #494

AlexanderJuestel avatar Aug 12 '20 08:08 AlexanderJuestel

@alex-schaaf do you foresee any other pro's/con's to this approach?

agzimmerman avatar Aug 18 '20 08:08 agzimmerman

A downside is that users are then essentially expected to have a Python environment specific for GemPy, which could be interpreted as a GemPy problem rather than a Python ecosystem problem.

Along this line of thought, @alex-schaaf mentioned that pipenv might solve part of our problem.

agzimmerman avatar Aug 18 '20 08:08 agzimmerman

"Managing a requirements.txt file can be problematic, so Pipenv uses Pipfile and Pipfile.lock to separate abstract dependency declarations from the last tested combination." - pipenv

Yep this seems good.

agzimmerman avatar Aug 18 '20 08:08 agzimmerman

If you freeze the requirements and pin everything, you will most likely not be able to install GemPy with anything else at all. The requirements should be as flexible as possible, and only known versions should be excluded, e.g., via !=, or >=, <=. Pinning dependencies with == will bring you into dependency hell with any other package, and will require that users have a specific environment just for GemPy...

prisae avatar Aug 18 '20 08:08 prisae

If you freeze the requirements and pin everything, you will most likely not be able to install GemPy with anything else at all. The requirements should be as flexible as possible, and only known versions should be excluded, e.g., via !=, or >=, <=. Pinning dependencies with == will bring you into dependency hell with any other package, and will require that users have a specific environment just for GemPy...

Yes this was exactly my worry. We will take your advice to at least only specify <= for now.

agzimmerman avatar Aug 18 '20 08:08 agzimmerman

Mostly I wasn't sure yet how to reconcile this with our testing process, but I think we will solve that with pipenv and a general cleanup of how we specify the dependencies (#505).

agzimmerman avatar Aug 18 '20 08:08 agzimmerman

(You use miniconda anyway on Travis, right? So I would use conda environments instead of pipenv, but that is just my preference.)

prisae avatar Aug 18 '20 08:08 prisae

(You use miniconda anyway on Travis, right? So I would use conda environments instead of pipenv, but that is just my preference.)

can we use the conda environment for the pip installation? Don't we need both?

Leguark avatar Aug 18 '20 08:08 Leguark

What do you mean with the "pip installation"? On Travis? Or if a user "pip installs"?

prisae avatar Aug 18 '20 08:08 prisae

If a user uses pip install

Leguark avatar Aug 18 '20 08:08 Leguark

Nope, I guess not. But do you want to create a custom pipenv/conda env when installing?

prisae avatar Aug 18 '20 08:08 prisae

very much agree with @prisae , pinning to exact versions is all kinds of bad news. I think a well done setup.py file/requirements.txt file or a conda environment should be completely sufficient for this problem, if not that points to a "code smell" issue where there are dependencies of gempy that are not kept up to date (e.g. spherecluster). I think I even posted an issue for that project or their conda-forge feedstock for them to update to a more recent sklearn version.

By depending on pandas however, due to the rapid pace of development on that project GemPy needs to keep up to date on what users generally will want to use (reasonably recent scipy stack + jupyter). I would say it would be worth examining what dependencies GemPy has, and seeing what can be reasonably removed.

AndrewAnnex avatar Aug 19 '20 14:08 AndrewAnnex

@alex-schaaf and I talked this over and are going to proceed as follows:

  • Update requirements.txt, dev-requirements.txt, and optional-requirements.txt. dev-requirements.txt will only list packages required for Travis's testing which are not already included in requirements.txt. Regarding versions of these dependencies, use <= restrictions where necessary.
  • Clean up the .travis.yml, primarily the install section. Try to only use pip and not conda. Install dependencies from the requirements files only. This doesn't mean that people can't use conda, but Travis shouldn't have to use both.

agzimmerman avatar Sep 22 '20 11:09 agzimmerman

Also there are some specific versions of packages that we should exclude in the requirements files:

  • pandas!=1.0.5 in requirements.txt (#502)
  • pyvista!=0.25 in dev-requirements.txt (#491)

I made the distinction between the two requirements files here because #491 was only ever reproducible on Travis, not when running locally.

agzimmerman avatar Sep 22 '20 12:09 agzimmerman

As a first test I've simplified the .travis.yml as follows on branch fix_506:

language: python

branches:
  only:
    - fix_506

python:
  - '3.7'  # probably should test all common py3 versions

before_install: # configure a headless display
  - git clone --depth 1 git://github.com/pyvista/gl-ci-helpers.git
  - source ./gl-ci-helpers/travis/setup_headless_display.sh

install:
  - pip install -r requirements.txt
  - pip install -r dev-requirements.txt

sctipt:
  - pytest -v

The result is only one test failing 🤩

test/test_core/test_grids/test_topography.py::test_real_grid_ales FAILED [ 59%]

with the error message

test/test_core/test_grids/test_topography.py:44: 

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

gempy/plot/decorators.py:108: in pst

    topography = func(*args, **kwargs)

gempy/core/model.py:401: in set_topography

    self._grid.create_topography(source, **kwargs)

gempy/core/data.py:183: in create_topography

    self.topography.load_from_gdal(filepath)

gempy/core/grid_modules/topography.py:91: in load_from_gdal

    dem = LoadDEMGDAL(filepath, extent=self.extent)

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <gempy.core.grid_modules.create_topography.LoadDEMGDAL object at 0x7fe7c755ce80>

path_dem = '/home/travis/build/cgre-aachen/gempy/test/test_core/test_grids/../../input_data/_cropped_DEM_coarse.tif'

grid = None

extent = array([ 7.29550e+05,  7.51500e+05,  1.91350e+06,  1.92365e+06,

       -5.00000e+01,  8.00000e+02])

delete_temp = True

    def __init__(self, path_dem, grid=None, extent=None, delete_temp=True):

        """

        Args:

            path_dem: path where dem is stored. file format: GDAL raster formats

            if grid: cropped to geomodel extent00m

        """

        if GDAL_IMPORT == False:

>           raise ImportError('Gdal package is not installed. No support for raster formats.')

E           ImportError: Gdal package is not installed. No support for raster formats.

gempy/core/grid_modules/create_topography.py:35: ImportError

which refers to gdal not being installed.

It skipped all tests in test_vista.py, which I guess is supposed to happen 🤷‍♂️

@Leguark: through which installation command were you installing the gdal requirement? Also this looks like custom error handling in gempy/core/grid_modules/create_topography.py for the gdal import - so why crash out hard and not just return?

alex-schaaf avatar Sep 23 '20 09:09 alex-schaaf

gdal is installed in master's .travis.yml with

  - conda install -c conda-forge python-graphviz emg3d==0.11
    discretize==0.4.13 pip matplotlib ipython six pytest pytest-cov
    pytest-mock gdal pandas==1.0.5 seaborn>=0.9 qgrid sphinx-gallery
    ipywidgets pyevtk arviz dataclasses scikit-image>=0.17
    recommonmark networkx panel setuptools mkl-service

I'll have to dig up some old build logs to see if the test_vista.py tests were always being skipped. I don't see why that would be the intended behavior.

Also as long as I'm here:

  • There's a typo in the new .travis.yml. "sctipt" is written instead of "script". Somehow Travis doesn't seem to mind this.
  • I think we'll still want the deployment check.

agzimmerman avatar Sep 23 '20 10:09 agzimmerman

It skipped all tests in test_vista.py, which I guess is supposed to happen 🤷‍♂️

Yeah that's what's been happening on master so far as well. Maybe just raise a new issue about it so that it doesn't distract us from the rest of this cleanup.

agzimmerman avatar Sep 23 '20 10:09 agzimmerman

It skipped all tests in test_vista.py, which I guess is supposed to happen 🤷‍♂️

Yeah that's what's been happening on master so far as well. Maybe just raise a new issue about it so that it doesn't distract us from the rest of this cleanup.

Then again, I'm really unhappy with us not properly testing our PyVista dependency as part of this. So maybe we also fix this now.

agzimmerman avatar Sep 23 '20 10:09 agzimmerman

Alright so test_vista.py has

@pytest.mark.skipif("TRAVIS" in os.environ and os.environ["TRAVIS"] == "true",
                reason="Skipping this test on Travis CI.")

@Leguark , what was the point of doing this? Maybe including PyVista in Travis's check test before the deploy check would e.g. have averted https://github.com/cgre-aachen/gempy/issues/491 .

agzimmerman avatar Sep 23 '20 10:09 agzimmerman

Ah, missed the gdal in the massive conda command 😄 thanks. So we should see if there's any way to install gdal without conda, otherwise we'd again require a conda install. Or, we just leave out that one test, as it seems fairly specialized and gdal is not listed in any of the (optional) requirements anyways (or is it a depenedency of something that's being installed?)

I think we'll still want the deployment check.

Yeah I left out all the deployment stuff for now to just first test the basic install and testing

alex-schaaf avatar Sep 23 '20 11:09 alex-schaaf

Gdal is not used very much in gempy, I think it is only used for the topography command to convert the geotiff into series of coordinates for computing the geologic map. I think I replaced that code with rasterio and numpy meshgrid for my project.

I also worked out a way to use pyvista instead of pyevtk to export the regular grid voxels last night.

A series of prs to fix these small use cases of dependencies will simply things greatly, I’ll see if I can make one later today

-Andrew Annex

On Sep 23, 2020, at 7:08 AM, Alexander Schaaf [email protected] wrote:

 Ah, missed the gdal in the massive conda command 😄 thanks. So we should see if there's any way to install gdal without conda, otherwise we'd again require a conda install. Or, we just leave out that one test, as it seems fairly specialized and gdal is not listed in any of the (optional) requirements anyways (or is it a depenedency of something that's being installed?)

I think we'll still want the deployment check. Yeah I left out all the deployment stuff for now to just first test the basic install and testing

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe.

AndrewAnnex avatar Sep 23 '20 11:09 AndrewAnnex

actually I forgot @alex-schaaf that I already have a wip pr for removing gdal #525, I think any remaining issues with it would be trivial to fix

AndrewAnnex avatar Sep 23 '20 12:09 AndrewAnnex

@alex-schaaf @alexanderzimmerman pr #527 removes pyevtk and seems to pass all tests except for docs but that seems to be an xarray issue

AndrewAnnex avatar Sep 23 '20 14:09 AndrewAnnex

seems to pass all tests except for docs but that seems to be an xarray issue

Apparently the same was caught by a Travis CRON job three days ago. I made #530 for this now.

Travis checks deployment to GitHub Pages. Part of deployment is running GemPy's examples and putting their I/O into the docs using Sphinx. Some of those examples import packages that aren't used anywhere else in GemPy. Many of the problems that have led to raising the current issue have been related to these examples.

In the spirit of completing the current task before removing unnecessary dependencies: @alex-schaaf , is keeping gdal around making this task much more difficult?

agzimmerman avatar Sep 25 '20 07:09 agzimmerman

Well I have no idea how to install gdal automatically without conda - it certainly always crashes using pip on my machine, and manually downloading and installing unofficial wheels (https://www.lfd.uci.edu/~gohlke/pythonlibs/#gdal) doesn't sound very appealing 😄

alex-schaaf avatar Sep 28 '20 08:09 alex-schaaf

actually I forgot @alex-schaaf that I already have a wip pr for removing gdal #525, I think any remaining issues with it would be trivial to fix

awesome @AndrewAnnex ! we could pull that PR into this branch - would make this branch a bit more dirty but could help speed things up 🤔

alex-schaaf avatar Sep 28 '20 08:09 alex-schaaf

@alex-schaaf I think thats fine, I figured out my PR is failing in one test for the topography because the included DEM tif file in the repository doesn't have a CRS which makes it hard to use rasterio effectively to clip the topography. If someone has that CRS information laying around or can make a new version of that DEM file then I should be able to finish that PR. It is easier to first make the code changes that allow us to remove the troublesome dependencies (gdal, pyevtk, etc) first, then simplify the version management. Otherwise you are building the walls before the foundation to butcher an idiom.

AndrewAnnex avatar Sep 30 '20 21:09 AndrewAnnex

@alex-schaaf It looks like dropping direct usage of gdal is blocked effectively only because the test geotiff doesn't have CRS data, I wonder if we could get that fixed now if gem-gis is also working with rasterio

AndrewAnnex avatar Jan 12 '21 16:01 AndrewAnnex