Gitpod for GDAL

Open maphew opened this issue 3 years ago • 10 comments

What does this PR do?

Provides an easily reproducible and complete GDAL linux machine running on Gitpod cloud infrastructure.

Generally what these files do is:

  • Load a published OSGeo GDAL docker image
  • Install or update dependencies with apt-get
  • Install sudo and add the gitpod user to its list
  • Install python development libraries and modules

Anyone can then run https://gitpod.io/#https://github.com/maphew/gdal/tree/gitpod-for-gdal/ and use the resultant machine.
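
The launch link above follows a simple pattern: the Gitpod host, a `#`, then the GitHub URL of the branch. A minimal sketch of building such a link for another fork or branch follows; the function name `gitpod_url` is made up for illustration and is not part of this PR.

```shell
# Hypothetical helper: print a Gitpod launch URL for a GitHub branch.
# $1: GitHub owner, $2: repository, $3: branch name
gitpod_url() {
    printf 'https://gitpod.io/#https://github.com/%s/%s/tree/%s/\n' "$1" "$2" "$3"
}

# Reproduces the link for this PR's branch.
gitpod_url maphew gdal gitpod-for-gdal
```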

What are related issues/pull requests?

  • https://github.com/OSGeo/gdal/pull/5296

Tasklist

  • [x] Working Gitpod config with docker hub osgeo/gdal (a.k.a. ubuntu-large-latest) - /docker/gitpod/osgeo-geo.Dockerfile
  • [x] Add test case(s) - see below
  • [ ] Review
  • [ ] Adjust for comments
  • [ ] All CI builds and checks have passed
  • [x] Add documentation - /docker/gitpod/readme.md - no longer relevant
  • [x] Update urls in readme to point to osgeo/gdal repository - no longer relevant

Environment

Gitpod

Test Case

If this sequence of commands completes successfully in the launched Gitpod instance, the instance is considered complete for the purposes of this PR.

# Check out the source code tree to match the version of gdal binaries currently installed. 
sudo apt update
sudo apt-get install -y git

git checkout `gdalinfo --version | sed -s "s/GDAL.*-\(.*\), .*/\1/"`
    # Equiv to: 
    #     $ gdalinfo --version
    #     GDAL 3.4.0dev-d2f9067ffb15e593e9b826ca939dbd183636c780, released 2021/10/26
    #     $ git checkout d2f9067ffb15e593e9b826ca939dbd183636c780

cd autotest
pip install -r ./requirements.txt
pytest

Implemented in ./autotest/autotest_with_current_binary.sh
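
The checkout step above relies on the commit hash embedded in the version string of a development build. As a rough sketch (not the contents of autotest_with_current_binary.sh), the extraction could be wrapped in a function that fails cleanly when the string has no `dev-<hash>` part. The function name and the release-style input in the comment are assumptions for illustration.

```shell
# Hypothetical helper: print the commit hash embedded in a
# `gdalinfo --version` line, or return 1 if there is none.
# $1: the full version line
gdal_commit_from_version() {
    commit=$(printf '%s\n' "$1" |
        sed -n 's/^GDAL [^,]*dev-\([0-9a-f]\{7,40\}\),.*/\1/p')
    [ -n "$commit" ] || return 1
    printf '%s\n' "$commit"
}

# Demo with the version line quoted in this PR.
gdal_commit_from_version "GDAL 3.4.0dev-d2f9067ffb15e593e9b826ca939dbd183636c780, released 2021/10/26"

# In a real workspace this might be used as:
#   git checkout "$(gdal_commit_from_version "$(gdalinfo --version)")"
# A line without a dev hash (assumed form: "GDAL 3.4.0, released ...")
# makes the function return 1 instead of passing garbage to git.
```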

Related

Branch maphew:gitpod-for-gdal-small, which uses docker hub ubuntu-small-latest for its base.

Future work

This implementation uses gdal docker images that are published to Docker Hub, so they are always somewhat older than current development. It would be good to be able to adapt the GDAL config files in ./docker so that gitpod can be launched using current master. This is out of scope for what I'm willing to attempt, so it is not part of this PR, but it could be a reasonable next step for someone else.

A possible step in that direction is /docker/gitpod/gitpod-workspace-full.Dockerfile which currently breaks during the build steps (see comment below). This effort is in branch maphew:gitpod-for-gdal-gp-full. It's here for reference in case someone else wants to pick it up. I don't plan to keep working on it.

maphew avatar Feb 27 '22 02:02 maphew

Test result for ubuntu-small-latest is attached. The configuration is a limited success: a machine successfully spins up and gdal utilities are available in shell. However, it fails the original objective of being able to build and install a python wheel (@rouault #5296.)

Since the same test works in ubuntu-full, I presume the failure is only because one or more dependencies need to be installed. I don't consider this reason enough to block the PR. ubuntu-small-latest-test-result.txt

maphew avatar Feb 27 '22 03:02 maphew

Successful test result for osgeo-gdal (ubuntu-large-latest) is attached.

osgeo-gdal-test-result.txt

maphew avatar Feb 27 '22 03:02 maphew

Great! I often use gitpod and use it to compile libraries like GDAL.

Gitpod's gitpod/workspace-full docker image has many awesome tools to assist with development. For example, homebrew is an essential tool for development, and there is full-featured support for other languages such as js and python. Wouldn't it take away some of the fun to completely abandon that image?

Although configuring a GDAL development environment is a complex matter, my usual approach is as follows:

sudo bash .github/workflows/ubuntu_20.04/build-deps.sh  # install deps
bash .github/workflows/ubuntu_20.04/build.sh  # build

Fortunately, the dependency-install and build scripts from the GitHub Actions Ubuntu 20.04 workflow work perfectly for this, apart from #5395. And in VS Code, we can use CMake for richer language analysis.

Perhaps the image osgeo/gdal:ubuntu-full-latest is not suitable for development. So I recommend using the gitpod/workspace-full docker image to preserve the full development experience.

zy6p avatar Mar 01 '22 02:03 zy6p

Thanks for the feedback @zy6p! I foresee a future state where there are as many gitpod configs as there are uses for gdal, and I imagine gitpod/workspace-full would rank high among those.

My primary goal at the moment is to get the smallest build (within reason) that is capable of developing and testing the gdal-utils suite. As fast as gitpod is, launch time is still dependent on image size, and smaller is faster is better. For me.

maphew avatar Mar 01 '22 23:03 maphew

Perhaps using prebuilds would be a better way to reduce the waiting time. But as far as image size is concerned, the smallest image should be the one slimmed down for the production environment, not the development environment.

As you said, gitpod/workspace-full is the most popular solution, so you can maintain a minimal set of releases yourself and leave the choice to everyone.

zy6p avatar Mar 02 '22 02:03 zy6p

@zy6p I set up a config for workspace-full following your comment. It gets a long way there but fails when running the build scripts due to some different expectations about paths (see log excerpt below). I don't want to get into touching the build scripts so this is as far as I'll take it now. Commit 7fdb787 for anyone who wants to play with it.

It takes quite a long time. I read up on prebuilds, but from the docs it seems like this is an alternative to using a custom dockerfile, meaning you can use one path or the other but not both (?)

...snip...
#7 458.0 liblcms2-dev is already the newest version (2.9-4).
#7 458.0 liblcms2-dev set to manually installed.
#7 458.0 0 upgraded, 0 newly installed, 0 to remove and 9 not upgraded.
#7 DONE 458.2s

#8 [5/5] RUN bash gdal/.github/workflows/ubuntu_20.04/build.sh
#8 0.851 Set cache size limit to 200.0 MB
#8 0.853 cache directory                     /home/gitpod/.ccache
#8 0.853 primary config                      /home/gitpod/.ccache/ccache.conf
#8 0.853 secondary config      (readonly)    /etc/ccache.conf
#8 0.853 cache hit (direct)                     0
#8 0.853 cache hit (preprocessed)               0
#8 0.853 cache miss                             0
#8 0.853 cache hit rate                      0.00 %
#8 0.853 cleanups performed                     0
#8 0.853 files in cache                         0
#8 0.853 cache size                           0.0 kB
#8 0.853 max cache size                     200.0 MB
#8 0.854 gdal/.github/workflows/ubuntu_20.04/build.sh: line 26: cd: /build: No such file or directory
#8 ERROR: process "/bin/sh -c bash gdal/.github/workflows/ubuntu_20.04/build.sh" did not complete successfully: exit code: 1
------
 > [5/5] RUN bash gdal/.github/workflows/ubuntu_20.04/build.sh:
#8 0.853 secondary config      (readonly)    /etc/ccache.conf
#8 0.853 cache hit (direct)                     0
#8 0.853 cache hit (preprocessed)               0
#8 0.853 cache miss                             0
#8 0.853 cache hit rate                      0.00 %
#8 0.853 cleanups performed                     0
#8 0.853 files in cache                         0
#8 0.853 cache size                           0.0 kB
#8 0.853 max cache size                     200.0 MB
#8 0.854 gdal/.github/workflows/ubuntu_20.04/build.sh: line 26: cd: /build: No such file or directory
------
gitpod-workspace-full.Dockerfile:14
--------------------
  12 |     RUN sudo bash gdal/.github/workflows/ubuntu_20.04/build-deps.sh
  13 |     # build gdal
  14 | >>> RUN bash gdal/.github/workflows/ubuntu_20.04/build.sh
  15 |     
  16 |     
--------------------
error: failed to solve: process "/bin/sh -c bash gdal/.github/workflows/ubuntu_20.04/build.sh" did not complete successfully: exit code: 1
{"@type":"type.googleapis.com/google.devtools.clouderrorreporting.v1beta1.ReportedErrorEvent","command":"build","error":"exit status 1","level":"error","message":"build failed","serviceContext":{"service":"bob","version":""},"severity":"ERROR","time":"2022-03-04T22:09:32Z"}
exit

Error: headless task failed: exit status 1

Error: headless task failed: exit status 1

maphew avatar Mar 04 '22 22:03 maphew

I've changed the opening post and narrowed the goal of this PR to simply getting a viable Gitpod machine working with fully functional gdal utilities and libraries. Even (@rouault), I believe this is accomplished and is ready for review.

New test case (autotest) results:

Open https://gitpod.io/#https://github.com/maphew/gdal/tree/gitpod-for-gdal/.

From that machine run the following in bash:

#check out the source code tree to match the version of gdal binaries currently installed. 
$ sudo apt-get install -y git
$ gdalinfo --version
GDAL 3.4.0dev-d2f9067ffb15e593e9b826ca939dbd183636c780, released 2021/10/26

$ git checkout d2f9067ffb15e593e9b826ca939dbd183636c780
$ cd autotest
$ pip install -r ./requirements.txt
$ pytest

Tail end of the tests:

Results (644.44s):
    8486 passed
       4 failed
         - ogr/ogr_ngw.py:500 test_ogr_ngw_12
         - ogr/ogr_ngw.py:533 test_ogr_ngw_13
         - gdrivers/netcdf.py:4845 test_netcdf_open_userfaultfd
         - gdrivers/netcdf_multidim.py:1854 test_netcdf_multidim_open_userfaultfd
       2 xfailed
    1006 skipped
100 - done.

As far as I know this PR doesn't touch any of those files, even indirectly, so I think the failures are more likely a result of the installed gdal binary libraries and the gdal source code checkout not being in sync. One or the other is a step or three back or forward from the other. I'm happy to re-run and capture the full pytest output if needed.
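
One way to rule the sync question in or out before running pytest is to compare the commit reported by the binaries with the checked-out source commit. The sketch below is only an illustration, not part of this PR: the function name `commits_match` is invented, and it treats an abbreviated hash as matching its full form.

```shell
# Hypothetical helper: succeed when two commit hashes refer to the same
# commit, allowing either one to be an abbreviation of the other.
commits_match() {
    [ -n "$1" ] && [ -n "$2" ] || return 1
    case "$1" in "$2"*) return 0 ;; esac
    case "$2" in "$1"*) return 0 ;; esac
    return 1
}

# Demo with hashes quoted in this thread.
binary_commit=d2f9067ffb15e593e9b826ca939dbd183636c780
if commits_match "$binary_commit" d2f9067; then
    echo "binaries and source are in sync"
fi
if ! commits_match "$binary_commit" c7e3e652c32773ab5627aafa8c3bd76e01d657b6; then
    echo "binaries and source differ"
fi

# In a real workspace the second argument might come from:
#   git rev-parse HEAD
```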

Likewise the failing CI doesn't seem to be related to the content of this PR (?)

maphew avatar Mar 05 '22 21:03 maphew

Nice try. gitpod/workspace-full is Gitpod's default image, so you don't have to set the image in the .gitpod.yml config. We just need to set up the run script as follows:

tasks:
  - name: Dependencies
    init: |
      sudo bash .github/workflows/ubuntu_20.04/build-deps.sh  # install deps
      sed "s/cd \/build/cd \/workspace\/gdal/1" -i .github/workflows/ubuntu_20.04/build.sh
      bash .github/workflows/ubuntu_20.04/build.sh  # build

The error is reported because the target build folder is not the same: build.sh expects /build, which does not exist in the Gitpod workspace.
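
To see what that sed substitution does without editing the real script in place, it can be tried on a throwaway file first. This is only a sketch: the function name `patch_build_dir` is invented, and the three-line stand-in file is not the actual contents of build.sh. Unlike the `-i` form above, it prints the patched text to stdout and leaves the input untouched.

```shell
# Hypothetical helper: print a copy of a script with the hard-coded
# "cd /build" replaced by another directory.
# $1: script to read, $2: replacement directory (must not contain | or &)
patch_build_dir() {
    sed "s|cd /build|cd $2|" "$1"
}

# Demo on a stand-in file, not the real build.sh.
tmp=$(mktemp)
printf 'set -e\ncd /build\ncmake ..\n' > "$tmp"
patch_build_dir "$tmp" /workspace/gdal
rm -f "$tmp"
```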

It seems inappropriate to package the source code when building the image, as the image is always updated more slowly than the source code. Gitpod also handles its images in a particular way: when you open the same workspace multiple times, the files in the workspace/${ProjectName} directory are preserved, while everything else reverts to the image defaults. So there is no need to set up a custom image and no need to clone the code manually.

You may still get some errors running build.sh, but this file was not written for gitpod, so that's understandable. Running build.sh is not strictly necessary anyway, since we can run cmake successfully in the IDE.

zy6p avatar Mar 07 '22 02:03 zy6p

The GDAL project highly values your contribution and would love to see this work merged! Unfortunately this PR has not had any activity in the last 21 days and is being automatically marked as "stale". If you think this pull request should be merged, please check

  • that all unit tests are passing
  • that all comments by reviewers have been addressed
  • that there is enough information for reviewers, in particular
      • link to any issues which this pull request fixes
      • add a description of workflows which this pull request fixes
      • add screenshots if applicable
  • that you have written unit tests where possible

In case you should have any uncertainty, please leave a comment and we will be happy to help you proceed with this pull request. If there is no further activity on this pull request, it will be closed in a week.

stale[bot] avatar Apr 16 '22 13:04 stale[bot]

Status update: I've refreshed the opening post to reflect the updated goal and current state.

The test case has been added as a bash script to ./autotest. I don't know if that's a good place for it or not, since it is not itself a pytest autotest. The last run has 3 failures, which to my eye do not look related to this work.

================= short test summary info =====================
FAILED ogr/ogr_virtualogr.py::test_ogr_virtualogr_3 - AssertionError: assert '3.5.0dev-c7e3e652c32773ab5627aafa8c3bd76e01d657b6' in '\nERROR ret code = 1'
FAILED gdrivers/netcdf.py::test_netcdf_open_userfaultfd - assert None
FAILED gdrivers/netcdf_multidim.py::test_netcdf_multidim_open_userfaultfd - assert None

Results (343.64s):
    8640 passed
       3 failed
         - ogr/ogr_virtualogr.py:207 test_ogr_virtualogr_3
         - gdrivers/netcdf.py:4875 test_netcdf_open_userfaultfd
         - gdrivers/netcdf_multidim.py:1861 test_netcdf_multidim_open_userfaultfd
       2 xfailed
    1044 skipped
100 - done.

I'm stumbling a bit on where to introduce this feature in the docs:


Gitpod workspace

Spin up a cloud-based developer environment linux machine with gdal installed on Gitpod infrastructure: https://www.gitpod.io/#https://github.com/OSGeo/gdal


I've added it to /README.md as I don't see a better place; I don't know if it warrants such prominent billing.

maphew avatar Apr 20 '22 19:04 maphew

@maphew I'm not sure what to do with this PR. I have no personal enthusiasm for it, but the project doesn't need to be held back by, or hostage to, my chronic lack of enthusiasm. If you're willing to pursue this, I'd suggest you perhaps write an RFC and submit it for approval to the project steering committee.

rouault avatar Jul 04 '23 22:07 rouault

Thanks for the nudge @rouault. The circumstances which made me hot for this last year have ebbed, but I do think it's still fundamentally a good idea to have a web-based, gdal-utils-capable shell for testing. I'll review my current commitments and see if I have the wherewithal to continue with an RFC and so on. In the meantime I'll close this PR and reopen it later if that comes to pass.

maphew avatar Jul 17 '23 23:07 maphew