Open lidar-based DSM and DTM dataset for ~72% of UK
I'm currently using Valhalla as a fallback for working out the ground height below buildings in Polygon City (another Mapzen project) and having more accurate height data in urban environments would be very, very useful for us (as it would with my own project, ViziCities) – SRTM just doesn't cut it for the accuracy we need, both horizontally and vertically.
Let me know if this isn't helpful here – I saw that you wanted suggestions for open elevation datasets to use and assumed this was the best place. I have plenty more to suggest if it's helpful.
The Environment Agency in the UK recently released the first tranche of their LIDAR elevation data for around 72% of the UK – including DSM and DTM ASCII files at horizontal resolutions of 25cm, 50cm, 1m, and 2m. This data has been released under the Open Government Licence, which is compatible with the CC-BY 4.0 license.
There's a lot of data and currently the only way to download it is manually through a portal. The data is also more verbose than it needs to be because of unnecessary decimal precision, and the EA are looking to reduce this overhead (and the file size) in the near future.
There are 2 potential approaches to automate the download of the data:
- The download links are easy to automate (based on the Ordnance Survey National Grid) and so it would be possible to batch them all up and slowly work through them
- I have contacts at the Environment Agency who may be able to help get access to a bulk download or some other bulk export
From @kevinkreiser:
@robhawkes yes as many suggestions as you can muster would be great. we don't currently have a schedule for adding new datasets, but once we have a decent list going we can start to do that. as you point out, with each new data set we'll have to jump through the hoops to get it into the proper format but that's part of the fun ;)
one question i have for you though. what kind of horizontal resolution are you looking for? Currently we are doing 1 arc second a la srtm (lower res datasets are upscaled). in terms of disk space, much higher than that becomes prohibitive (when you are dealing with the whole world). we could start to do variable resolution tiles internally but the complications that brings (and possibly the decrease in performance) are probably not worth it. so if we integrate this, would you be expecting 25cm resolution or could you make do with something more like 10 meters?
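To put rough numbers on the disk-space concern: for a fixed area, the sample count grows with the square of the resolution ratio. A minimal sketch, assuming 1 arc second is roughly 30 m on the ground (an approximation that only holds near the equator). `cells_vs_30m` is a hypothetical helper, and cell sizes are given in centimetres so the arithmetic stays in integers:

```shell
# Relative number of samples needed to cover the same area, compared
# with a ~30 m grid (assumed stand-in for 1 arc second SRTM).
# Argument: target cell size in centimetres.
cells_vs_30m() {
  cell_cm=$1
  base_cm=3000                        # ~30 m, an approximation
  per_axis=$(( base_cm / cell_cm ))   # finer cells per coarse cell, per axis
  echo $(( per_axis * per_axis ))     # squared, because the grid is 2D
}

for cm in 1000 200 100 50 25; do
  printf '%5d cm cells: %6dx the samples of a 30 m grid\n' "$cm" "$(cells_vs_30m "$cm")"
done
```

By this estimate, 10 m cells need about 9 times the samples of a 30 m grid, and 25 cm cells about 14,400 times, before any compression.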
This would be better data than the 30m EU data in #132.
Getting the download URLs looks like this:
curl -s 'http://www.geostore.com/environment-agency/rest/product/EA_SUPPLIED_OS_10KM/SP16?catalogName=Survey' \
| jq -r '.[] | select(.coverageLayer | contains("LIDAR-DSM-1M-ENGLAND-EA-MD-YY")) | "http://www.geostore.com/environment-agency/rest/product/download/\(.guid)\t\(.fileName)"'
This returns a TSV with the URL and the filename:
http://www.geostore.com/environment-agency/rest/product/download/c7e2fa0c-157d-11e7-a00a-8cdcd4b4861c LIDAR-DSM-1M-SP16ne.zip
http://www.geostore.com/environment-agency/rest/product/download/408e5664-157d-11e7-a00a-8cdcd4b4861c LIDAR-DSM-1M-SP16sw.zip
http://www.geostore.com/environment-agency/rest/product/download/4829effa-157d-11e7-a00a-8cdcd4b4861c LIDAR-DSM-1M-SP16nw.zip
http://www.geostore.com/environment-agency/rest/product/download/c3af14b6-157d-11e7-a00a-8cdcd4b4861c LIDAR-DSM-1M-SP16se.zip
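The single-tile request above could be repeated for every 10 km square. A sketch, assuming the 10 km squares are named with the two-letter 100 km square followed by one easting digit and one northing digit (as in `SP16`), and that the catalog URL pattern shown above holds for every tile. `tiles_in_square` and `catalog_url` are hypothetical helper names, and squares with no survey data would presumably return an empty result:

```shell
# Print the 100 ten-kilometre tile names inside one 100 km square,
# e.g. SP00 .. SP99 (easting digit first, then northing digit).
tiles_in_square() {
  square=$1
  for e in 0 1 2 3 4 5 6 7 8 9; do
    for n in 0 1 2 3 4 5 6 7 8 9; do
      printf '%s%s%s\n' "$square" "$e" "$n"
    done
  done
}

# Build the catalog URL for one tile, following the pattern used above.
catalog_url() {
  printf 'http://www.geostore.com/environment-agency/rest/product/EA_SUPPLIED_OS_10KM/%s?catalogName=Survey\n' "$1"
}

# Print one catalog URL per tile; each could then be fed to curl and
# the jq filter shown above.
tiles_in_square SP | while read -r tile; do
  catalog_url "$tile"
done
```

Pacing the requests (for example with a `sleep` between tiles) would fit the "slowly work through them" approach suggested in the original post.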
I put together a script to scrape together all the URLs:
https://github.com/iandees/uk-lidar
The catalog.csv file has tens of thousands of lines in it, but we only need some of them. Either way, this source is broken into a whooole lot of files so it's going to be quite a bit of work to transcode.
Starting on 2m terrain model, looks like there are 5,679 images to grab:
curl -s https://raw.githubusercontent.com/iandees/uk-lidar/master/catalog.csv \
| grep LIDAR-DTM-2M-ENGLAND-EA-MD-YY \
| wc -l
Mirroring data
# For each 2m DTM catalog row, stream the download straight into S3 (24 in parallel).
# bash is needed for the <<< here-strings; the row is passed as $1 rather than spliced into the script.
curl -s https://raw.githubusercontent.com/iandees/uk-lidar/master/catalog.csv \
| grep LIDAR-DTM-2M-ENGLAND-EA-MD-YY \
| cut -d, -f 5,8 \
| xargs -I {} -P 24 \
bash -c 'f="$1"; a=$(cut -d, -f 1 <<< "$f"); b=$(cut -d, -f 2 <<< "$f"); s3=s3://elevation-sources-prod/uk_lidar/$a; curl -s "http://www.geostore.com/environment-agency/rest/product/download/$b" | AWS_PROFILE=openterrain aws s3 cp - "$s3" && echo "$s3"' _ {} \
| tee uk_lidar_s3_objects.txt
At this point I tried opening one of these files to verify that the transcode process will be able to handle it. It turns out each of the .zip files has ~a dozen or more .asc files. A couple hiccups as a result:
- The transcode system doesn't know how to handle .asc or ASCII files. gdal and qgis know what to do, but you have to manually pair a CRS with the data, which the transcode system can't do easily right now. (We'd have to give each image a VRT with the projection information embedded in it)
- The transcode system currently only knows how to handle one image at a time. When it encounters a .zip or .tar.gz, it looks for the first .tif or .dem contained there-in and processes that.
I'd appreciate @mojodna's thoughts on this, but I think it would make sense to come up with an external process that composites these tiny little slices of DEM back into a larger TIF that is more suitable for consumption by our existing transcode system. This recombobulated data would probably be useful to others trying to consume this data from the UK, too.
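One way such a compositing step could look with GDAL's command-line tools, as a rough sketch rather than the process that was eventually used: `gdalbuildvrt` mosaics the `.asc` slices into a single VRT and assigns the CRS, and `gdal_translate` writes that out as one GeoTIFF. EPSG:27700 (British National Grid) is an assumption here, as are the directory layout, the output names, and the `run` and `composite_asc` helpers. Setting `DRY_RUN=1` prints the commands instead of running them.

```shell
# Run a command, or just print it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Mosaic every .asc file in a directory into one compressed GeoTIFF.
#   $1: directory holding the unzipped .asc slices
#   $2: output path without extension
composite_asc() {
  src_dir=$1
  out=$2
  # -a_srs attaches the CRS that the bare .asc files do not carry
  # (EPSG:27700 is assumed, not confirmed by the data itself).
  run gdalbuildvrt -a_srs EPSG:27700 "$out.vrt" "$src_dir"/*.asc &&
  run gdal_translate -of GTiff -co COMPRESS=DEFLATE -co TILED=YES "$out.vrt" "$out.tif"
}
```

A call such as `composite_asc ./unzipped/SP16 ./merged/SP16` (hypothetical paths) would then hand the transcode system a single .tif per group of slices, with the projection already embedded.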
I learned a new “word” today, thanks Ian!
(Sent by email in reply to Ian Dees's comment of Fri, Jul 7, 2017, 8:42 AM.)
I think it would make sense to come up with an external process that composites these tiny little slices of DEM back into a larger TIF that is more suitable for consumption by our existing transcode system. This recombobulated data would probably be useful to others trying to consume this data from the UK, too.
Absolutely! The tiles alone will be incredibly useful but if there was some way of getting a combined set of elevation data for the UK that would be amazing.
Over the weekend I added a few scripts to my uk-lidar repo to assist with this. I processed the .zip files of .asc files into .tif files and started merging them into larger images, but it looks like I accidentally deleted the results of that work.
I'm re-running that process now and will get started transcoding when the merging step is done.
I completed the process to convert, merge, and upload the 2m LIDAR-based data from the UK and documented it here: https://github.com/iandees/uk-lidar/blob/master/dem_composite/README.md
Transcode files
# Submit one transcode job per merged .tif under uk_lidar/
aws s3 ls s3://elevation-sources-prod/uk_lidar/ --recursive | \
grep '\.tif$' | \
awk '{print $4}' | \
while read -r filename; do
  bn=$(basename "${filename%.*}")
  make submit-job job=aws/transcode-job.json.hbs input="s3://elevation-sources-prod/${filename}" output="s3://elevation-sources-transcoded/$(dirname "$filename")/${bn}" name="${bn:0:50}"
done
I finished transcoding and loading footprints to get this:

The data looked great, but there was a large chunk of data missing in the lower right. I re-downloaded and re-processed the dataset to pull in that missing data and got this after re-doing the transcode step:

One important thing to point out is that this dataset does not cover 100% of the UK, as the images above might suggest. It appears that they collected this data on seemingly random flights in different directions over time:
From the dataset preview map
This means there will be bits of SRTM poking through in places where they haven't collected data yet.
This looks amazing! The LIDAR data definitely isn't 100% coverage (more like 70%) and the coverage also depends on the year that the data was collected, at least from what I can tell. I imagine many of the flights are near some kind of river or body of water (for flood risk analysis), giving the seemingly random direction of flights.
Here's the Environment Agency description of the data:
The Environment Agency’s LIDAR data archive contains digital elevation data derived from surveys carried out by the Environment Agency's specialist remote sensing team. Accurate elevation data is available for over 70% of England. This dataset is derived from a combination of our full dataset which has been merged and re-sampled to give the best possible coverage. Data is available at 2m, 1m, 50cm, and 25cm resolution.
There's also LIDAR data for Wales, and Scotland's is due any time now. Shall I create new issues for them?
There's also LIDAR data for Wales, and Scotland's is due any time now. Shall I create new issues for them?
Yes! The more data the better.
Yes, Wales as a new issue would be great!
(Sent by email in reply to Robin Hawkes's comment of Jul 11, 2017, 06:56.)
It appears that Defra have moved their data onto a new platform at https://environment.data.gov.uk/. Has anyone tried to bulk download from there? I tried but failed miserably!
I've been poking at it for a while and haven't found a nice way to do it.