Add Soil data macrosys
Soil water content (volumetric %) for 33kPa and 1500kPa suctions predicted at 6 standard depths (0, 10, 30, 60, 100 and 200 cm) at 250 m resolution
source: https://zenodo.org/record/2784001#.YDlJ02pKiBR or https://developers.google.com/earth-engine/datasets/catalog/OpenLandMap_SOL_SOL_WATERCONTENT-33KPA_USDA-4B1C_M_v01
citation: "Tomislav Hengl, & Surya Gupta. (2019). Soil water content (volumetric %) for 33kPa and 1500kPa suctions predicted at 6 standard depths (0, 10, 30, 60, 100 and 200 cm) at 250 m resolution (Version v0.1) [Data set]. Zenodo. http://doi.org/10.5281/zenodo.2784001"
License (for files): Creative Commons Attribution Share Alike 4.0 International
@MarconiS https://gitlab.com/openlandmap/global-layers
Here are the links to the Zenodo archives for all derived datasets of global soil properties (0.065 km² spatial resolution):
- [ ] Soil bulk density: https://zenodo.org/record/2525665
- [ ] Soil pH in H2O: https://zenodo.org/record/2525664
- [ ] Soil texture classes (USDA system): https://zenodo.org/record/2525817
- [ ] Coarse fragments %: https://zenodo.org/record/2525682
- [ ] Silt content in % (kg / kg): https://zenodo.org/record/2525676
- [ ] Clay content in % (kg / kg): https://zenodo.org/record/2525663
- [ ] Sand content in % (kg / kg): https://zenodo.org/record/2525662
- [ ] Predicted USDA soil suborders: https://zenodo.org/record/2657408
- [ ] Predicted USDA soil orders: https://zenodo.org/record/2658183
- [ ] Soil organic carbon stock in kg/m2: https://zenodo.org/record/2536040
- [ ] Soil organic carbon content in x 5 g / kg at 6 standard depths: https://zenodo.org/record/2525553
- [ ] Soil water content (volumetric %): https://zenodo.org/record/2784001
- [ ] Soil available water capacity in mm derived for 5 standard layers: https://zenodo.org/record/2629149
Are these datasets added to scripts in retriever-recipes? If not, then I would like to solve this issue.
@Aakash3101 feel free to work on the issue. I recommend that you start from the bottom up.
Sure @henrykironde
@henrykironde I wanted to clear a doubt: for the last dataset, "Soil available water capacity in mm derived for 5 standard layers", I can make a single script for all the files in the dataset, right? The dataset has 7 files, so when I run retriever autocreate, can I have all the files in the same directory? Also, should I make separate commits for each dataset or one combined commit?
> I can make a single script for all the files in the dataset, right? The dataset has 7 files, so when I run retriever autocreate, can I have all the files in the same directory?
Yes, all the files go in the same directory. In this case, I think a fitting name for the directory would be Soil_available_water_capacity.
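For reference, here is a minimal sketch of that workflow; the directory layout is from this thread, but the autocreate flag is an assumption from memory, so check `retriever autocreate --help` for the exact options:

```bash
# Hypothetical layout: all 7 files of the dataset in one directory
mkdir -p Soil_available_water_capacity
mv sol_available.water.capacity_*.tif Soil_available_water_capacity/

# Generate a single script covering the whole directory
# (the -d flag is assumed to treat the directory as one dataset)
retriever autocreate Soil_available_water_capacity -d
```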
@henrykironde I think this can be completed during my GSoC project, if I get selected, because these files are very big indeed 😂, and it might take me some time to check each one and then make a PR for the dataset added.
Each checkbox is a single PR. I am actually working on them, so don't worry about the whole issue. Your goal should be to understand or get a good overview of the moving parts in the project.
Yes, actually I am enjoying doing this kind of work as I am learning new things.
@henrykironde I am not able to load the .tif files into PostgreSQL. There seems to be some kind of size limitation for raster2pgsql to work efficiently: it works completely fine with small files, but it just gets stuck when I run it on the big files, which are around 3 to 4 GB.
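For context, the load pipeline being run looks roughly like this; the database and table names here are placeholders, and -s 4326 matches the EPSG code reported by gdalinfo further down:

```bash
# Tile the GeoTIFF and stream the resulting INSERTs into PostGIS
# (-s sets the SRID, -t the tile size; soil_awc/retriever_db are placeholders)
raster2pgsql -s 4326 -t 100x100 \
    sol_available.water.capacity_usda.mm_m_250m_30..60cm_1950..2017_v0.1.tif \
    public.soil_awc | psql -d retriever_db
```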
I will check this out
Well, I am also figuring something out, and it turns out that the tile size can impact the processing time. In the code for the install command, the tile size is 100x100, and when I tried a tile size of 2000x2000, the file was saved in the database, but I cannot view it in QGIS. Both pgAdmin4 and the DB Manager in QGIS show that the table does have raster values.
I will check this out
Any updates @henrykironde? To me, it seems that when a tile size of 100x100 is used, a lot of rows will be generated. For example, the size of this file is 172800x71698:
```
aakash01@aakash01-G3-3579:~/.retriever/raw_data/soil-available-water-capacity $ gdalinfo sol_available.water.capacity_usda.mm_m_250m_30..60cm_1950..2017_v0.1.tif
Driver: GTiff/GeoTIFF
Files: sol_available.water.capacity_usda.mm_m_250m_30..60cm_1950..2017_v0.1.tif
Size is 172800, 71698
Coordinate System is:
GEOGCRS["WGS 84",
    DATUM["World Geodetic System 1984",
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    ID["EPSG",4326]]
Data axis to CRS axis mapping: 2,1
Origin = (-180.000000000000000,87.370000000000005)
Pixel Size = (0.002083333000000,-0.002083333000000)
Metadata:
  AREA_OR_POINT=Area
Image Structure Metadata:
  COMPRESSION=DEFLATE
  INTERLEAVE=BAND
Corner Coordinates:
Upper Left  (-180.0000000,  87.3700000) (180d 0' 0.00"W, 87d22'12.00"N)
Lower Left  (-180.0000000, -62.0008094) (180d 0' 0.00"W, 62d 0' 2.91"S)
Upper Right ( 179.9999424,  87.3700000) (179d59'59.79"E, 87d22'12.00"N)
Lower Right ( 179.9999424, -62.0008094) (179d59'59.79"E, 62d 0' 2.91"S)
Center      (  -0.0000288,  12.6845953) (  0d 0' 0.10"W, 12d41' 4.54"N)
Band 1 Block=172800x1 Type=Int16, ColorInterp=Gray
  NoData Value=-32768
  Overviews: 86400x35849, 43200x17925, 21600x8963, 10800x4482, 5400x2241, 2700x1121, 1350x561
```
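That suspicion checks out with some quick ceiling-division arithmetic on the 172800x71698 grid:

```bash
# Number of tiles (= rows in the raster table) for each tile size
w=172800; h=71698
for t in 100 2000 5000; do
  echo "${t}x${t}: $(( ((w + t - 1) / t) * ((h + t - 1) / t) )) tiles"
done
# 100x100:   1238976 tiles
# 2000x2000:    3132 tiles
# 5000x5000:     525 tiles
```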
When I run the raster2pgsql command with a tile size of 100x100, it takes an indefinite time to process, while for tile sizes of 2000x2000 or 5000x5000 it takes about 40 minutes to an hour. But the problem is that when I try to view the raster through QGIS, it seems to add the layer to the canvas and then crashes after 10 minutes or so.
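A few raster2pgsql options might help here; this is an untested sketch, not a verified fix: -Y loads via COPY instead of per-row INSERTs, -I builds a GiST index so QGIS can fetch only the tiles in the current view, and -l adds downsampled overview tables for rendering at low zoom levels.

```bash
# Load with COPY (-Y), spatial index (-I), constraints (-C), and
# overview tables at 1/4, 1/16, and 1/64 resolution (-l)
raster2pgsql -s 4326 -t 2000x2000 -Y -I -C -l 4,16,64 \
    sol_available.water.capacity_usda.mm_m_250m_30..60cm_1950..2017_v0.1.tif \
    public.soil_awc | psql -d retriever_db
```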
Another way to deal with this processing-time issue is to reference the file from the database using the -R flag of the raster2pgsql command; with this flag, only the reference is stored in the database, not the raster data. But this defeats the reason we are storing it in the database in the first place, because if the file is moved from where it should be, the reference would not work. I had the idea for the -R flag because the raw data is downloaded when you first install the dataset and does not get deleted, so referencing the data would save the user some storage on the system.
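For completeness, the out-of-db variant would look something like the sketch below; as far as I recall, -R needs the absolute path to the file, since only that path is stored in the database:

```bash
# Register only metadata plus a pointer to the file on disk;
# the pixels stay in ~/.retriever/raw_data and the file must not move
raster2pgsql -s 4326 -t 2000x2000 -I -C -R \
    "$HOME/.retriever/raw_data/soil-available-water-capacity/sol_available.water.capacity_usda.mm_m_250m_30..60cm_1950..2017_v0.1.tif" \
    public.soil_awc | psql -d retriever_db
```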
@Aakash3101 what are your computational resources?
CPU: i7 8th Gen, GPU: GeForce GTX 1050 Ti, RAM: 8 GB DDR4, GPU RAM: 4 GB, OS: Ubuntu 20.04 LTS
Could you try to close other applications (especially IDEs), open QGIS, and try to load the map? I will try it later from my end. Give it a few minutes to render.
I can load and view the map from the raw data file, but not from the PostGIS database.
Yes, load the data from the PostGIS database and give it at least 10 minutes, based on your resources. Make sure to free at least 4 GB of memory; most IDEs take about 2 GB, and closing them will let QGIS load the data.
Okay, I will let you know if it opens.
So this time, while loading the file in QGIS, I monitored my RAM usage through the terminal, and it uses all my memory before the application is terminated. I don't know the reason yet, but I will soon find out. When I open the raw data file directly, it uses only around 2 GB of my RAM, so I think the memory usage is caused by PostGIS running queries in the background or something.
When I query the table in pgAdmin4 to show all the values in the table, Postgres uses all the RAM and then freezes, so I think I need to optimize the memory available for queries. Please let me know if you find something useful for optimizing the memory usage.
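One possible explanation: selecting the raster column itself pulls every tile's pixel payload to the client, which is what exhausts the RAM. Metadata queries are cheap by comparison; a small sketch, reusing the placeholder table name from above:

```bash
# Inspect tile metadata without pulling pixel data to the client
psql -d retriever_db -c "
  SELECT rid, ST_Width(rast) AS w, ST_Height(rast) AS h, ST_NumBands(rast) AS bands
  FROM public.soil_awc
  LIMIT 10;"
```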
Okay, I think at this point you should let me handle this. It could take at least a day or two; I will try to find a way around it. This is at a good point/phase, and I will update you. I need to finish up some other spatial datasets first.
Sure @henrykironde