
Add Soil data macrosys

Open henrykironde opened this issue 4 years ago • 26 comments

Soil water content (volumetric %) for 33kPa and 1500kPa suctions predicted at 6 standard depths (0, 10, 30, 60, 100 and 200 cm) at 250 m resolution

Source: https://zenodo.org/record/2784001#.YDlJ02pKiBR or https://developers.google.com/earth-engine/datasets/catalog/OpenLandMap_SOL_SOL_WATERCONTENT-33KPA_USDA-4B1C_M_v01

Citation: "Tomislav Hengl, & Surya Gupta. (2019). Soil water content (volumetric %) for 33kPa and 1500kPa suctions predicted at 6 standard depths (0, 10, 30, 60, 100 and 200 cm) at 250 m resolution (Version v0.1) [Data set]. Zenodo. http://doi.org/10.5281/zenodo.2784001"

License (for files): Creative Commons Attribution Share Alike 4.0 International
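
For context, once a recipe for this dataset exists, installing it would look roughly like the sketch below. The script name soil-water-content is hypothetical until the recipe is merged, and the exact flags can vary by retriever version.

    # Hypothetical dataset name; substitute the real script name once it is added.
    retriever install postgres soil-water-content --user postgres --password ******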

henrykironde avatar Feb 26 '21 19:02 henrykironde

@MarconiS https://gitlab.com/openlandmap/global-layers

henrykironde avatar Mar 01 '21 17:03 henrykironde

Here are the links to the Zenodo archives for all derived datasets of global soil properties (0.065 km² spatial resolution):

  • [ ] Soil bulk density: https://zenodo.org/record/2525665
  • [ ] Soil pH in H2O: https://zenodo.org/record/2525664
  • [ ] Soil texture classes (USDA system): https://zenodo.org/record/2525817
  • [ ] Coarse fragments %: https://zenodo.org/record/2525682
  • [ ] Silt content in % (kg / kg): https://zenodo.org/record/2525676
  • [ ] Clay content in % (kg / kg): https://zenodo.org/record/2525663
  • [ ] Sand content in % (kg / kg): https://zenodo.org/record/2525662
  • [ ] Predicted USDA soil suborders: https://zenodo.org/record/2657408
  • [ ] Predicted USDA soil orders: https://zenodo.org/record/2658183
  • [ ] Soil organic carbon stock in kg/m2: https://zenodo.org/record/2536040
  • [ ] Soil organic carbon content in x 5 g / kg at 6 standard depths: https://zenodo.org/record/2525553
  • [ ] Soil water content (volumetric %): https://zenodo.org/record/2784001
  • [ ] Soil available water capacity in mm derived for 5 standard layers: https://zenodo.org/record/2629149

MarconiS avatar Mar 01 '21 17:03 MarconiS

Are these datasets added as scripts in retriever-recipes? If not, I would like to solve this issue.

Aakash3101 avatar Mar 24 '21 14:03 Aakash3101

@Aakash3101 feel free to work on the issue. I recommend starting from the bottom of the list and working up.

henrykironde avatar Mar 24 '21 16:03 henrykironde

Sure @henrykironde

Aakash3101 avatar Mar 24 '21 16:03 Aakash3101

@henrykironde I wanted to clear up a doubt. For the last dataset, "Soil available water capacity in mm derived for 5 standard layers", I can make a single script for all the files in the dataset, right? The dataset has 7 files, so when I run retriever autocreate, can I have all the files in the same directory?

Aakash3101 avatar Mar 24 '21 16:03 Aakash3101

Also, shall I make separate commits for each dataset or one combined commit?

Aakash3101 avatar Mar 24 '21 17:03 Aakash3101

I can make a single script for all the files in the dataset, right? The dataset has 7 files, so when I run retriever autocreate, can I have all the files in the same directory?

Yes, all the files in the same directory. In this case, I think a fitting name for the directory would be Soil_available_water_capacity.
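
For illustration, a rough sketch of that workflow, with the filename pattern inferred from the gdalinfo output later in this thread; check retriever autocreate --help for the exact options in your version:

    # Put all 7 files of the dataset in one directory, then point autocreate at it.
    mkdir Soil_available_water_capacity
    mv sol_available.water.capacity_*.tif Soil_available_water_capacity/
    retriever autocreate Soil_available_water_capacity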

henrykironde avatar Mar 24 '21 17:03 henrykironde

@henrykironde I think this PR can be completed during my GSoC project, if I get selected, because these files are indeed very big 😂, and I might take time to check each one and then make a PR for each dataset added.

Aakash3101 avatar Mar 24 '21 19:03 Aakash3101

Each checkbox is a single PR. I am actually working on them, so don't worry about the whole issue. Your goal should be to understand, or get a good overview of, the moving parts in the project.

henrykironde avatar Mar 24 '21 20:03 henrykironde

Each checkbox is a single PR. I am actually working on them, so don't worry about the whole issue. Your goal should be to understand, or get a good overview of, the moving parts in the project.

Yes, actually I am enjoying doing this kind of work as I am learning new things.

Aakash3101 avatar Mar 25 '21 06:03 Aakash3101

@henrykironde I am not able to load the .tif files into PostgreSQL. There seems to be a size-related limit on how efficiently raster2pgsql works. It works completely fine with small files, but it just gets stuck when I run it on the big files, which are around 3 to 4 GB.

Aakash3101 avatar Mar 27 '21 12:03 Aakash3101

I will check this out

henrykironde avatar Mar 27 '21 18:03 henrykironde

I will check this out

Well, I am also figuring something out, and it turns out that the tile size can impact the processing time. In the code for the install command the tile size is 100x100; when I tried a tile size of 2000x2000, the file was saved in the database, but I cannot view it in QGIS. Both pgAdmin 4 and the DB Manager in QGIS show that the table does have raster values.
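
For reference, a minimal sketch of the kind of invocation being discussed; the table and database names are placeholders:

    # -s 4326 declares the SRID, -t sets the tile size under discussion,
    # -I builds a GiST spatial index, -C applies the standard raster constraints.
    raster2pgsql -s 4326 -t 2000x2000 -I -C \
      sol_available.water.capacity_usda.mm_m_250m_30..60cm_1950..2017_v0.1.tif \
      public.soil_awc | psql -d retriever_db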

Aakash3101 avatar Mar 27 '21 18:03 Aakash3101

I will check this out

Any updates @henrykironde? To me, it seems that when a tile size of 100x100 is used, a very large number of rows will be generated, one per tile. For example, this file is 172800x71698 pixels, which with 100x100 tiles works out to 1728 × 717 ≈ 1.24 million rows:

aakash01@aakash01-G3-3579:~/.retriever/raw_data/soil-available-water-capacity $ gdalinfo sol_available.water.capacity_usda.mm_m_250m_30..60cm_1950..2017_v0.1.tif 
Driver: GTiff/GeoTIFF
Files: sol_available.water.capacity_usda.mm_m_250m_30..60cm_1950..2017_v0.1.tif
Size is 172800, 71698
Coordinate System is:
GEOGCRS["WGS 84",
    DATUM["World Geodetic System 1984",
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    ID["EPSG",4326]]
Data axis to CRS axis mapping: 2,1
Origin = (-180.000000000000000,87.370000000000005)
Pixel Size = (0.002083333000000,-0.002083333000000)
Metadata:
  AREA_OR_POINT=Area
Image Structure Metadata:
  COMPRESSION=DEFLATE
  INTERLEAVE=BAND
Corner Coordinates:
Upper Left  (-180.0000000,  87.3700000) (180d 0' 0.00"W, 87d22'12.00"N)
Lower Left  (-180.0000000, -62.0008094) (180d 0' 0.00"W, 62d 0' 2.91"S)
Upper Right ( 179.9999424,  87.3700000) (179d59'59.79"E, 87d22'12.00"N)
Lower Right ( 179.9999424, -62.0008094) (179d59'59.79"E, 62d 0' 2.91"S)
Center      (  -0.0000288,  12.6845953) (  0d 0' 0.10"W, 12d41' 4.54"N)
Band 1 Block=172800x1 Type=Int16, ColorInterp=Gray
  NoData Value=-32768
  Overviews: 86400x35849, 43200x17925, 21600x8963, 10800x4482, 5400x2241, 2700x1121, 1350x561

When I run the raster2pgsql command with a tile size of 100x100, it takes an indefinite amount of time to process, while with tile sizes of 2000x2000 or 5000x5000 it takes about 40 minutes to 1 hour. But the problem is that when I try to view the raster through QGIS, it seems to add the layer to the canvas and then crashes after 10 minutes or so.
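
One option that might help QGIS here, noted as a suggestion rather than something tried above: raster2pgsql can load overview levels alongside the full-resolution tiles via -l, so zoomed-out rendering does not have to scan every tile. Table and database names are placeholders:

    # -l 2,4,8 additionally creates overview tables (o_2_<table>, o_4_<table>, ...)
    # that clients such as QGIS can use for faster rendering when zoomed out.
    raster2pgsql -s 4326 -t 2000x2000 -I -C -l 2,4,8 \
      sol_available.water.capacity_usda.mm_m_250m_30..60cm_1950..2017_v0.1.tif \
      public.soil_awc | psql -d retriever_db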

Another way to deal with this processing-time issue is to register the file in the database using the -R flag of raster2pgsql; with this flag only a reference (the file path) is stored, not the raster data itself.

But this works against the reason we are storing it in the database in the first place, because if the file is moved from where it is expected to be, the reference breaks. I had the idea for the -R flag because the raw data is downloaded when you first install the dataset and does not get deleted, so referencing it instead of copying it would save the user some storage on the system.
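
A sketch of that out-of-db variant; the path below is assumed from the raw-data location shown earlier, and the file must remain at that absolute path for queries to resolve:

    # -R stores only metadata and the file path (out-of-db raster);
    # pixel data is read from the .tif on disk at query time.
    raster2pgsql -s 4326 -t 2000x2000 -I -C -R \
      ~/.retriever/raw_data/soil-available-water-capacity/sol_available.water.capacity_usda.mm_m_250m_30..60cm_1950..2017_v0.1.tif \
      public.soil_awc | psql -d retriever_db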

Aakash3101 avatar Mar 29 '21 16:03 Aakash3101

@Aakash3101 what are your computational resources?

henrykironde avatar Mar 29 '21 18:03 henrykironde

@Aakash3101 what are your computational resources?

CPU: i7 8th Gen
GPU: GeForce GTX 1050 Ti
RAM: 8GB DDR4
GPU RAM: 4GB
OS: Ubuntu 20.04 LTS

Aakash3101 avatar Mar 29 '21 19:03 Aakash3101

Could you try to close other applications (especially IDEs), open QGIS, and try to load the map? I will try it later from my end. Give it a few minutes to render.

henrykironde avatar Mar 29 '21 20:03 henrykironde

I can load and view the map from the raw data file, but not from the PostGIS database.

Aakash3101 avatar Mar 29 '21 20:03 Aakash3101

Yes, load the data from the PostGIS database and give it at least 10 minutes, depending on your resources. Make sure you free at least 4 GB of memory; most IDEs take about 2 GB. Closing them will enable QGIS to load the data.

henrykironde avatar Mar 29 '21 20:03 henrykironde

Okay, I will let you know if it opens.

Aakash3101 avatar Mar 29 '21 20:03 Aakash3101

So this time while loading the file in QGIS, I monitored my RAM usage through the terminal, and it uses all my memory, and then the application is terminated. I don't know the reason yet, but I will soon find out.

Aakash3101 avatar Mar 29 '21 20:03 Aakash3101

And when I open the raw data file directly, it uses just around 2 GB of my RAM. I think the extra memory usage is caused by PostGIS running queries in the background.

Aakash3101 avatar Mar 29 '21 20:03 Aakash3101

When I query the table in pgAdmin 4 to show all the values in the table, Postgres uses all the RAM and then freezes, so I think I need to optimize the memory available for queries. Please let me know if you find something useful for optimizing the memory usage.
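
For what it's worth, the usual knobs are shared_buffers, work_mem, and maintenance_work_mem; the values below are untested guesses for an 8 GB machine, not settings verified in this thread. Also note that selecting every row of a million-tile raster table will exhaust client memory no matter how Postgres is tuned, so aggregates or LIMIT are safer for inspection:

    # Hypothetical values for an 8 GB machine; shared_buffers needs a server restart.
    psql -d retriever_db -c "ALTER SYSTEM SET shared_buffers = '1GB';"
    psql -d retriever_db -c "ALTER SYSTEM SET work_mem = '64MB';"
    psql -d retriever_db -c "ALTER SYSTEM SET maintenance_work_mem = '512MB';"
    sudo systemctl restart postgresql
    # Inspect the table without pulling every tile into pgAdmin:
    psql -d retriever_db -c "SELECT count(*) FROM public.soil_awc;"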

Aakash3101 avatar Mar 30 '21 07:03 Aakash3101

Okay, I think at this point you should let me handle this. It could take a day or two; I will try to find a way around it. This is at a good point/phase. I will update you. I need to finish up some other spatial datasets first.

henrykironde avatar Mar 30 '21 07:03 henrykironde

Sure @henrykironde

Aakash3101 avatar Mar 30 '21 08:03 Aakash3101