Ocean-Data-Map-Project

Normalize datasetconfig.json

Open htmlboss opened this issue 6 years ago • 1 comment

There is a lot of duplicated data in our dataset config file. Normalizing it would improve maintainability, increase flexibility, and reduce the size of the file. The variables field is one of the main culprits: the same variable definitions are repeated, with small differences, in every dataset. For example:

"giops_day": {
        "name": "GIOPS Daily",
        "url": "http://navigator.oceansdata.ca/thredds/dodsC/giops/daily/aggregated.ncml",
        "enabled": true,
        "quantum": "day",
        "climatology": "http://trinity:8080/thredds/dodsC/climatology/Levitus98_PHC21/aggregated.ncml",
        "attribution": "GIOPS Daily Values from CONCEPTS",
        "help": "Global Ice Ocean Prediction System
        <ul>
            <li>Global Coverage</li>
            <li>Tri-polar ORCA grid 1/4° resolution (ORCA025), &lt; 15km in Arctic</li>
            <li>50 vertical z-levels</li>
            <li>Available as monthly averages (May 2014&ndash;April 2015)</li>
            <li>Variables Available:
                <ul>
                    <li>Ice Concentration</li>
                    <li>Ice Volume</li>
                    <li>Meridional Wind</li>
                    <li>Salinity</li>
                    <li>Sea Surface Height (Free Surface)</li>
                    <li>Sea Water Velocity</li>
                    <li>Sea Water East Velocity</li>
                    <li>Sea Water North Velocity</li>
                    <li>Sea Water X Velocity</li>
                    <ul>
                      <li>water velocity along model x grid lines</li>
                    </ul>
                    <li>Sea Water Y Velocity</li>
                     <ul>
                      <li>water velocity along model y grid lines</li>
                    </ul>
                    <li>Water Temperature</li>
                    <li>Wind</li>
                    <li>Zonal Wind</li>
                </ul>
            </li>
        </ul>",
        
        "variables": {
            "vozocrtx": { "name": "Water X Velocity", "unit": "m/s", "scale": [-3, 3], "zero_centered": "true" },
            "vomecrty": { "name": "Water Y Velocity", "unit": "m/s", "scale": [-3, 3], "zero_centered": "true" },
            "vozocrte,vomecrtn": { "name": "Water Velocity", "unit": "m/s", "scale": [0, 3], "scale_factor": 1 },
            "votemper": { "name": "Temperature", "unit": "Celsius", "scale": [-5, 30], "equation": "votemper - 273.15", "dims": ["time_counter", "depth", "y", "x"] },
            "vosaline": { "name": "Salinity", "unit": "PSU", "scale": [30, 40] },
            "sossheig": { "name": "Sea Surface Height", "unit": "m", "scale": [-3, 3], "zero_centered": "true"  },
            "aice": { "name": "Ice Concentration", "unit": "fraction", "scale": [0, 1] },
            "vice": { "name": "Ice Volume",        "unit": "m", "scale": [0, 10] },
            "u_wind": { "name": "Zonal Wind",      "unit": "m/s", "scale": [-20, 20], "zero_centered": "true"  },
            "v_wind": { "name": "Meridional Wind", "unit": "m/s", "scale": [-20, 20], "zero_centered": "true"  },
            "wind":   { "name": "Wind",            "unit": "m/s", "scale": [0, 20] },
            "divergence": { "name": "Water Divergence", "unit": "1/10^6 s", "scale": [-50, 50], "scale_factor": 1e6, "equation": "divergence(vozocrtx, vomecrty, nav_lat, nav_lon)", "zero_centered": "true"},
            "vozocrte": { "name": "Water East Velocity", "scale": [-3, 3], "scale_factor": 1, "unit": "m/s", "equation": "vozocrtx * cos_alpha - vomecrty * sin_alpha", "zero_centered": "true" },
            "vomecrtn": { "name": "Water North Velocity", "scale": [-3, 3], "scale_factor": 1, "unit": "m/s", "equation": "vozocrtx * sin_alpha + vomecrty * cos_alpha", "zero_centered": "true" },
            "sspeed": { "name": "Speed of Sound", "scale": [1400, 1600], "scale_factor": 1, "unit": "m/s", "equation": "sspeed(depth, nav_lat, vosaline, votemper - 273.15)" },
            "vorticity": { "name": "Water Vorticity", "scale": [-50, 50], "scale_factor": 1e6, "unit": "1/10^6 s", "equation": "vorticity(vozocrtx, vomecrty, nav_lat, nav_lon)", "zero_centered": "true" }
        }
    },
    "giops_forecast": {
        "url": "http://navigator.oceansdata.ca/thredds/dodsC/giops/forecast/aggregated.ncml",
        "name": "GIOPS 10-day Forecast",
        "quantum": "day",
        "enabled": true,
        "cache": 6,
        "climatology": "http://navigator.oceansdata.ca/thredds/dodsC/climatology/Levitus98_PHC21/aggregated.ncml",
        "attribution": "GIOPS 10-day Forecast from CONCEPTS",
        "help": "Global Ice Ocean Prediction System
        <ul>
            <li>Global Coverage</li>
            <li>Tri-polar ORCA grid 1/4° resolution (ORCA025), &lt; 15km in Arctic</li>
            <li>50 vertical z-levels</li>
            <li>Available as monthly averages (May 2014&ndash;April 2015)</li>
            <li>Variables Available:
                <ul>
                    <li>Ice Concentration</li>
                    <li>Ice Volume</li>
                    <li>Meridional Wind</li>
                    <li>Salinity</li>
                    <li>Sea Surface Height (Free Surface)</li>
                    <li>Sea Water Velocity</li>
                    <li>Sea Water East Velocity</li>
                    <li>Sea Water North Velocity</li>
                    <li>Sea Water X Velocity</li>
                    <ul>
                      <li>water velocity along model x grid lines</li>
                    </ul>
                    <li>Sea Water Y Velocity</li>
                     <ul>
                      <li>water velocity along model y grid lines</li>
                    </ul>
                    <li>Water Temperature</li>
                    <li>Wind</li>
                    <li>Zonal Wind</li>
                </ul>
            </li>
        </ul>",
        
        "variables": {
            "vozocrtx": { "name": "Water East Velocity", "unit": "m/s", "scale": [-3, 3], "zero_centered": "true"  },
            "vomecrty": { "name": "Water North Velocity", "unit": "m/s", "scale": [-3, 3], "zero_centered": "true"  },
            "vozocrtx,vomecrty": { "name": "Water Velocity", "unit": "m/s", "scale": [0, 3] },
            "votemper": { "name": "Temperature", "unit": "Celsius", "scale": [-5, 30], "equation": "votemper - 273.15", "dims": ["time", "depth", "latitude", "longitude"] },
            "vosaline": { "name": "Salinity", "unit": "PSU", "scale": [30, 40] },
            "sossheig": { "name": "Sea Surface Height", "unit": "m", "scale": [-3, 3], "zero_centered": "true"  },
            "aice": { "name": "Ice Concentration", "unit": "fraction", "scale": [0, 1] },
            "vice": { "name": "Ice Volume",        "unit": "m", "scale": [0, 10] },
            "u_wind": { "name": "Zonal Wind",      "unit": "m/s", "scale": [-20, 20], "zero_centered": "true"  },
            "v_wind": { "name": "Meridional Wind", "unit": "m/s", "scale": [-20, 20], "zero_centered": "true"  },
            "wind": { "name": "Wind", "unit": "m/s", "scale": [0, 20] },
            "sokaraml": { "name": "Ocean Mixed Layer Depth", "unit": "m", "scale": [0, 4000] }
        }
    },

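To make matters worse, we now have a calculation layer that conditionally adds new variables to various datasets, and copy-pasting the equations into each dataset is not ideal. The following is a proposed schema, in which each dataset lists only the keys of its variables and the definitions live once in a shared top-level variables object: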


"datasets" : {
    "giops_day": {
        "name": "GIOPS Daily",
        "variables": ["votemper", "vosaline", "..."],
        "..."
    },
    "riops_day": {
        "name": "RIOPS Daily",
        "variables": ["votemper", "vosaline", "..."],
        "..."
    },
    "..."
},

"variables": {
    "votemper": { "name": "Temperature", "unit": "Celsius", "scale": [-5, 30],  "equation": "votemper - 273.15", "dims": ["time", "depth", "latitude", "longitude"] },
    "..."
},
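As a rough illustration of how a loader could consume the proposed schema, here is a minimal Python sketch. The `resolve_variables` helper and the trimmed-down `NORMALIZED` document are hypothetical and are not taken from `datasetconfig.py`; the variable values are copied from the examples above. The sketch assumes every dataset can share one definition per variable. The excerpts above suggest this will not always hold (for example, `votemper` has different `dims` in `giops_day` and `giops_forecast`), so a real implementation would probably need some form of per-dataset override.

```python
import json

# Hypothetical normalized config following the proposed schema.
# Values are copied from the examples in this issue; the structure is an assumption.
NORMALIZED = json.loads("""
{
  "datasets": {
    "giops_day": {"name": "GIOPS Daily", "variables": ["votemper", "vosaline"]}
  },
  "variables": {
    "votemper": {"name": "Temperature", "unit": "Celsius", "scale": [-5, 30],
                 "equation": "votemper - 273.15"},
    "vosaline": {"name": "Salinity", "unit": "PSU", "scale": [30, 40]}
  }
}
""")


def resolve_variables(config, dataset_key):
    """Return {variable_key: definition} for one dataset (hypothetical helper)."""
    shared = config["variables"]
    resolved = {}
    for key in config["datasets"][dataset_key]["variables"]:
        # Fail loudly if a dataset references a variable that was never defined.
        if key not in shared:
            raise KeyError(f"{dataset_key} references undefined variable {key!r}")
        resolved[key] = shared[key]
    return resolved


giops_vars = resolve_variables(NORMALIZED, "giops_day")
print(giops_vars["vosaline"]["unit"])
```

Raising on an undefined key, rather than skipping it silently, means a typo in a dataset's variable list shows up when the config is loaded instead of as a missing layer in the UI.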

What is data normalization? From Microsoft: "Normalization is the process of organizing data in a database. This includes creating tables and establishing relationships between those tables according to rules designed both to protect the data and to make the database more flexible by eliminating redundancy and inconsistent dependency.

Redundant data wastes disk space and creates maintenance problems. If data that exists in more than one place must be changed, the data must be changed in exactly the same way in all locations. A customer address change is much easier to implement if that data is stored only in the Customers table and nowhere else in the database."

https://support.microsoft.com/en-ca/help/283878/description-of-the-database-normalization-basics

htmlboss, Jun 18 '19 12:06

Action items:

  • [ ] Normalize datasetconfig.json.
  • [ ] Update datasetconfig.py to account for the normalization.
  • [ ] Update tests/test_datasetconfig.py.
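The first action item could be started with a small one-off migration script. The sketch below is only an assumption about how that might look. It treats the current file as a mapping from dataset key to dataset object, as in the excerpt above, and the `normalize` function is hypothetical. It hoists each variable definition into a shared table and reports any key whose definition differs between datasets, so those cases can be reconciled by hand rather than silently overwritten.

```python
import json


def normalize(old_config):
    """Split an old-style config into shared variables plus per-dataset key lists.

    Returns (normalized, conflicts). A conflict is a (dataset, variable) pair
    whose definition differs from the first one seen for that variable; the
    first definition is kept and the conflict is reported for manual review.
    """
    variables = {}
    datasets = {}
    conflicts = []
    for ds_key, ds in old_config.items():
        # Copy every dataset field except the inlined variable definitions.
        entry = {k: v for k, v in ds.items() if k != "variables"}
        entry["variables"] = list(ds.get("variables", {}))
        datasets[ds_key] = entry
        for var_key, definition in ds.get("variables", {}).items():
            if var_key not in variables:
                variables[var_key] = definition
            elif variables[var_key] != definition:
                conflicts.append((ds_key, var_key))
    return {"datasets": datasets, "variables": variables}, conflicts


# Trimmed-down input using two variables from the excerpt in this issue.
OLD = {
    "giops_day": {
        "name": "GIOPS Daily",
        "variables": {
            "vozocrtx": {"name": "Water X Velocity", "unit": "m/s",
                         "scale": [-3, 3], "zero_centered": "true"},
            "vosaline": {"name": "Salinity", "unit": "PSU", "scale": [30, 40]},
        },
    },
    "giops_forecast": {
        "name": "GIOPS 10-day Forecast",
        "variables": {
            "vozocrtx": {"name": "Water East Velocity", "unit": "m/s",
                         "scale": [-3, 3], "zero_centered": "true"},
            "vosaline": {"name": "Salinity", "unit": "PSU", "scale": [30, 40]},
        },
    },
}

normalized, conflicts = normalize(OLD)
print(json.dumps(normalized["datasets"], indent=2))
print(conflicts)
```

On this input the only reported conflict is `vozocrtx` in `giops_forecast`, whose name differs from the `giops_day` definition. A check along these lines might also be a reasonable starting point for the updated tests.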

htmlboss, Jun 18 '19 12:06