MetPy: Using MetPy to split up testing/training/validation xarray datasets for Machine Learning

Open: ThomasMGeo opened this issue (6 comments)

What should we add?

Creating training, validation, and testing datasets is a key step in machine learning workflows. For climate and weather ML analysis, we usually split these datasets along the time dimension.

Scikit-learn has a function that does this for 2D arrays and pandas DataFrames, but it can't split xarray datasets.
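For comparison, here is a minimal sketch of the scikit-learn route on a plain 2D array, assuming the function in question is sklearn.model_selection.train_test_split. Getting three sets takes two calls, and shuffle=False has to be passed explicitly to keep the rows in time order.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy 2D array: 100 time steps (rows) x 3 features (columns), already in time order.
X = np.arange(300).reshape(100, 3)

# First call: hold out the last 15 rows as the test set.
# shuffle=False keeps chronological order (the default shuffle=True would mix past and future).
X_trainval, X_test = train_test_split(X, test_size=15, shuffle=False)

# Second call: take the last 15 of the remaining rows as the validation set.
X_train, X_val = train_test_split(X_trainval, test_size=15, shuffle=False)

print(X_train.shape, X_val.shape, X_test.shape)  # (70, 3) (15, 3) (15, 3)
```

An integer test_size is an absolute row count, which sidesteps rounding of fractional sizes.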

Improvements over the scikit-learn implementation:

Built for xarray datasets
Can create a validation dataset (a third split) in a single call, instead of splitting twice
Splits data in a way suited to time series analysis (do not split time series randomly!)
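A minimal sketch of the proposed behavior, assuming the dataset has a sortable coordinate along the split dimension (named time by default). The helper time_split and its train_frac/val_frac parameters are hypothetical, not an existing MetPy API. It sorts along the dimension and takes contiguous positional blocks with isel, so validation and test data always come after the training period.

```python
import numpy as np
import pandas as pd
import xarray as xr


def time_split(ds, dim="time", train_frac=0.7, val_frac=0.15):
    """Hypothetical helper: split ``ds`` into contiguous train, validation and
    test blocks along ``dim``, in chronological order and without shuffling.

    The test set gets whatever remains after train_frac + val_frac.
    """
    if train_frac <= 0 or val_frac < 0 or train_frac + val_frac >= 1:
        raise ValueError("need train_frac > 0, val_frac >= 0 and train_frac + val_frac < 1")

    # Sort along the split dimension so "earlier" and "later" are well defined.
    ds = ds.sortby(dim)
    n = ds.sizes[dim]

    # Convert fractions to block lengths (round() avoids int() truncating 69.999... to 69).
    n_train = round(n * train_frac)
    n_val = round(n * val_frac)

    # Contiguous, non-overlapping positional slices: train first, then val, then test.
    train = ds.isel({dim: slice(0, n_train)})
    val = ds.isel({dim: slice(n_train, n_train + n_val)})
    test = ds.isel({dim: slice(n_train + n_val, None)})
    return train, val, test


# Example: 100 daily time steps of a toy variable on a 4-point grid.
times = pd.date_range("2000-01-01", periods=100, freq="D")
ds = xr.Dataset(
    {"t2m": (("time", "x"), np.random.default_rng(0).normal(size=(100, 4)))},
    coords={"time": times, "x": np.arange(4)},
)
train, val, test = time_split(ds)
print(train.sizes["time"], val.sizes["time"], test.sizes["time"])  # 70 15 15
```

Splitting by position keeps the proportions fixed. If fixed calendar boundaries are preferred (for example, holding out specific years for testing), label-based selection such as ds.sel(time=slice("2000", "2015")) is an alternative design.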

Big questions:

Where should this go in MetPy?
Can we use MetPy's parse_cf() (ds.metpy.parse_cf()) in a smart way to detect the time dimension automatically? This might not be required anyway.
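One way to find the time dimension without parse_cf() is a plain-xarray heuristic: look for a dimension coordinate with a datetime64 dtype. This is a swapped-in technique, not MetPy's method, and find_time_dim is a hypothetical name. MetPy's DataArray accessor also exposes a time property, which may be a better fit inside MetPy.

```python
import numpy as np
import pandas as pd
import xarray as xr


def find_time_dim(ds):
    """Hypothetical helper: return the name of the first dimension of ``ds``
    whose coordinate holds datetime64 values, or None if there is none.

    Plain-xarray heuristic; it does not use MetPy's parse_cf().
    """
    for dim in ds.dims:
        # Only dimensions that have a coordinate variable can be checked.
        if dim in ds.coords and np.issubdtype(ds[dim].dtype, np.datetime64):
            return dim
    return None


# Example: the time dimension is deliberately not called "time".
ds = xr.Dataset(
    {"t2m": (("x", "valid"), np.zeros((3, 5)))},
    coords={"x": np.arange(3), "valid": pd.date_range("2020-01-01", periods=5, freq="D")},
)
print(find_time_dim(ds))  # valid
```

Datasets decoded with non-standard calendars hold cftime objects rather than datetime64 values, so this check would miss them; that is one case where CF-aware parsing could help.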

Reference

No response

Jul 22 '24 16:07 ThomasMGeo