
xvec support

Open ahuang11 opened this issue 1 year ago • 3 comments

Closes https://github.com/holoviz/geoviews/issues/737

I'm not entirely sure what level of support should be implemented for xvec; should this live in geoviews or just hvplot?


In attempt 2 (current), I convert the xarray dataset into a geopandas GeoDataFrame and "flatten" any extra geometries, i.e. replace them with integer indices (done through drop_vars). I then gather all the xarray dims as groupby, so they still show up as slider widgets.
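To make the flattening idea concrete, here is a minimal, library-free sketch. It is not the code in this PR: `flatten_cube`, `plot_dim`, and the toy `coords` are hypothetical names, plain `(x, y)` tuples stand in for shapely geometries, and a dict keyed by index tuples stands in for the xarray data. The dim that is plotted keeps its geometries, every other geometry dim is reduced to integer positions, and all non-plotted dims are reported as groupby keys.

```python
from itertools import product

# Toy stand-ins (assumption): (x, y) tuples play the role of geometries.
coords = {
    "mode": ["car", "bike"],
    "origin": [(0.0, 0.0), (1.0, 0.0)],
    "destination": [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)],
}
geometry_dims = ["origin", "destination"]  # dims whose coordinates are geometries


def flatten_cube(coords, values, geometry_dims, plot_dim):
    """Flatten a cube given as {index_tuple: value} into long-form rows.

    `plot_dim` keeps its geometries, every other geometry dim is replaced
    by integer positions, and all non-plotted dims become groupby keys.
    """
    dims = list(coords)
    groupby = [d for d in dims if d != plot_dim]
    rows = []
    for idx in product(*(range(len(coords[d])) for d in dims)):
        row = {}
        for dim, i in zip(dims, idx):
            if dim == plot_dim:
                row["geometry"] = coords[dim][i]
            elif dim in geometry_dims:
                row[dim] = i  # extra geometry flattened to an integer index
            else:
                row[dim] = coords[dim][i]
        row["value"] = values[idx]
        rows.append(row)
    return rows, groupby


# Fill the cube with each cell's flat position so results are easy to check.
shape = [len(c) for c in coords.values()]
values = {idx: n for n, idx in enumerate(product(*(range(s) for s in shape)))}

rows, groupby = flatten_cube(coords, values, geometry_dims, plot_dim="origin")
```

In this sketch `groupby` ends up as `["mode", "destination"]`, which corresponds to the dims that would be exposed as widgets, while `destination` carries only integer positions rather than geometries.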

https://github.com/user-attachments/assets/a52c3125-6519-4f96-86fc-d7f22cbb808e

However, the integers aren't meaningful, so I was wondering: if there are extra geometries, should I overlay the centroid points of the other geometries? If so, how do I even do that in hvplot?

In attempt 1, I tried to keep the data in its xarray structure, but that requires many more changes in hvplot to plot geometries nested in xarray.

cc: @hoxbro


import geopandas as gpd
import pandas as pd
import hvplot.pandas
import hvplot.xarray  # registers the .hvplot accessor used on the DataArray below

import xarray as xr

uri = "gs://gcp-public-data-arco-era5/ar/1959-2022-full_37-6h-0p25deg-chunk-1.zarr-v2"
era5_ds_sub = (
    # Open the dataset
    xr.open_zarr(uri, chunks={"time": 48}, consolidated=True)
    # Select the near-surface level
    .isel(level=0, drop=True)
    # subset in time
    .sel(time=slice("2017-01", "2018-01"))
    # reduce to two arrays
    [["2m_temperature", "u_component_of_wind"]]
)
era5_ds_sub


cities_df = pd.read_json(
    "hf://datasets/jamescalam/world-cities-geo/train.jsonl", lines=True
)
cities_eur = cities_df.loc[cities_df["continent"] == "Europe"]
cities_eur = gpd.GeoDataFrame(
    cities_eur,
    geometry=gpd.points_from_xy(cities_eur.longitude, cities_eur.latitude),
    crs="EPSG:4326",
).drop(["latitude", "longitude", "x", "y", "z"], axis=1)

import xvec

era5_europe_cities = era5_ds_sub.xvec.extract_points(
    cities_eur.geometry, x_coords="longitude", y_coords="latitude"
).drop_vars("index")

era5_europe_cities["2m_temperature"].isel(time=slice(0, 2)).hvplot()

import geopandas as gpd
import numpy as np
import pandas as pd
import xarray as xr
import xvec
import hvplot.xarray

from geodatasets import get_path
chicago = gpd.read_file(get_path("geoda.chicago health"))

origin = destination = chicago.geometry.array
mode = ["car", "bike", "foot"]
date = pd.date_range("2023-01-01", periods=100)
hours = range(24)
rng = np.random.default_rng(1)
data = rng.integers(1, 100, size=(3, 100, 24, len(chicago), len(chicago)))
traffic_counts = xr.DataArray(
    data,
    coords=(mode, date, hours, origin, destination),
    dims=["mode", "date", "time", "origin", "destination"],
    name="traffic_counts",
).xvec.set_geom_indexes(["origin", "destination"], crs=chicago.crs)
traffic_counts.sel(date="2023-02-28", time=12, mode="bike").hvplot("traffic_counts", hover_cols=["date", "time"])

ahuang11 avatar Aug 30 '24 00:08 ahuang11

Codecov Report

Attention: Patch coverage is 26.66667% with 11 lines in your changes missing coverage. Please review.

Project coverage is 88.68%. Comparing base (6c96c7e) to head (3243cbb). Report is 19 commits behind head on main.

Files with missing lines | Patch % | Lines
hvplot/converter.py | 10.00% | 9 Missing
hvplot/util.py | 60.00% | 2 Missing
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1405      +/-   ##
==========================================
+ Coverage   87.39%   88.68%   +1.28%     
==========================================
  Files          50       51       +1     
  Lines        7490     7509      +19     
==========================================
+ Hits         6546     6659     +113     
+ Misses        944      850      -94     


codecov[bot] avatar Aug 30 '24 00:08 codecov[bot]

FWIW, xvec is contributed by the maintainers of xarray and geopandas: https://github.com/xarray-contrib/xvec/graphs/contributors

It's quite new, so not very established, but because it's by maintainers of xarray/geopandas + part of earthmover (the cofounders started the Pangeo movement), I imagine it'll establish its name over time.

The motivation can be found in the blog post https://earthmover.io/blog/vector-datacube-pt1

Some data is more naturally represented as a multi-dimensional cube. Consider a collection of weather stations that record temperature and windspeed. These measurements are stored in the columns of a geopandas.GeoDataFrame, while the coordinates of each weather station are stored as Shapely Point geometries in a geometry column. We can quickly access a lot of information and ask questions such as “how do temperatures vary across the elevation range covered by the weather stations”, and “where are windspeeds highest?” But, each time the weather station records a measurement, we get a new set of data for each variable. How should that new data be incorporated into the GeoDataFrame? While there are ways of representing such multi-dimensional data in tabular form (see Pebesma, 2022), the column structure is still fundamentally one-dimensional, and these strategies all involve duplicating data along either the row or column dimension.

In the weather station example, the data are fundamentally two-dimensional ([location, time]) and must be flattened to fit into a dataframe. Contrast this to raster data cubes, where data is explicitly represented as multi-dimensional. In this data model, adding new dimensions is easy, and popular tools reflect this fundamental concept. What would it look like, and how would our workflows change, if vector data were also represented as a cube?

Also, unsure whether this should be a part of geoviews first before hvplot

ahuang11 avatar Oct 18 '24 15:10 ahuang11

Thanks for the details. It does indeed look like it's at a very early stage in terms of adoption: (image)

I found this issue (https://github.com/xarray-contrib/xvec/issues/82) on their repo where they're discussing plotting capabilities. I'd encourage you to chime in and see with them if we could easily provide solutions. If a collaboration gets established, there's a better chance we'll be successful, i.e. that the interface we build ends up actually being used by real users.

> Also, unsure whether this should be a part of geoviews first before hvplot

Also unsure. I imagine that in hvPlot it should be integrated as a simple conversion layer, while in GeoViews it'd be more involved.

maximlt avatar Oct 19 '24 08:10 maximlt