
Idea Collection: collaboration between `geopandas` and `PySAL`/`splot`

Open slumnitz opened this issue 7 years ago • 9 comments

Space to collect thoughts and ideas for a collaboration between geopandas and PySAL/splot

The purpose of this issue is to start a discussion between the geopandas and PySAL/splot communities on how best to collaborate in the field of visualising geographic data and spatial analyses.

Open questions:

  • What is the scope and division of both projects? (Especially regarding map visualisation functionality?)
  • With both packages supporting each other, will releases be coordinated?
  • Choropleth visualisations located in geopandas or splot?
  • How sophisticated are geopandas .plot() methods and splot visualisations? Where to draw the line?
  • Which other visualisations are splot and geopandas planning on supporting in future? (points, polygons, lines, interactivity...)

Current connections:

  • Geopandas dataframes as required input parameter for certain visualisations
  • splot using geopandas .plot functionality for maps
  • geopandas choropleth visualisation building on esda.mapclassify
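To make the last connection concrete: a choropleth first sorts each value into a class and then colours each geometry by its class. The sketch below is a simplified, pure-Python stand-in for a quantile classifier; it is not mapclassify's implementation, and the function names `quantile_breaks` and `classify` are made up for illustration.

```python
import bisect


def quantile_breaks(values, k):
    """Return k upper class bounds so that classes hold roughly equal counts."""
    ordered = sorted(values)
    n = len(ordered)
    breaks = []
    for i in range(1, k + 1):
        # Index of the last observation in class i: ceil(i * n / k) - 1
        idx = max(0, (i * n + k - 1) // k - 1)
        breaks.append(ordered[idx])
    return breaks


def classify(values, breaks):
    """Assign each value the index of the first class whose upper bound covers it."""
    return [bisect.bisect_left(breaks, v) for v in values]


values = [1, 2, 3, 4, 5, 6, 7, 8]
breaks = quantile_breaks(values, k=4)
classes = classify(values, breaks)
```

A plotting layer would then map each class index to a colour; in geopandas this step is requested through the `scheme` argument of `.plot()`.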

Ideas for new mapping features:

  • North Arrows
  • Scale bar
  • grid lines
  • extended legend functionality

Answered & Decisions:

Potential upcoming meeting:

  • Sprints or a weekday at SciPy

@jorisvandenbossche

slumnitz avatar Jul 05 '18 18:07 slumnitz

@TaylorOshan

slumnitz avatar Jul 05 '18 18:07 slumnitz

Hi, I know this is rather old, but I don't think it was ever discussed on the geopandas side (definitely not on GitHub). Has there been any progress in the meantime? I think all these questions are still relevant, and it would be nice to have an agreement/cooperation.

What is the scope and division of both projects? (Especially regarding map visualisation functionality?)

From my perspective, GeoPandas should support basic plotting covering standard use-cases. It sort of does that at the moment. Anything special (like vba, cartograms...) should be left to other packages. GeoPandas' main focus is and will be on spatial data handling, not visualisation.

Which other visualisations are splot and geopandas planning on supporting in future?

We have been making some minor enhancements and bugfixes recently, and we are adding support for missing values in the next release (https://github.com/geopandas/geopandas/pull/1156). We should offer a few more legend options (https://github.com/geopandas/geopandas/issues/735), but I would not go much further than that. We do not plan to include any interactivity or similar features, as that is already covered by other packages.

So things like north arrows, scale bars, grid lines etc., I would personally be happy to leave to splot.

Choropleth visualisations located in geopandas or splot?

As it is one of the most basic visualisations, I think it should remain part of geopandas. Not necessarily in advanced versions like vba, but certainly in the same form as it is now.

How sophisticated are geopandas .plot() methods

Not sure how to answer this 😄.

cc @ResidentMario as this discussion should be related to geoplot as well.

On top of that, the GeoPandas docs should definitely link to other related packages and examples far more than they do now.

martinfleis avatar Nov 13 '19 21:11 martinfleis

Hi @martinfleis, this post was part of my GSoC project and was discussed in person at SciPy 2018 with @jorisvandenbossche. All of your thoughts basically reflect our discussions. Ultimately, splot, with its connection to PySAL, is meant more to be a seaborn for geospatial statistical analysis.

From my perspective, GeoPandas should support basic plotting covering standard use-cases.

One thought here, as north arrows and scale bars are actually connected to the underlying geographic information and quite standardised for all sorts of geographic visualisation. Maybe that could be something to include in geopandas in future?

slumnitz avatar Nov 13 '19 21:11 slumnitz

One thought here, as north arrows and scale bars are actually connected to the underlying geographic information and quite standardised for all sorts of geographic visualisation. Maybe that could be something to include in geopandas in future?

It is just a matter of preference, but I would be happy with a solution similar to the one we use for background tiles (contextily).

import contextily as ctx
ax = df.plot(figsize=(10, 10), alpha=0.5, edgecolor='k')
ctx.add_basemap(ax)

We could have something similar for arrows and scale bars (maybe within splot?):

import splot
ax = df.plot(figsize=(10, 10), alpha=0.5, edgecolor='k')
splot.add_arrow(ax)
splot.add_scalebar(ax)

But yeah, this can be part of GeoPandas if we agree that it is a better fit.
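For illustration only, since `add_arrow` and `add_scalebar` above are proposed names rather than existing splot functions: such helpers could be fairly small. The sketch below assumes a north-up map on a matplotlib axes whose coordinates are in metres (a projected CRS); `nice_length`, `add_scalebar` and `add_north_arrow` are hypothetical names, not part of splot or geopandas.

```python
import math


def nice_length(span, fraction=0.25):
    """Largest 1, 2 or 5 x 10^n that is at most `fraction` of the axis span."""
    target = span * fraction
    magnitude = 10 ** math.floor(math.log10(target))
    for step in (5, 2, 1):
        if step * magnitude <= target:
            return step * magnitude
    return magnitude  # fallback, not expected for a positive target


def add_scalebar(ax, fraction=0.25, pad=0.05):
    """Draw a labelled bar in the lower-left corner; assumes axis units are metres."""
    x0, x1 = ax.get_xlim()
    y0, y1 = ax.get_ylim()
    length = nice_length(x1 - x0, fraction)
    x_start = x0 + pad * (x1 - x0)
    y = y0 + pad * (y1 - y0)
    ax.plot([x_start, x_start + length], [y, y], color="k", linewidth=3)
    ax.text(x_start + length / 2, y, f"{length:g} m", ha="center", va="bottom")
    return length


def add_north_arrow(ax, x=0.95, y=0.95, length=0.1):
    """Draw an upward arrow labelled N near the top-right; assumes north is up."""
    ax.annotate(
        "N",
        xy=(x, y),                # arrow head, in axes-fraction coordinates
        xytext=(x, y - length),   # label position at the arrow tail
        xycoords="axes fraction",
        textcoords="axes fraction",
        ha="center",
        va="center",
        arrowprops=dict(arrowstyle="-|>", color="k"),
    )
```

With an axes returned by `df.plot(...)`, the calls would be `add_scalebar(ax)` and `add_north_arrow(ax)`. Data in geographic coordinates (degrees) would need a distance conversion that this sketch deliberately leaves out.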

martinfleis avatar Nov 13 '19 21:11 martinfleis

to be a seaborn for geospatial statistical analysis

I feel that @ResidentMario is trying to do the same with geoplot.

martinfleis avatar Nov 13 '19 21:11 martinfleis

personally, i think of geoplot more along the lines of proplot, in that it provides a nicer-than-default API, sensible aesthetics out of the box, tooling for better layouts, etc.

whereas splot is more like seaborn in that it implements additional types of statistical plots (e.g. seaborn provides clustermaps and violin plots and splot provides maps of weights matrices and the Moran scatterplot, etc).

that's not a perfectly clean distinction, since geoplot also does things like quadtrees and sankey diagrams, but my own concept is that splot and seaborn are oriented more towards stats and analysis

knaaptime avatar Nov 17 '19 00:11 knaaptime

Thanks for the comments @martinfleis and @knaaptime. It is extremely interesting to me to hear your impressions of how geoplot fits within the broader Python geospatial data science stack.

Let me state my own viewpoint.

geoplot as originally envisioned was absolutely meant to be a seaborn equivalent for geospatial plotting; this is even the tagline the library uses, "like seaborn for geospatial". Some of this emphasis was lost in the ensuing development history, as most of the work I have done past the initial conception stage has been in refactoring APIs, improving error messages, writing and rewriting documentation, and in general doing the work necessary to take the library from my lab project to a powerful, punchy toolkit that's as well-organized and approachable as possible.

In other words, most of my effort has gone into the developer experience of the library, more so than its power. As a result, geoplot doesn't actually do that much: you can get away with just using geopandas plotting most of the time.

PySAL is actually a great example of a library with the opposite focus, by the way: many powerful tools, but everything developed piecemeal over time, with patchwork coverage of what's possible and what's not, what's easy and what's hard, what's well documented and (well, most things in PySAL aren't well documented, something I know the team is aware of and working on).

I am now starting to get close to the point where I am happy with the code maturity and ergonomic quality of geoplot (though there's some work outstanding that still needs to be done). The next big steps are going to be investing in more and more powerful plot types. And as the gamut of what geoplot makes possible expands, the incremental usefulness of having this tool in your toolkit will go up greatly, and the library overall will start to seem more useful: a true power-belt, and not just a bunch of auxiliary visual options (as @knaaptime posits). More seaborn-like!

Some of the tools in splot (e.g. Moran plots) feel in-scope to me. But I also suspect that some of the tools in splot will be too specialized for geoplot's scope. I would make the following comparison between the Python geospatial plotting stack and the standard Python data science plotting ecosystem, in order of increasing visual complexity:

  • cartopy is like matplotlib --- low-level plotting tools with full customizability
  • geopandas is like pandas --- your one-stop shop for quick one-off plots, e.g. basic polygonal maps
  • geoplot is like seaborn --- an accessible expanded visual palette, e.g. kernel density maps
  • splot is like yellowbrick --- specialized visual power tools, e.g. local autocorrelation plots

I haven't yet seen evidence of how this stack of tools maps into workflows "in the wild". The best geospatial learning materials I am aware of are the excellent Automating-GIS-process course notes, and these never bother straying above simple geopandas plots. It feels like there's still a lot of work to do here.

I hope that makes sense. Let me know whether you agree or disagree with this assessment.

ResidentMario avatar Nov 17 '19 04:11 ResidentMario

I didn't mean to be reductive about geoplot, so apologies if it came off that way!

To put it a bit differently, the distinction i see is about the substantive focus of the plotting. seaborn provides a wonderful set of aesthetics and plotting utilities, that are useful in a variety of contexts, but it's intended explicitly for statistical data viz. Every one of the plotting examples in its tutorial is designed to visualize variance, or relationships between two variables, etc.

geoplot is similar in that it provides a wonderful set of utilities and nice aesthetics, but unlike seaborn it is not necessarily concerned with stats. geoplot's gallery is full of great examples for plotting single variables, in the specific context of geographic data. In particular, it provides transformations that are especially useful in the geographic case (e.g. cartograms, quadtrees, KDEs).

splot, meanwhile, shares seaborn's statistical emphasis, tailored to the spatial context. Its focus is visualizing statistical relationships in space.

Yellowbrick seems less appropriate since it specializes in visualizing ML models, whereas splot isn't really concerned specifically with modeling or ML. pysal's DNA is really more in inferential stats and econometrics than ML, so the analogy I draw is

seaborn : statsmodels :: splot : pysal

like Waskom says in the seaborn tutorial,

In the spirit of Tukey, the regression plots in seaborn are primarily intended to add a visual guide that helps to emphasize patterns in a dataset during exploratory data analyses. That is to say that seaborn is not itself a package for statistical analysis. To obtain quantitative measures related to the fit of regression models, you should use statsmodels. The goal of seaborn, however, is to make exploring a dataset through visualization quick and easy, as doing so is just as (if not more) important than exploring a dataset through tables of statistics.

things like the Moran scatterplots in splot don't generate a map or geospatial visualization, but they do give you your bearings before diving into a more rigorous (spatial) statistical analysis. That's the kind of workflow we'd use in something like one of the pysal workshops or my colleagues' forthcoming book
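As a rough illustration of what sits behind a Moran scatterplot: each observation is plotted as its deviation from the mean against the average deviation of its neighbours (the spatial lag), and with row-standardised weights the slope of that point cloud is Moran's I. The sketch below is pure Python with a hand-written weights matrix for four locations on a line; the function names are made up, and a real analysis would use the pysal tooling rather than this.

```python
def spatial_lag(weights, values):
    """Weighted average of neighbouring values for each location."""
    return [sum(w * v for w, v in zip(row, values)) for row in weights]


def moran_scatter(values, weights):
    """Return deviations, their spatial lags, and Moran's I.

    Assumes `weights` is row-standardised (each row sums to 1), in which
    case Moran's I reduces to sum(z * lag) / sum(z * z).
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    lag = spatial_lag(weights, z)
    moran_i = sum(a * b for a, b in zip(z, lag)) / sum(a * a for a in z)
    return z, lag, moran_i


# Four locations on a line: 0-1, 1-2 and 2-3 are neighbours.
weights = [
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 1.0, 0.0],
]
z, lag, moran_i = moran_scatter([1.0, 2.0, 3.0, 4.0], weights)
```

The pairs `(z[i], lag[i])` are the points of the scatterplot. For this smoothly increasing series Moran's I comes out at 0.4 (positive autocorrelation), while an alternating series such as 1, -1, 1, -1 gives -1.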

knaaptime avatar Nov 17 '19 21:11 knaaptime

I have only minor quibbles with your assessment, and I like your analogy as well.

Net-net, I'm comfortable with the level of overlap between these three tools (geopandas plotting, geoplot, and splot). I don't think there's much reason for consolidation.

ResidentMario avatar Nov 17 '19 21:11 ResidentMario