Choropleth maps with categorical legends are larger than those with continuous legends due to duplicate copies of shape outlines within HTML files
I am finding that choropleth maps that use a categorical value for the color parameter are significantly larger than those that use a continuous value. This appears to be due to duplicate copies of boundary data within the categorical maps' HTML files.
Here's a simple script that reproduces this problem (I ran it using the latest versions of both Plotly and GeoPandas):
# Categorical map tests
import plotly.express as px
import geopandas as gpd
gdf_states = gpd.read_file(
    'https://raw.githubusercontent.com/ifstudies/simplified_shapefiles/'
    'refs/heads/main/state_shapefiles_simplified.json')
# Creating meaningless category columns for mapping purposes:
gdf_states['Continuous_Vals'] = gdf_states.index % 4 + 1
gdf_states['Categorical_Vals'] = gdf_states[
'Continuous_Vals'].astype('str')
gdf_states.set_index('NAME', inplace = True)
# Creating a map with a continuous color scale:
# The following code was based on
# https://plotly.com/python/choropleth-maps/
# and https://plotly.com/python/tile-county-choropleth/ .
fig_continuous_scale_map = px.choropleth_map(
gdf_states, geojson = gdf_states.geometry,
locations = gdf_states.index,
zoom = 3, center = {'lat':37, 'lon': -96},
color = 'Continuous_Vals', map_style = 'white-bg')
fig_continuous_scale_map.write_html('fig_continuous_scale_map.html',
include_plotlyjs='cdn')
# Creating a map with a categorical legend:
# The following code was based on
# https://plotly.com/python/choropleth-maps/
# and https://plotly.com/python/tile-county-choropleth/ .
fig_categorical_map = px.choropleth_map(
gdf_states, geojson = gdf_states.geometry,
locations = gdf_states.index,
zoom = 3, center = {'lat':37, 'lon': -96},
color = 'Categorical_Vals', map_style = 'white-bg')
fig_categorical_map.write_html('fig_categorical_map.html',
include_plotlyjs='cdn')
And here are screenshots of the basic maps created by this script:
[Screenshot: continuous map]
[Screenshot: categorical map]
The two choropleth maps created by this code are almost identical, except that the first uses a continuous scale for its color parameter and the second uses a categorical scale. However, while the continuous-scale map is around 290 KB in size, the categorical one is 1.1 MB in size.
A review of the HTML files explains why this is the case: state boundaries are defined just once within the continuous-scale map, but four times within the categorical map (once for each category, I assume). This results in a much larger file.
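A rough way to see why file size grows with the number of categories: with a categorical `color` column, Plotly Express builds one trace per category value, and each trace appears to carry its own copy of the boundary GeoJSON. The sketch below uses only the standard library and a synthetic set of square polygons (a hypothetical stand-in for the state boundaries, not the real shapefile). It compares the serialized size of one trace holding the full GeoJSON with four per-category traces that each repeat it. Plotly's actual HTML serialization differs in detail, so treat the ratio as an approximation.

```python
import json


def make_square_collection(n):
    """Build a synthetic GeoJSON FeatureCollection of n unit squares (hypothetical stand-in data)."""
    features = []
    for i in range(n):
        x = float(i)
        ring = [[x, 0.0], [x + 1, 0.0], [x + 1, 1.0], [x, 1.0], [x, 0.0]]
        features.append({
            "type": "Feature",
            "id": str(i),
            "properties": {},
            "geometry": {"type": "Polygon", "coordinates": [ring]},
        })
    return {"type": "FeatureCollection", "features": features}


def serialized_size(traces):
    """Size in bytes of the JSON text for a list of trace dicts."""
    return len(json.dumps(traces).encode("utf-8"))


geojson = make_square_collection(50)
locations = [str(i) for i in range(50)]
categories = [str(i % 4 + 1) for i in range(50)]  # mirrors index % 4 + 1 in the script

# Continuous color: a single trace with every location and one copy of the GeoJSON.
continuous = [{"type": "choroplethmap", "geojson": geojson, "locations": locations}]

# Categorical color: one trace per category, each repeating the full GeoJSON.
categorical = []
for cat in sorted(set(categories)):
    locs = [loc for loc, c in zip(locations, categories) if c == cat]
    categorical.append(
        {"type": "choroplethmap", "geojson": geojson, "locations": locs, "name": cat}
    )

ratio = serialized_size(categorical) / serialized_size(continuous)
print(f"categorical / continuous size ratio: {ratio:.2f}")
```

Because the geometry dominates the payload, the ratio lands just under the number of categories, which is consistent with the roughly 290 KB vs. 1.1 MB difference reported above.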
Obviously, both of these maps are still pretty small, but I'm finding this inefficiency to be an issue for maps with a very large number of shapes (e.g., Census tracts). One particular map of Census tracts was around 45 MB in size with a continuous scale, but 160 MB with a categorical one.
If there's any way to get categorical maps to store only one set of region boundaries within their HTML files, that would be a huge help!
@camdecoster please see if this is reproducible at the JS level.
This seems to be a consequence of the way that Plotly Express works and won't show up in plotly.js. I'm still digging into how to work around the issue, but we should be able to filter the geojson to only include the relevant polygons for each trace. I wouldn't call it a bug, but it's an opportunity to improve efficiency. Thanks for letting us know.
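The filtering idea above could look roughly like the following sketch. `filter_geojson` is a hypothetical helper, not part of Plotly's API. It assumes each trace's `geojson` is a plain FeatureCollection dict and that features are matched to `locations` through `featureidkey` (Plotly's default is `"id"`; a dotted path such as `"properties.NAME"` is also handled here).

```python
def _get_feature_id(feature, featureidkey):
    """Follow a dotted key path such as 'id' or 'properties.NAME' into a feature dict."""
    value = feature
    for part in featureidkey.split("."):
        if not isinstance(value, dict):
            return None
        value = value.get(part)
    return value


def filter_geojson(geojson, locations, featureidkey="id"):
    """Hypothetical helper: keep only the features whose id appears in locations."""
    wanted = {str(loc) for loc in locations}
    kept = []
    for feature in geojson["features"]:
        fid = _get_feature_id(feature, featureidkey)
        if fid is not None and str(fid) in wanted:
            kept.append(feature)
    return {"type": "FeatureCollection", "features": kept}


# Tiny demo collection (geometry omitted; null geometry is valid GeoJSON).
collection = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "id": "Ohio", "properties": {"NAME": "Ohio"}, "geometry": None},
        {"type": "Feature", "id": "Iowa", "properties": {"NAME": "Iowa"}, "geometry": None},
        {"type": "Feature", "id": "Utah", "properties": {"NAME": "Utah"}, "geometry": None},
    ],
}
subset = filter_geojson(collection, ["Ohio", "Utah"])
print([f["id"] for f in subset["features"]])
```

Applied to the reproduction script, one would loop over the figure's traces before saving, e.g. `for trace in fig_categorical_map.data: trace.geojson = filter_geojson(trace.geojson, trace.locations)`, and then call `write_html`. This assumes the GeoDataFrame index ends up as each feature's `id`, which is what `locations = gdf_states.index` relies on in the original code. Each category trace would then embed only its own polygons, so the boundaries are written roughly once in total.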
Thank you, I really appreciate your help with this! As a workaround, I adjusted my color_continuous_scale values (using this guide as a reference) so that a continuous scale displays a discrete set of colors. However, it would be fantastic to use a standard categorical approach instead, since users could then toggle individual categories on and off through the legend.
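One common way to make a continuous scale show flat color bands is to repeat each boundary position in the colorscale, so the color jumps instead of blending. The sketch below is an assumption about how such a workaround could be built, not necessarily the approach from the guide mentioned above; `stepped_colorscale` is a hypothetical helper name.

```python
def stepped_colorscale(colors):
    """Hypothetical helper: build a Plotly colorscale of equal-width, hard-edged color bands.

    Each color covers an equal share of [0, 1]; repeating the boundary positions
    makes the scale jump between colors instead of interpolating.
    """
    n = len(colors)
    if n == 0:
        raise ValueError("need at least one color")
    scale = []
    for i, color in enumerate(colors):
        scale.append([i / n, color])        # band start
        scale.append([(i + 1) / n, color])  # band end (shared with the next band's start)
    return scale


print(stepped_colorscale(["red", "green", "blue", "orange"]))
```

With the reproduction script's values of 1 through 4, passing `color_continuous_scale=stepped_colorscale(px.colors.qualitative.Plotly[:4])` together with `range_color=(0.5, 4.5)` to `px.choropleth_map` would center each integer value in its own band. Because this stays a single trace, the boundaries are stored once, but it gives up the clickable per-category legend.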