feat: Switch geodata providers

Open camdecoster opened this issue 8 months ago • 11 comments

Description

Adds scripts that build topojson from UN-sourced data and updates the build process to run these scripts.

Closes #7334

Changes

  • Adds script for downloading shapefiles, geojson
  • Adds script for converting shapefiles, geojson into topojson
  • Updates topojson path references

Testing

  • Run npm run build_topojson and make sure the script completes successfully
  • Try loading the maps in ./dist/topojson in Mapshaper
  • Open the test dashboard
  • Try loading some of the geo plots and look for errors
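
As a quick complement to loading the maps in Mapshaper, the sketch below checks that each built file at least parses and has the top-level shape of a TopoJSON document (`type: "Topology"`, an `objects` map, an `arcs` array). The helper names `checkTopology` and `checkTopojsonDir` are hypothetical and not part of this PR; the check says nothing about whether the geometry itself is correct.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: returns a list of structural problems found in one
// parsed TopoJSON document (an empty list means the basic shape looks right).
function checkTopology(topology) {
    const problems = [];
    if (!topology || topology.type !== 'Topology') {
        problems.push('type is not "Topology"');
    }
    if (!topology || typeof topology.objects !== 'object' || topology.objects === null) {
        problems.push('missing "objects"');
    } else if (Object.keys(topology.objects).length === 0) {
        problems.push('"objects" is empty');
    }
    if (!topology || !Array.isArray(topology.arcs)) {
        problems.push('missing "arcs" array');
    }
    return problems;
}

// Hypothetical helper: runs checkTopology on every .json file in a directory
// and returns an object mapping file name to its list of problems.
function checkTopojsonDir(dir) {
    const report = {};
    const files = fs.readdirSync(dir).filter((f) => f.endsWith('.json'));
    for (const file of files) {
        try {
            const text = fs.readFileSync(path.join(dir, file), 'utf8');
            report[file] = checkTopology(JSON.parse(text));
        } catch (err) {
            report[file] = ['could not parse: ' + err.message];
        }
    }
    return report;
}

// Example call after running the build (path taken from the steps above):
// console.log(checkTopojsonDir('./dist/topojson'));
```

Any file whose list is non-empty is worth opening by hand before moving on to the dashboard checks.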

Notes

  • The new maps are built from this UN data source for countries, coastlines, and lands layers. The oceans, lakes, rivers, and subunits layers are built from Natural Earth Data.
  • The current maps come from sane-topojson, which is entirely derived from Natural Earth Data
  • I haven't been able to verify the resolution of the data from the UN. There's a reference here to the "UNGeodata 25 million scale", but I haven't been able to confirm what that means. As such, the labels '50m' and '110m' probably aren't accurate. But they need to remain as they are to not break references.
  • Mapshaper can't handle antimeridian cutting, so features that cross the antimeridian can look weird (source)
  • I want to credit sane-topojson for some of the design of these scripts. I'm attempting to create a (mostly) drop in replacement, so it made sense to follow those conventions.

TODO

  • ~~[ ] Clean up shapefiles after converting to topojson~~ Saving everything into the build folder makes this unnecessary
  • [x] Fix Antarctica maps (or just remove them)
  • [x] Commit new maps to ./dist/topojson folder
  • [ ] Fix broken tests (I'm sure there will be some)
  • [x] Remove log statements in process_geodata.mjs (or hide them behind flag)
  • [x] Add the 'usa' map (though it will be pretty bland without states)
  • [x] Remove sane-topojson package
  • [ ] Verify resolutions
  • [x] Only include actual coastlines in coastlines
  • [ ] Make sure 110m maps don't have any unintended artifacts (like South America)
  • [x] Add UN geodata archive to repo and only attempt download if file can't be found
  • [x] Add markdown log file in draftlogs: 7393_feat.md
  • [x] Remove unnecessary properties info from final maps
  • [ ] Fix ocean fill issues that appear in some geo plots (see here for example)
  • [x] Fix Sudan not filling properly in choropleths (see here for example)
  • [ ] Update test baseline images per PR changes

camdecoster avatar Mar 20 '25 03:03 camdecoster

  1. Sync with this branch.
  2. npm i to install/update packages.
  3. npm run build_topojson to build JSON files.
  4. Go to mapshaper.org.
  5. Load europe_50m.json.
  6. Initially looks like the lakes are misplaced (for example, the Caspian Sea seems to overlap the Russian border in a weird way in the screenshot).
  7. On closer inspection, maybe this is valid: I'm unable to find a map showing how national borders cross the Caspian Sea, and Russia might very well have a claim to the northwest section.
  8. The three lakes in the north check out: one is in Sweden, the other two are near St. Petersburg.

     [Screenshot 2025-03-25 at 9 58 13 AM]

gvwilson avatar Mar 25 '25 14:03 gvwilson

Q for @camdecoster: did the old usa map have the states? As you say, it's not much use without them…

gvwilson avatar Mar 25 '25 14:03 gvwilson

Trying to get npm run test-jasmine to work with the new topojson files, it looks like I'm tripping over a configuration issue.

  1. Original error message complained about not being able to find ../../../dist/topojsonworld_110m.json (note the missing '/' between 'topojson' and 'world').
  2. I modified getTopojsonPath in src/lib/topojson_utils.js as follows to insert the '/' if required:
 topojsonUtils.getTopojsonPath = function(topojsonURL, topojsonName) {
-    return topojsonURL + topojsonName + '.json';
+    if (topojsonURL.endsWith('/')) {
+       return topojsonURL + topojsonName + '.json';
+    } else {
+       return topojsonURL + '/' + topojsonName + '.json';
+    }
 };
  3. I also modified test/jasmine/karma.conf.js as follows because we're getting the JSON files from the local dist directory, not the installed sane-topojson package:
-var pathToSaneTopojsonDist = path.join(__dirname, '..', '..', 'node_modules', 'sane-topojson', 'dist');
+var pathToTopojsonDist = path.join(__dirname, '..', '..', 'dist', 'topojson');

// ... and further down, edit to match the new variable name

-        {pattern: pathToSaneTopojsonDist + '/**', included: false, watched: false, served: true}
+        {pattern: pathToTopojsonDist + '/**', included: false, watched: false, served: true}

This still produces error messages like the one shown below saying that the JSON files can't be found:

	Failed: plotly.js could not find topojson file at ../../../dist/topojson/world_110m.json. Make sure the *topojsonURL* plot config option is set properly.
	Error: plotly.js could not find topojson file at ../../../dist/topojson/world_110m.json. Make sure the *topojsonURL* plot config option is set properly.
	    at Object.<anonymous> (/Users/gvwilson/plotly/plotly.js/src/plots/geo/geo.js:137:35 <- /private/var/folders/w2/l51fjbjd25n9zbwkz9fw9jp00000gn/T/0c95a0b81672d3a2e3f59fc0534657ef-bundle.js:147433:31)
	    at Object.event (/Users/gvwilson/plotly/plotly.js/node_modules/@plotly/d3/d3.js:504:42 <- /private/var/folders/w2/l51fjbjd25n9zbwkz9fw9jp00000gn/T/0c95a0b81672d3a2e3f59fc0534657ef-bundle.js:4875:48)
	    at XMLHttpRequest.respond (/Users/gvwilson/plotly/plotly.js/node_modules/@plotly/d3/d3.js:1951:24 <- /private/var/folders/w2/l51fjbjd25n9zbwkz9fw9jp00000gn/T/0c95a0b81672d3a2e3f59fc0534657ef-bundle.js:6326:30)
25 03 2025 10:40:03.170:WARN [web-server]: 404: /dist/topojson/world_110m.json
[note: the message above is then repeated several times]

I've tried variations on the path in test/jasmine/tests/geo_test.js specified by this line:

Plotly.setPlotConfig({ topojsonURL: '../../../dist/topojson' });

Adding an extra .. or removing one of the ones that's there doesn't affect the outcome.
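
For reference, here is a small standalone sketch of the two behaviours described above. `joinTopojsonPath` is a hypothetical stand-in for the patched `getTopojsonPath`, and the page URL is an assumption (a typical local Karma address), not something taken from the test run. The second half uses the standard `URL` class to show how a browser resolves the relative path: leading `..` segments cannot climb above the server root, which would be consistent with extra or missing `..` segments making no difference and with the 404 being logged for `/dist/topojson/world_110m.json`.

```javascript
// Hypothetical stand-in for the patched getTopojsonPath: insert a '/'
// between the base URL and the file name only when one is missing.
function joinTopojsonPath(topojsonURL, topojsonName) {
    const base = topojsonURL.endsWith('/') ? topojsonURL : topojsonURL + '/';
    return base + topojsonName + '.json';
}

const withoutSlash = joinTopojsonPath('../../../dist/topojson', 'world_110m');
const withSlash = joinTopojsonPath('../../../dist/topojson/', 'world_110m');
console.log(withoutSlash); // ../../../dist/topojson/world_110m.json
console.log(withSlash);    // ../../../dist/topojson/world_110m.json

// Assumed page URL for illustration only.
const assumedPage = 'http://localhost:9876/context.html';

// Relative URLs are resolved against the page; '..' stops at the root.
const threeUp = new URL(withoutSlash, assumedPage).pathname;
const fourUp = new URL('../' + withoutSlash, assumedPage).pathname;
console.log(threeUp); // /dist/topojson/world_110m.json
console.log(fourUp);  // /dist/topojson/world_110m.json
```

If the resolved path is the problem, one thing to check is where the test server actually exposes the files: Karma normally serves files matched by its `files` patterns under a `/base/` prefix, so the URL may need that prefix. This is a guess, not a confirmed fix.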

gvwilson avatar Mar 25 '25 14:03 gvwilson

@gvwilson Yes, the old maps had 'subunits' layers for a few regions (USA and Brazil did for sure). The UN maps don't include that information. Thanks for looking into the tests. I'm working on an update to get them passing, but I haven't pushed it yet. I'll try to get that out later today.

camdecoster avatar Mar 25 '25 14:03 camdecoster

@etpinard when you have a chance could you give this PR a look? Would love your eyes. Thank you!

In the short term what was the source for US states/Canadian provinces in the old dataset?

ndrezn avatar Mar 25 '25 15:03 ndrezn

@camdecoster One small suggestion: Either add tasks/topojson/un_geodata_simplified.zip to .gitignore, OR delete the file as part of cleanup from the get_geodata task.

emilykl avatar Apr 22 '25 16:04 emilykl

@emilykl thanks for pointing that out. I actually meant to ask if that should be saved in the repo. Maybe that's not necessary, but if the files change in the future then the build process could break. Granted, we're not saving the Natural Earth shapefiles, so that is another potential point for things to break. Ultimately the question is, should we save the input files in the repo or download them each time when building the library?

camdecoster avatar Apr 22 '25 17:04 camdecoster

@camdecoster It's a good question... curious what @gvwilson @ndrezn think.

Presumably both the UN data and the Natural Earth data will be updated periodically as the world changes (is my assumption correct? or are we referencing a static URL that will never change?) and we will want to pull in the updated data.

On the other hand, that's not a step that we want or need to be doing on EVERY build.

Are there any steps in the build process that you think are particularly sensitive to changes in the data? How much work would it be to fix the build process if/when the data changes?

emilykl avatar Apr 22 '25 17:04 emilykl

Decision for now is to download the data afresh each time the build runs; we'll come back in a week or two and modify the build to cache the data in the repo. cc @emilykl @camdecoster

gvwilson avatar Apr 22 '25 18:04 gvwilson

> Are there any steps in the build process that you think are particularly sensitive to changes in the data? How much work would it be to fix the build process if/when the data changes?

@emilykl I'm not worried about the Natural Earth data. sane-topojson is years old and it still parses NE data correctly. The UN data might change, but I tried to reference attributes that would be set by standards (country codes, etc.). Regardless, I don't think it would be very difficult to update the scripts to filter the data correctly.

@gvwilson I updated the download script to save into the build folder, so that will get ignored by git. If you're running the build multiple times locally, it will reuse the downloaded files.
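
The reuse behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code in the download script: `ensureCached` is a hypothetical name, and `download` stands for any async function that returns the file contents (the real script's fetching logic is not shown here).

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: write the result of `download()` to `filePath` only if
// the file is not already there, so repeated local builds skip the download.
async function ensureCached(filePath, download) {
    if (fs.existsSync(filePath)) {
        return { path: filePath, downloaded: false };
    }
    // Create the target folder (for example a git-ignored build folder) if needed.
    fs.mkdirSync(path.dirname(filePath), { recursive: true });
    const data = await download();
    fs.writeFileSync(filePath, data);
    return { path: filePath, downloaded: true };
}
```

Because the check is only "does the file exist", picking up refreshed upstream data would mean deleting the cached file first; whether that trade-off is acceptable is the question raised earlier in this thread.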

camdecoster avatar Apr 22 '25 23:04 camdecoster

👍 from me once the image diffs we discussed are addressed -- looks great.

emilykl avatar Apr 23 '25 17:04 emilykl

@camdecoster Great work :muscle: :1st_place_medal: There is a noticeable jump along the long horizontal border between Canada and the US when comparing the diff of test/image/baselines/geo_text_chart_arrays.png. Could adding more vertices on such segments (as part of the post-process script) help improve the results?

archmoj avatar Jul 07 '25 21:07 archmoj

@archmoj I've implemented the changes that you suggested. Could you take another look? Regarding the US/Canada border, that's just how the geodata looks coming from the UN. Manipulating that should be possible, but I'd rather wait until after these maps land to look at adding that.

camdecoster avatar Jul 10 '25 16:07 camdecoster

Files and size changes: Should the new files have a UN prefix to avoid overwriting on the CDN?

old dist/topojson/

   37K | africa_110m.json
  144K | africa_50m.json
   55K | asia_110m.json
  353K | asia_50m.json
   33K | europe_110m.json
  194K | europe_50m.json
   66K | north-america_110m.json
  980K | north-america_50m.json
   22K | south-america_110m.json
  165K | south-america_50m.json
   48K | usa_110m.json
  460K | usa_50m.json
  134K | world_110m.json
 1075K | world_50m.json

new dist/topojson/

   37K | africa_110m.json
  120K | africa_50m.json
   15K | antarctica_110m.json
   41K | antarctica_50m.json
   82K | asia_110m.json
  382K | asia_50m.json
   39K | europe_110m.json
  198K | europe_50m.json
   75K | north-america_110m.json
  588K | north-america_50m.json
   46K | oceania_110m.json
  328K | oceania_50m.json
   23K | south-america_110m.json
  168K | south-america_50m.json
   70K | usa_110m.json
  322K | usa_50m.json
  288K | world_110m.json
 1655K | world_50m.json

archmoj avatar Jul 10 '25 17:07 archmoj

> Files and size changes: Should the new files have a UN prefix to avoid overwriting on the CDN?

The file names need to be the same for the lookup to work in the library. We could update the lookup function to add a suffix, but that seems like the wrong direction to go. Eventually, the new maps will be standard. For now, we could manually update the CDN to include the new maps with a suffix. We could announce the change and then eventually update the CDN to only include the new maps. Alternatively, we could include the old maps in a legacy folder.

camdecoster avatar Jul 10 '25 21:07 camdecoster

Regarding the new files:

antarctica_110m.json
antarctica_50m.json
oceania_110m.json
oceania_50m.json

Would you consider adding new scopes for antarctica and oceania here? https://github.com/plotly/plotly.js/blob/5e2163b2f3377187152bdfdffe1a9e64998ce5aa/src/plots/geo/constants.js#L145-L189

archmoj avatar Jul 11 '25 14:07 archmoj

@archmoj I created #7467 to look at the US/Canada border in the future.

camdecoster avatar Jul 14 '25 18:07 camdecoster

Well done. Huge effort, excited to have this land in the product!

ndrezn avatar Jul 15 '25 13:07 ndrezn

congratulations all around - thank you

gvwilson avatar Jul 15 '25 13:07 gvwilson

:clap:

etpinard avatar Nov 07 '25 20:11 etpinard