plotly.js
feat: Switch geodata providers
Description
Adds scripts to build topojson from UN-sourced data and updates the build process to run these scripts.
Closes #7334
Changes
- Adds script for downloading shapefiles, geojson
- Adds script for converting shapefiles, geojson into topojson
- Updates topojson path references
Testing
- Run `npm run build_topojson` and make sure the script completes successfully
- Try loading the maps in ./dist/topojson in Mapshaper
- Open the test dashboard
- Try loading some of the geo plots and look for errors
Notes
- The new maps are built from this UN data source for countries, coastlines, and lands layers. The oceans, lakes, rivers, and subunits layers are built from Natural Earth Data.
- The current maps come from sane-topojson, which is entirely derived from Natural Earth Data
- I haven't been able to verify the resolution of the data from the UN. There's a reference here to the "UNGeodata 25 million scale", but I haven't been able to confirm what that means. As such, the labels '50m' and '110m' probably aren't accurate. But they need to remain as they are to not break references.
- Mapshaper can't handle antimeridian cutting, so features that cross the antimeridian can look weird (source)
- I want to credit sane-topojson for some of the design of these scripts. I'm attempting to create a (mostly) drop in replacement, so it made sense to follow those conventions.
TODO
- ~~[ ] Clean up shapefiles after converting to topojson~~ Saving everything into the build folder makes this unnecessary
- [x] Fix Antarctica maps (or just remove them)
- [x] Commit new maps to ./dist/topojson folder
- [ ] Fix broken tests (I'm sure there will be some)
- [x] Remove log statements in process_geodata.mjs (or hide them behind flag)
- [x] Add the 'usa' map (though it will be pretty bland without states)
- [x] Remove sane-topojson package
- [ ] Verify resolutions
- [x] Only include actual coastlines in coastlines
- [ ] Make sure 110m maps don't have any unintended artifacts (like South America)
- [x] Add UN geodata archive to repo and only attempt download if file can't be found
- [x] Add markdown log file in `draftlogs`: 7393_feat.md
- [x] Remove unnecessary `properties` info from final maps
- [ ] Fix ocean fill issues that appear in some geo plots (see here for example)
- [x] Fix Sudan not filling properly in choropleths (see here for example)
- [ ] Update test baseline images per PR changes
- Sync with this branch.
- `npm i` to install/update packages.
- `npm run build_topojson` to build JSON files.
- Go to mapshaper.org.
- Load `europe_50m.json`.
- Initially looks like the lakes are misplaced (for example, the Caspian Sea seems to overlap the Russian border in a weird way in the screenshot).
- On closer inspection, maybe this is valid: I'm unable to find a map showing how national borders cross the Caspian Sea, and Russia might very well have a claim to the northwest section.
- The three lakes in the north check out: one is in Sweden, the other two are near St. Petersburg.
Q for @camdecoster: did the old usa map have the states? As you say, it's not much use without that…
Trying to get npm run test-jasmine to work with the new topojson files, it looks like I'm tripping over a configuration issue.
- Original error message complained about not being able to find `../../../dist/topojsonworld_110m.json` (note the missing '/' between 'topojson' and 'world').
- I modified `getTopojsonPath` in `src/lib/topojson_utils.js` as follows to insert the '/' if required:
topojsonUtils.getTopojsonPath = function(topojsonURL, topojsonName) {
- return topojsonURL + topojsonName + '.json';
+ if (topojsonURL.endsWith('/')) {
+ return topojsonURL + topojsonName + '.json';
+ } else {
+ return topojsonURL + '/' + topojsonName + '.json';
+ }
};
- I also modified `test/jasmine/karma.conf.js` as follows because we're getting the JSON files from the local `dist` directory, not the installed `sane-topojson` package:
-var pathToSaneTopojsonDist = path.join(__dirname, '..', '..', 'node_modules', 'sane-topojson', 'dist');
+var pathToTopojsonDist = path.join(__dirname, '..', '..', 'dist', 'topojson');
// ... and further down, edit to match the new variable name
- {pattern: pathToSaneTopojsonDist + '/**', included: false, watched: false, served: true}
+ {pattern: pathToTopojsonDist + '/**', included: false, watched: false, served: true}
This still produces error messages like the one shown below saying that the JSON files can't be found:
Failed: plotly.js could not find topojson file at ../../../dist/topojson/world_110m.json. Make sure the *topojsonURL* plot config option is set properly.
Error: plotly.js could not find topojson file at ../../../dist/topojson/world_110m.json. Make sure the *topojsonURL* plot config option is set properly.
at Object.<anonymous> (/Users/gvwilson/plotly/plotly.js/src/plots/geo/geo.js:137:35 <- /private/var/folders/w2/l51fjbjd25n9zbwkz9fw9jp00000gn/T/0c95a0b81672d3a2e3f59fc0534657ef-bundle.js:147433:31)
at Object.event (/Users/gvwilson/plotly/plotly.js/node_modules/@plotly/d3/d3.js:504:42 <- /private/var/folders/w2/l51fjbjd25n9zbwkz9fw9jp00000gn/T/0c95a0b81672d3a2e3f59fc0534657ef-bundle.js:4875:48)
at XMLHttpRequest.respond (/Users/gvwilson/plotly/plotly.js/node_modules/@plotly/d3/d3.js:1951:24 <- /private/var/folders/w2/l51fjbjd25n9zbwkz9fw9jp00000gn/T/0c95a0b81672d3a2e3f59fc0534657ef-bundle.js:6326:30)
25 03 2025 10:40:03.170:WARN [web-server]: 404: /dist/topojson/world_110m.json
[note: the message above is then repeated several times]
I've tried variations on the path in test/jasmine/tests/geo_test.js specified by this line:
Plotly.setPlotConfig({ topojsonURL: '../../../dist/topojson' });
Adding an extra .. or removing one of the ones that's there doesn't affect the outcome.
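The trailing-slash handling from the getTopojsonPath change above can be checked in isolation. This is a minimal sketch, not the code in src/lib/topojson_utils.js: `joinTopojsonPath` is a hypothetical standalone name, and the paths are only the ones quoted in this comment.

```javascript
'use strict';

// Hypothetical standalone helper mirroring the patched getTopojsonPath logic:
// insert a '/' between the base URL and the file name only when it is missing.
function joinTopojsonPath(topojsonURL, topojsonName) {
    var separator = topojsonURL.endsWith('/') ? '' : '/';
    return topojsonURL + separator + topojsonName + '.json';
}

// Both forms of the base URL resolve to the same file path.
console.log(joinTopojsonPath('../../../dist/topojson', 'world_110m'));
console.log(joinTopojsonPath('../../../dist/topojson/', 'world_110m'));
```

This only covers the missing separator. It does not explain the 404 from the Karma web server described above.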
@gvwilson yes the old maps had 'subunits' layers for a few regions (USA and Brazil did for sure). The UN maps don't include that information. Thanks for looking into the tests. I'm working on an update to get the tests working but I haven't pushed that yet. I'll try to get that out later today.
@etpinard when you have a chance could you give this PR a look? Would love your eyes. Thank you!
In the short term what was the source for US states/Canadian provinces in the old dataset?
@camdecoster One small suggestion: Either add tasks/topojson/un_geodata_simplified.zip to .gitignore, OR delete the file as part of cleanup from the get_geodata task.
@emilykl thanks for pointing that out. I actually meant to ask if that should be saved in the repo. Maybe that's not necessary, but if the files change in the future then the build process could break. Granted, we're not saving the Natural Earth shapefiles, so that is another potential point for things to break. Ultimately the question is, should we save the input files in the repo or download them each time when building the library?
@camdecoster It's a good question... curious what @gvwilson @ndrezn think.
Presumably both the UN data and the Natural Earth data will be updated periodically as the world changes (is my assumption correct? or are we referencing a static URL that will never change?) and we will want to pull in the updated data.
On the other hand, that's not necessarily a step that we want or need to be doing on EVERY build.
Are there any steps in the build process that you think are particularly sensitive to changes in the data? How much work would it be to fix the build process if/when the data changes?
Decision for now is to download the data afresh each time the build runs; we'll come back in a week or two and modify the build to cache the data in the repo. cc @emilykl @camdecoster
> Are there any steps in the build process that you think are particularly sensitive to changes in the data? How much work would it be to fix the build process if/when the data changes?
@emilykl I'm not worried about the Natural Earth data. sane-topojson is years old and it still parses NE data correctly. The UN data might change, but I tried to reference attributes that would be set by standards (country codes, etc.). Regardless, I don't think it would be very difficult to update the scripts to filter the data correctly.
@gvwilson I updated the download script to save into the build folder, so that will get ignored by git. If you're running the build multiple times locally, it will reuse the downloaded files.
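The "reuse the downloaded files" behavior described above amounts to a download-if-missing check. Here is a rough sketch of that pattern, assuming a synchronous stand-in for the real download: `ensureCached`, `fakeDownload`, and the file names are hypothetical and are not taken from the scripts in this PR.

```javascript
'use strict';

var fs = require('fs');
var os = require('os');
var path = require('path');

// Hypothetical helper: call `download` only when `filePath` does not exist yet.
// `download` is any function returning the file contents as a string or Buffer.
function ensureCached(filePath, download) {
    if (fs.existsSync(filePath)) {
        return false; // already cached, nothing downloaded
    }
    fs.mkdirSync(path.dirname(filePath), { recursive: true });
    fs.writeFileSync(filePath, download());
    return true; // file was fetched and written
}

// Demo: a temporary directory stands in for the git-ignored build folder.
var buildDir = fs.mkdtempSync(path.join(os.tmpdir(), 'geodata-'));
var archive = path.join(buildDir, 'geodata', 'example.zip');
var calls = 0;
function fakeDownload() {
    calls += 1;
    return 'placeholder bytes';
}

var first = ensureCached(archive, fakeDownload);
var second = ensureCached(archive, fakeDownload);
console.log(first, second, calls);
```

A real download would be asynchronous. The synchronous version is used here only to keep the example short.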
👍 from me once the image diffs we discussed are addressed -- looks great.
@camdecoster Great work :muscle: :1st_place_medal:
There is a notable jump along the long horizontal border between Canada and the US when comparing the diff of test/image/baselines/geo_text_chart_arrays.png.
Wondering if adding more vertices on such segments (as part of the post-process script) could help improve the results?
@archmoj I've implemented the changes that you suggested. Could you take another look? Regarding the US/Canada border, that's just how the geodata looks coming from the UN. Manipulating that should be possible, but I'd rather wait until after these maps land to look at adding that.
Files and size changes: Should the new files have UN prefix to avoid overwriting on the CDN?
old dist/topojson/
37K | africa_110m.json
144K | africa_50m.json
55K | asia_110m.json
353K | asia_50m.json
33K | europe_110m.json
194K | europe_50m.json
66K | north-america_110m.json
980K | north-america_50m.json
22K | south-america_110m.json
165K | south-america_50m.json
48K | usa_110m.json
460K | usa_50m.json
134K | world_110m.json
1075K | world_50m.json
new dist/topojson/
37K | africa_110m.json
120K | africa_50m.json
15K | antarctica_110m.json
41K | antarctica_50m.json
82K | asia_110m.json
382K | asia_50m.json
39K | europe_110m.json
198K | europe_50m.json
75K | north-america_110m.json
588K | north-america_50m.json
46K | oceania_110m.json
328K | oceania_50m.json
23K | south-america_110m.json
168K | south-america_50m.json
70K | usa_110m.json
322K | usa_50m.json
288K | world_110m.json
1655K | world_50m.json
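To compare the two listings above, the sizes can be totaled and the added files picked out. This is a small sketch using only the numbers in this comment. The variable and function names are illustrative.

```javascript
'use strict';

// Sizes in KB, copied from the old and new listings in this comment
var oldSizes = {
    'africa_110m': 37, 'africa_50m': 144, 'asia_110m': 55, 'asia_50m': 353,
    'europe_110m': 33, 'europe_50m': 194, 'north-america_110m': 66,
    'north-america_50m': 980, 'south-america_110m': 22, 'south-america_50m': 165,
    'usa_110m': 48, 'usa_50m': 460, 'world_110m': 134, 'world_50m': 1075
};
var newSizes = {
    'africa_110m': 37, 'africa_50m': 120, 'antarctica_110m': 15, 'antarctica_50m': 41,
    'asia_110m': 82, 'asia_50m': 382, 'europe_110m': 39, 'europe_50m': 198,
    'north-america_110m': 75, 'north-america_50m': 588, 'oceania_110m': 46,
    'oceania_50m': 328, 'south-america_110m': 23, 'south-america_50m': 168,
    'usa_110m': 70, 'usa_50m': 322, 'world_110m': 288, 'world_50m': 1655
};

// Sum all sizes in a listing
function total(sizes) {
    return Object.keys(sizes).reduce(function(sum, name) {
        return sum + sizes[name];
    }, 0);
}

// Files present in the new build but not in the old one
var added = Object.keys(newSizes).filter(function(name) {
    return !(name in oldSizes);
});

console.log('old total (KB):', total(oldSizes));
console.log('new total (KB):', total(newSizes));
console.log('only in new build:', added.join(', '));
```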
> Files and size changes: Should the new files have UN prefix to avoid overwriting on the CDN?
The file names need to be the same for the lookup to work in the library. We could update the lookup function to add a suffix, but that seems like the wrong direction to go. Eventually, the new maps will be standard. For now, we could manually update the CDN to include the new maps with a suffix. We could announce the change and then eventually update the CDN to only include the new maps. Alternatively, we could include the old maps in a legacy folder.
In regards to new files:
antarctica_110m.json
antarctica_50m.json
oceania_110m.json
oceania_50m.json
would you consider adding new scopes for antarctica and oceania here?
https://github.com/plotly/plotly.js/blob/5e2163b2f3377187152bdfdffe1a9e64998ce5aa/src/plots/geo/constants.js#L145-L189
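For discussion, new scope entries might look roughly like the sketch below. It is illustrative only: the field names are assumed to follow the existing entries at the linked lines, and every range and projection value is a placeholder that would need to be chosen and reviewed. `proposedScopeDefaults` and `hasValidRanges` are hypothetical names.

```javascript
'use strict';

// Illustrative only: these entries are not taken from plotly.js.
// Field names are assumed to match the existing scope entries; all ranges
// are rough placeholders chosen for the example.
var proposedScopeDefaults = {
    antarctica: {
        lonaxisRange: [-180, 180],
        lataxisRange: [-90, -60],
        projType: 'equirectangular',
        projRotate: [0, 0, 0]
    },
    oceania: {
        lonaxisRange: [100, 180],
        lataxisRange: [-50, 10],
        projType: 'equirectangular',
        projRotate: [0, 0, 0]
    }
};

// Basic sanity check: each range must be [min, max] with min < max
function hasValidRanges(scope) {
    return scope.lonaxisRange[0] < scope.lonaxisRange[1] &&
        scope.lataxisRange[0] < scope.lataxisRange[1];
}

Object.keys(proposedScopeDefaults).forEach(function(name) {
    console.log(name, hasValidRanges(proposedScopeDefaults[name]));
});
```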
@archmoj I created #7467 to look at the US/Canada border in the future.
Well done. Huge effort, excited to have this land in the product!
congratulations all around - thank you
:clap: