chore: clean up wof dictionary generation code
I attempted to update the WOF resources today but lost enthusiasm halfway through because a bunch of unrelated changes kept popping up in the diff.
This PR makes subsequent attempts at updating WOF considerably easier:
- update resources/whosonfirst/generate.js to sort dictionaries before writing them to disk (which makes diffs much easier to read); also add some comments about potential gotchas
- sort existing dictionaries using the same sorting method (using the monstrous command below)
find resources/whosonfirst/dictionaries -type f -name '*.txt' \
| node -e 'const fs=require(`fs`); fs.readFileSync(0, `utf-8`).trim().split(`\n`).forEach(file => fs.writeFileSync(file, fs.readFileSync(file, `utf-8`).trim().split(`\n`).sort().join(`\n`)))'
This should be a no-op: it only sorts the existing dictionaries and does not add or remove any entries.
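For readability, the one-liner above can be unpacked into a small Node script. This is only a sketch of the same idea, not the code in generate.js; the helper names `findTxtFiles` and `sortDictionaryFile` are made up for illustration. It keeps the default `Array.prototype.sort()` ordering (by UTF-16 code unit) and, like the one-liner, writes the file back without a trailing newline.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: recursively collect every *.txt file under `dir`
// (the role played by `find ... -type f -name '*.txt'` in the one-liner).
function findTxtFiles(dir) {
  return fs.readdirSync(dir, { withFileTypes: true }).flatMap(entry => {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) return findTxtFiles(full);
    return entry.isFile() && entry.name.endsWith('.txt') ? [full] : [];
  });
}

// Hypothetical helper: sort the lines of one dictionary file in place,
// using the default sort so the result matches the one-liner.
function sortDictionaryFile(file) {
  const lines = fs.readFileSync(file, 'utf-8').trim().split('\n');
  fs.writeFileSync(file, lines.sort().join('\n'));
}

// Example invocation (left commented out so that loading this file has no side effects):
// findTxtFiles('resources/whosonfirst/dictionaries').forEach(sortDictionaryFile);
```

Because only the order of lines changes, running it a second time should leave every file untouched.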
[Image: Screenshot 2021-02-23 at 20 48 25]
😆
Should we also normalize names here? I found some names with uppercase letters and accents, like Épinay, even though LOWER is used in the SQLite statements (thank you French localities :sweat_smile:).
grep 'Épinay' resources/whosonfirst/dictionaries/locality/name\:fra_x_preferred.txt
On second thought: adding normalization will not make the diffs any easier to read, so I will merge as-is.