
chore: clean up wof dictionary generation code

missinglink opened this issue • 2 comments

I attempted to update the WOF resources today but lost enthusiasm halfway through because a bunch of different changes kept popping up.

This PR makes subsequent attempts at updating WOF considerably easier:

  • update resources/whosonfirst/generate.js to sort dictionaries before writing them to disk (which makes diffs muuuch easier to read); also added some comments about potential gotchas
  • sort existing dictionaries with the same sorting method (via the monstrous command below):
```sh
find resources/whosonfirst/dictionaries -type f -name '*.txt' \
  | node -e 'const fs=require(`fs`); fs.readFileSync(0, `utf-8`).trim().split(`\n`).forEach(file => fs.writeFileSync(file, fs.readFileSync(file, `utf-8`).trim().split(`\n`).sort().join(`\n`)))'
```

This should be a no-op: it only sorts the existing dictionaries, without adding or removing any entries.
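One way to double-check the no-op claim is to confirm that each rewritten file contains exactly the same lines as before, just in a different order. A minimal sketch, assuming Node and a git checkout; `sameEntries` and `unchangedSince` are hypothetical helpers, not part of the repository:

```javascript
const { execFileSync } = require('child_process');
const fs = require('fs');

// True when `after` holds exactly the same lines as `before` (same entries,
// same duplicate counts), ignoring order and the surrounding whitespace
// that the sort command trims.
function sameEntries(before, after) {
  const toLines = (text) => text.trim().split('\n');
  const a = toLines(before).sort();
  const b = toLines(after).sort();
  return a.length === b.length && a.every((line, i) => line === b[i]);
}

// Compare a dictionary in the working tree with the version committed at
// `rev`, read via `git show <rev>:<path>`. `file` must be a path relative
// to the repository root.
function unchangedSince(file, rev = 'HEAD') {
  const committed = execFileSync('git', ['show', `${rev}:${file}`], { encoding: 'utf-8' });
  return sameEntries(committed, fs.readFileSync(file, 'utf-8'));
}
```

For example, after running the sort command (and before committing), calling `unchangedSince(file)` from the repository root for every dictionary should return `true` for all of them.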

missinglink, Feb 23 '21 07:02

[Image: Screenshot 2021-02-23 at 20 48 25]

😆

missinglink, Feb 23 '21 07:02

Should we also normalize names here? I found some names with uppercase letters and accents, like Épinay, even though LOWER is used in the SQLite statement (thank you, French localities 😅).

```sh
grep 'Épinay' resources/whosonfirst/dictionaries/locality/name\:fra_x_preferred.txt
```
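Part of the explanation for entries like Épinay: SQLite's built-in `lower()` only converts ASCII characters, and non-ASCII case mapping requires the ICU extension, so `É` passes through `LOWER` unchanged. If normalization were added later, a minimal JavaScript sketch might look like the following. `normalizeName`, `normalizeDictionary` and the `stripDiacritics` option are hypothetical, and whether accents should be stripped at all is left open in this thread.

```javascript
// Hypothetical post-processing step for dictionary entries. JavaScript's
// toLowerCase() applies Unicode case mapping, so "É" becomes "é".
function normalizeName(name, { stripDiacritics = false } = {}) {
  // NFC first so precomposed and decomposed forms of "É" compare equal.
  let out = name.normalize('NFC').toLowerCase();
  if (stripDiacritics) {
    // Optional: decompose, then drop combining marks. Only the basic
    // Combining Diacritical Marks block (U+0300 to U+036F) is handled,
    // which is a simplification. Example: "é" -> "e".
    out = out.normalize('NFD').replace(/[\u0300-\u036f]/g, '').normalize('NFC');
  }
  return out;
}

// Normalize, de-duplicate (via Set) and sort a list of names.
function normalizeDictionary(names, options) {
  return [...new Set(names.map((n) => normalizeName(n, options)))].sort();
}
```

Unlike the sorting change, this would alter dictionary contents (and can merge entries that differ only by case or accents), so it would not be a no-op.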

On second thought, adding normalization won't make the diffs any easier to read, so I will merge as-is.

Joxit, Feb 23 '21 15:02