parser chore: clean up wof dictionary generation code

Open missinglink opened this issue 4 years ago • 2 comments

I attempted to update the WOF resources today but lost enthusiasm halfway through because a bunch of unrelated changes kept popping up in the diff.

This PR makes subsequent attempts at updating WOF considerably easier:

- update resources/whosonfirst/generate.js to sort dictionaries before writing them to disk (which makes diffs muuuch easier to read), and add some comments about potential gotchas
- sort the existing dictionaries using the same sorting method (using the monstrous command below)

find resources/whosonfirst/dictionaries -type f -name '*.txt' \
  | node -e 'const fs=require(`fs`); fs.readFileSync(0, `utf-8`).trim().split(`\n`).forEach(file => fs.writeFileSync(file, fs.readFileSync(file, `utf-8`).trim().split(`\n`).sort().join(`\n`)))'

This should be a no-op: it only sorts the existing dictionaries and does not add or remove any entries.
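For anyone who prefers something more readable than the one-liner, here is a minimal sketch of the same sort step plus a sanity check that no entries were added or dropped. The function names `sortDictionary` and `hasSameEntries` are hypothetical and are not taken from generate.js; the sample strings are made up for illustration.

```javascript
// Hypothetical helper (not from generate.js): sort the lines of one
// dictionary file's contents the same way the one-liner does:
// trim, split on newlines, default sort, re-join.
function sortDictionary(text) {
  return text.trim().split('\n').sort().join('\n');
}

// Hypothetical check: the sorted output must contain exactly the same
// entries as the input, only in a different order.
function hasSameEntries(before, after) {
  const a = before.trim().split('\n').sort();
  const b = after.trim().split('\n').sort();
  return a.length === b.length && a.every((line, i) => line === b[i]);
}

// Made-up sample content, for illustration only.
const original = 'paris\nÉpinay\nberlin\n';
const sorted = sortDictionary(original);

console.log(JSON.stringify(sorted));           // "berlin\nparis\nÉpinay"
console.log(hasSameEntries(original, sorted)); // true
```

Two things to be aware of: the default `Array.prototype.sort()` compares by UTF-16 code units, so the order is stable across machines but not locale-aware (uppercase ASCII sorts before lowercase, accented letters after), and, like the one-liner, this drops the trailing newline.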

Feb 23 '21 07:02 missinglink

[Image: Screenshot 2021-02-23 at 20 48 25]

😆

Feb 23 '21 07:02 missinglink

Should we also normalize names here? I found some names with uppercase letters and accents, like Épinay, even though LOWER is used in the SQLite statement (thank you French localities :sweat_smile:).

grep  'Épinay' resources/whosonfirst/dictionaries/locality/name\:fra_x_preferred.txt
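A likely explanation is that SQLite's built-in lower() only converts ASCII characters unless the ICU extension is loaded, so É passes through untouched. If we did want to normalize on the Node side, a minimal sketch could look like this; `normalizeName` and its `stripAccents` option are hypothetical and do not exist in generate.js.

```javascript
// Hypothetical helper (not part of generate.js): lowercase a name using
// JavaScript's Unicode-aware toLowerCase, and optionally strip accents.
function normalizeName(name, { stripAccents = false } = {}) {
  const lowered = name.toLowerCase();
  if (!stripAccents) return lowered;
  // NFD decomposes "é" into "e" + a combining mark; the regex then
  // removes every combining mark (Unicode category M).
  return lowered.normalize('NFD').replace(/\p{M}/gu, '');
}

console.log(normalizeName('Épinay'));                         // épinay
console.log(normalizeName('Épinay', { stripAccents: true })); // epinay
```

Whether accents should be stripped at all is a separate decision, since it changes the dictionary contents rather than just their case.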

On second thought: adding normalization will not make the diffs any easier to read, so I will merge as-is.

Feb 23 '21 15:02 Joxit
