Cities getting overwritten in geodict/text2places database

Open petewarden opened this issue 14 years ago • 2 comments

I'm seeing something strange in the cities table; it looks as though a lot of cities that are in the source data are missing from the populated geodict database, possibly getting clobbered on import.

Take Brooklyn, for example. In worldcitiespop.csv, grep finds 49 entries for 'brooklyn' (42 of which are in the US); in the geodict database, there are five entries for 'brooklyn', only one of which is in the US (and that entry is in Alabama). The same seems to be true of other US cities like Rochester and Boston, each of which is found only once in the US (and in an alphabetically early state like AL or CA). Are the others getting clobbered on import? Or am I maybe making a mistake in looking through the database? (I don't have much experience with MySQL.)
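To rule out a mistake in the query, one option is to count the source rows independently of MySQL. The sketch below tallies rows per (country, region) for one exact city name. It assumes the CSV has a header row with `Country`, `City`, and `Region` columns, which is not confirmed in this thread, and `count_city_rows` is a hypothetical helper rather than anything in the repository.

```python
import csv
import io
from collections import Counter


def count_city_rows(lines, city_name):
    """Count rows per (country, region) whose City column equals city_name.

    Assumes a header row with Country, City, and Region columns
    (an assumption about worldcitiespop.csv, not confirmed here).
    """
    wanted = city_name.strip().lower()
    counts = Counter()
    for row in csv.DictReader(lines):
        if row["City"].strip().lower() == wanted:
            counts[(row["Country"], row["Region"])] += 1
    return counts


# Tiny in-memory sample standing in for the real file.
sample = io.StringIO(
    "Country,City,Region\n"
    "us,brooklyn,AL\n"
    "us,brooklyn,NY\n"
    "us,brooklyn,NY\n"
    "ca,brooklyn,07\n"
    "us,boston,MA\n"
)
counts = count_city_rows(sample, "Brooklyn")
print(sum(counts.values()), "rows in", len(counts), "country/region pairs")
```

For the real file, an open file object can be passed in place of `sample`; the encoding may need to be specified. Note that grep matches substrings, so a grep count for 'brooklyn' can include names that merely contain it and may be higher than an exact-match count like this one.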

The SQL query I'm using is:

SELECT city, country, region_code, population, lat, lon FROM cities WHERE city = 'Brooklyn';

Other things that might be relevant:

The populate_database.py script produces two warnings when I run it:

./populate_database.py:49: Warning: Data truncated for column 'last_word' at row 1
(city, country, region_code, population, lat, lon, last_word))

./populate_database.py:49: Warning: Data truncated for column 'city' at row 1
(city, country, region_code, population, lat, lon, last_word))

populate_database.py won't work at all unless I first create the geodict database by hand, even though it looks as though the script is meant to handle that.
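The truncation warnings suggest that some values are longer than the columns receiving them. One way to see which values are affected is to scan the source names against a length limit, and to check whether distinct names become identical once cut. In this sketch the limit of 8 is purely illustrative (the real column widths are not given in this thread), both helpers are hypothetical, and whether truncation contributes to the missing rows is an open question.

```python
def find_overlong(values, max_len):
    """Return the values whose length exceeds max_len, in input order."""
    return [v for v in values if len(v) > max_len]


def truncated_collisions(values, max_len):
    """Group distinct values that become identical when cut to max_len characters."""
    groups = {}
    for value in set(values):
        groups.setdefault(value[:max_len], set()).add(value)
    # Keep only prefixes shared by more than one distinct value.
    return {prefix: sorted(group) for prefix, group in groups.items() if len(group) > 1}


# Illustrative names only; the width of 8 is an assumption, not the real schema.
names = ["brooklyn", "brooklyn park", "brooklyn heights", "boston"]
overlong = find_overlong(names, 8)
collisions = truncated_collisions(names, 8)
print(overlong)
print(collisions)
```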

System info:

uname -a

Darwin wilkens-imac.wustl.edu 10.7.0 Darwin Kernel Version 10.7.0: Sat Jan 29 15:17:16 PST 2011; root:xnu-1504.9.37~1/RELEASE_I386 i386

mysql --version

mysql Ver 14.14 Distrib 5.1.56, for apple-darwin10.3.0 (i386) using readline 5.1

Any other info I can provide? Happy to do any kind of debugging that might help. Thanks!

petewarden, Mar 31 '11 23:03

Good(ish) news. I tried the most recent DSTK VMware image (v35); the database of cities supplied with it is still borked, but a simple rerun of the included populate_database.rb script (which includes the change to the primary key made back in April) fixes it. Nice! Thanks.
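The thread does not say what the primary key was before or after the April change. As an illustration only, the sketch below shows how a key that is too narrow could produce the symptom reported here, with a single surviving row from the first state encountered. It uses Python's built-in sqlite3 rather than MySQL, and the schemas, the `INSERT OR IGNORE` import style, and the `load` helper are all assumptions, not the script's actual code.

```python
import sqlite3

# Illustrative rows only, in alphabetical order by region.
ROWS = [
    ("brooklyn", "us", "AL"),
    ("brooklyn", "us", "CT"),
    ("brooklyn", "us", "NY"),
    ("brooklyn", "ca", "07"),
]


def load(key_columns):
    """Import ROWS into a fresh table keyed on key_columns.

    Returns the region codes of the US 'brooklyn' rows that survive.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cities (city TEXT, country TEXT, region_code TEXT, "
        f"PRIMARY KEY ({key_columns}))"
    )
    # INSERT OR IGNORE silently drops any later row whose key already exists,
    # so the first row seen for each key is the one that remains.
    conn.executemany("INSERT OR IGNORE INTO cities VALUES (?, ?, ?)", ROWS)
    rows = conn.execute(
        "SELECT region_code FROM cities "
        "WHERE city = 'brooklyn' AND country = 'us' ORDER BY region_code"
    ).fetchall()
    conn.close()
    return [region for (region,) in rows]


narrow = load("city, country")
wide = load("city, country, region_code")
print(narrow, wide)
```

With the narrow key only the first US row is kept; including the region in the key keeps all three. A replacing insert would behave differently, leaving the last row seen instead of the first.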

wilkens, Jun 03 '11 00:06

Thanks for trying that out, and apologies that the VMware image isn't working out of the box. I'll double-check the AMI as well; hopefully I ran the update there.

petewarden, Jun 03 '11 20:06