unable to fetch libpostal data

Open dart-mtucker opened this issue 1 year ago • 3 comments

We've been using libpostal for several years. After checking out the v1.1 tag and building it, the install no longer fetches the libpostal data into the DATADIR location. Manual updates with libpostal_data also fail.


My country is

U.S.


Here's how I'm using libpostal

Using libpostal from Perl code to do address matching.


Here's what I did

Ubuntu 22.04 (x86_64)

$ apt-get update
$ apt-get -y install \
	    autoconf automake curl git libtool pkg-config wget make
$ mkdir /data
$ cd /tmp
$ git clone --depth 1 --branch v1.1 https://github.com/openvenues/libpostal.git
$ cd libpostal 
$ ./bootstrap.sh
$ ./configure --datadir=/data
$ make -j4
$ make install 
$ ldconfig

Here's what I got

$ libpostal_data download all /data/libpostal
Checking for new libpostal data file...
libpostal data file up to date
Checking for new libpostal encoding="UTF-8"?>...
libpostal encoding="UTF-8"?> up to date
Checking for new libpostal encoding="UTF-8"?>...
libpostal encoding="UTF-8"?> up to date
$ du -sxh /data/libpostal
16K     /data/libpostal
$ ls -lh /data/libpostal
total 16K
-rw-r--r--. 1 root root  3 Mar 13 12:54 data_version
-rw-r--r--. 1 root root 21 Mar 13 12:54 last_updated
-rw-r--r--. 1 root root 21 Mar 13 12:54 last_updated_language_classifier
-rw-r--r--. 1 root root 21 Mar 13 12:54 last_updated_parser
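
Because libpostal_data reports "up to date" even though only small marker files were written, a size check on the data directory can catch this case early. The following is a minimal sketch, not part of libpostal: the function name `data_dir_populated` and the default threshold of about 1 GB are assumptions, chosen because a populated directory is described in this report as roughly 2GB.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not shipped with libpostal): succeeds only when the
# directory exists and holds at least min_kb kilobytes of data.
# The default threshold (~1 GB) is an assumption for illustration.
data_dir_populated() {
    local dir="$1" min_kb="${2:-1000000}" used_kb
    [ -d "$dir" ] || return 1
    # du -sk prints "<kilobytes><TAB><path>"; keep only the first field.
    used_kb=$(du -sk "$dir" | awk '{print $1}')
    [ "$used_kb" -ge "$min_kb" ]
}
```

For example, `data_dir_populated /data/libpostal || echo "libpostal data looks incomplete" >&2` would flag the 16K directory shown above.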

Here's what I was expecting

In the past, the data directory held approximately 2GB of data, and either the installer or libpostal_data would populate it. Running the libpostal_data script with bash -x produces the output below (excerpt). I get the same error message when I request the URL in a web browser.

$ bash -x libpostal_data download all /data/libpostal
...
++ curl --silent https://libpostal.s3.amazonaws.com/models/address_parser/latest
+ latest_parser='<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>AccessDenied</Code><Message>Access Denied</Message><RequestId>GDTXBM5NAHNSD2ES</RequestId><HostId>SzcJkJgwXVynWjDelOcxbpZVrpl1Ls7cYOPtO3OWvUjOAkNZ8mpq/AN4NIPiD7/qvYlyIZijuK581hfo/EC5Zj0+276oXON1sbmHv7ToOjA=</HostId></Error>'
+ parser_s3_prefix='models/address_parser/<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>AccessDenied</Code><Message>Access Denied</Message><RequestId>GDTXBM5NAHNSD2ES</RequestId><HostId>SzcJkJgwXVynWjDelOcxbpZVrpl1Ls7cYOPtO3OWvUjOAkNZ8mpq/AN4NIPiD7/qvYlyIZijuK581hfo/EC5Zj0+276oXON1sbmHv7ToOjA=</HostId></Error>'
+ download_file /data/libpostal/last_updated_parser /data/libpostal 'models/address_parser/<?xml' 'version="1.0"' 'encoding="UTF-8"?>' '<Error><Code>AccessDenied</Code><Message>Access' 'Denied</Message><RequestId>GDTXBM5NAHNSD2ES</RequestId><HostId>SzcJkJgwXVynWjDelOcxbpZVrpl1Ls7cYOPtO3OWvUjOAkNZ8mpq/AN4NIPiD7/qvYlyIZijuK581hfo/EC5Zj0+276oXON1sbmHv7ToOjA=</HostId></Error>' parser.tar.gz 'parser data file' address_parser
+ updated_path=/data/libpostal/last_updated_parser
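
The trace suggests that the body of the "latest" request is used as a version string even when S3 answers with an AccessDenied XML document. A guard along the following lines would make that failure visible instead of reporting "up to date". This is only a sketch: the function names `is_plausible_version` and `fetch_latest` are hypothetical, and the rule that a valid version is a single token without whitespace is an assumption, not something confirmed by the libpostal_data script.

```shell
#!/usr/bin/env bash

# Hypothetical helper: returns 0 when the text looks like a usable version
# string, 1 when it is empty, looks like an S3 XML error, or contains
# whitespace (assumption: real version strings are a single token).
is_plausible_version() {
    local body="$1"
    [ -n "$body" ] || return 1
    case "$body" in
        *'<?xml'*|*'<Error>'*) return 1 ;;
    esac
    case "$body" in
        *[[:space:]]*) return 1 ;;
    esac
    return 0
}

# Hypothetical wrapper: fetch a "latest" pointer and fail loudly on errors.
fetch_latest() {
    local url="$1" body
    # --fail makes curl exit non-zero on HTTP errors such as 403.
    body=$(curl --silent --fail "$url") || {
        echo "error: could not fetch $url" >&2
        return 1
    }
    is_plausible_version "$body" || {
        echo "error: unexpected response from $url" >&2
        return 1
    }
    printf '%s\n' "$body"
}
```

A caller could then write `latest_parser=$(fetch_latest "https://libpostal.s3.amazonaws.com/models/address_parser/latest") || exit 1`. Quoting the variable wherever it is expanded would also keep an unexpected multi-word response from being split into several arguments, as happens in the download_file call in the trace.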

For parsing issues, please answer "yes" or "no" to all that apply.

N/A

Here's what I think could be improved

Correct the URLs or the S3 access permissions.

dart-mtucker (Mar 13 '23 14:03)

I also cannot fetch the data. In the meantime, I'll try using https://github.com/Senzing/libpostal-data instead.

hendursaga (Mar 16 '23 20:03)

Let me know how the Senzing model goes.

brianmacy (Oct 26 '23 17:10)

I, unfortunately, cannot remember how it went.

hendursaga (Dec 11 '23 02:12)