libpostal
unable to fetch libpostal data
We've been using libpostal for several years. Building from the v1.1 tag no longer fetches the libpostal data into the DATADIR location, and manual updates with libpostal_data also fail.
My country is
U.S.
Here's how I'm using libpostal
Using libpostal from Perl code to do address matching.
Here's what I did
Ubuntu 22.04 (x86_64)
$ apt-get update
$ apt-get -y install \
autoconf automake curl git libtool pkg-config wget make
$ mkdir /data
$ cd /tmp
$ git clone --depth 1 --branch v1.1 https://github.com/openvenues/libpostal.git
$ cd libpostal
$ ./bootstrap.sh
$ ./configure --datadir=/data
$ make -j4
$ make install
$ ldconfig
Here's what I got
$ libpostal_data download all /data/libpostal
Checking for new libpostal data file...
libpostal data file up to date
Checking for new libpostal encoding="UTF-8"?>...
libpostal encoding="UTF-8"?> up to date
Checking for new libpostal encoding="UTF-8"?>...
libpostal encoding="UTF-8"?> up to date
$ du -sxh /data/libpostal
16K /data/libpostal
$ ls -lh /data/libpostal
total 16K
-rw-r--r--. 1 root root 3 Mar 13 12:54 data_version
-rw-r--r--. 1 root root 21 Mar 13 12:54 last_updated
-rw-r--r--. 1 root root 21 Mar 13 12:54 last_updated_language_classifier
-rw-r--r--. 1 root root 21 Mar 13 12:54 last_updated_parser
Here's what I was expecting
In the past, the data dir would hold approximately 2 GB of data, and running the installer or libpostal_data would populate it properly. Running the libpostal_data script with bash -x produces the output below (excerpt). I get the same AccessDenied error when attempting to download the data with a web browser.
$ bash -x libpostal_data download all /data/libpostal
...
++ curl --silent https://libpostal.s3.amazonaws.com/models/address_parser/latest
+ latest_parser='<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>AccessDenied</Code><Message>Access Denied</Message><RequestId>GDTXBM5NAHNSD2ES</RequestId><HostId>SzcJkJgwXVynWjDelOcxbpZVrpl1Ls7cYOPtO3OWvUjOAkNZ8mpq/AN4NIPiD7/qvYlyIZijuK581hfo/EC5Zj0+276oXON1sbmHv7ToOjA=</HostId></Error>'
+ parser_s3_prefix='models/address_parser/<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>AccessDenied</Code><Message>Access Denied</Message><RequestId>GDTXBM5NAHNSD2ES</RequestId><HostId>SzcJkJgwXVynWjDelOcxbpZVrpl1Ls7cYOPtO3OWvUjOAkNZ8mpq/AN4NIPiD7/qvYlyIZijuK581hfo/EC5Zj0+276oXON1sbmHv7ToOjA=</HostId></Error>'
+ download_file /data/libpostal/last_updated_parser /data/libpostal 'models/address_parser/<?xml' 'version="1.0"' 'encoding="UTF-8"?>' '<Error><Code>AccessDenied</Code><Message>Access' 'Denied</Message><RequestId>GDTXBM5NAHNSD2ES</RequestId><HostId>SzcJkJgwXVynWjDelOcxbpZVrpl1Ls7cYOPtO3OWvUjOAkNZ8mpq/AN4NIPiD7/qvYlyIZijuK581hfo/EC5Zj0+276oXON1sbmHv7ToOjA=</HostId></Error>' parser.tar.gz 'parser data file' address_parser
+ updated_path=/data/libpostal/last_updated_parser
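The trace above suggests the script takes whatever curl returns as the "latest" version, so the S3 error body ends up inside the download prefix and the run still reports "up to date". Below is a minimal sketch of a guard against that. The helper names is_plausible_version and fetch_latest are hypothetical and are not part of libpostal_data, and the assumption that a valid version is a single token of letters, digits, dots, underscores, and hyphens is mine, not something the script documents.

```shell
#!/bin/sh
# Hypothetical helper (not part of libpostal_data): accept a fetched value
# only if it looks like one plain token. Assumption: a real version string
# contains only letters, digits, dots, underscores, and hyphens.
is_plausible_version() {
    case "$1" in
        ''|*[!A-Za-z0-9._-]*) return 1 ;;  # empty, or contains any other character (spaces, <, >, ...)
    esac
    return 0
}

# Hypothetical wrapper: --fail makes curl exit non-zero on HTTP errors such
# as 403, so an error body is not captured as if it were a version.
fetch_latest() {
    fetched_value=$(curl --silent --fail "$1") || return 1
    is_plausible_version "$fetched_value" || return 1
    printf '%s\n' "$fetched_value"
}

# Example use, with the URL taken from the trace above:
# latest_parser=$(fetch_latest https://libpostal.s3.amazonaws.com/models/address_parser/latest) || {
#     echo "could not determine latest parser version" >&2
#     exit 1
# }
```

With a check like this, the AccessDenied response would surface as an explicit failure instead of an empty data dir.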
For parsing issues, please answer "yes" or "no" to all that apply.
N/A
Here's what I think could be improved
Correct the download URLs or the S3 access permissions.
I also cannot fetch the data. In the meantime, I'll try using https://github.com/Senzing/libpostal-data instead.
Let me know how the Senzing model goes.
I, unfortunately, cannot remember how it went.