
[German] donation of new resources

tobwen opened this issue 8 years ago • 4 comments

Over the last few years I have collected a lot of data on the German naming scheme for street names (prefixes, suffixes, abbreviations, etc.). This came partly out of a cartography project and, of course, a geocoder.

Since I have noticed that the resources are not quite complete (and some tools, like address_expanding, don't work as expected), I would like to donate my data.

What is the right way to do this? A pull request? I would like to test my data beforehand. Can I replace the files in resources/ in my local branch or do I have to rebuild the model completely?

tobwen avatar Oct 13 '17 04:10 tobwen

Of course. Language data is always welcome. Pull requests are the preferred method for contributing, and for expansion-related APIs, our continuous integration will build the necessary files and test them automatically, pushing to S3 for everybody to use once the PR is merged.

To test locally, building libpostal's expansions involves three steps:

  1. You'll need Python and a few simple packages (`pip install -r scripts/requirements-simple.txt`). Then generate the C files with `python scripts/geodata/address_expansions/address_dictionaries.py`.
  2. Run `./bootstrap.sh`, `./configure --datadir=/some/data/dir`, and `make` (or just `make` if you've previously run the other two steps).
  3. Run `./src/build_address_dictionary`.
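Collected into one sequence, the steps above look roughly like this (run from a libpostal checkout; the data directory is a placeholder, adjust it for your system):

```shell
# 1. Install the build-time Python dependencies and regenerate the
#    C dictionary sources from the plain-text resources.
pip install -r scripts/requirements-simple.txt
python scripts/geodata/address_expansions/address_dictionaries.py

# 2. Standard autotools build; --datadir controls where libpostal
#    keeps its runtime data files.
./bootstrap.sh
./configure --datadir=/some/data/dir
make

# 3. Compile the regenerated dictionaries into libpostal's
#    binary dictionary format.
./src/build_address_dictionary
```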

Anything which affects the expansion APIs is deterministic and doesn't require retraining any models.
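For contributors preparing files like the ones discussed here: the expansion dictionaries under `resources/dictionaries/<lang>/` are plain text, one equivalence class per line, with alternate surface forms separated by `|`. A minimal sketch of reading that format (the German street-type entries below are illustrative examples, not the actual repository contents):

```python
def parse_dictionary(text):
    """Parse libpostal-style dictionary lines: one equivalence class
    per line, alternate forms separated by "|"."""
    classes = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        classes.append(line.split("|"))
    return classes

# Illustrative German street-type entries (not the real file contents).
sample = """\
straße|strasse|str.|str
weg
platz|pl.|pl
"""

for forms in parse_dictionary(sample):
    # By convention, the first form is the canonical expansion.
    print(forms[0], "<-", ", ".join(forms))
```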

albarrentine avatar Oct 13 '17 19:10 albarrentine

:+1: I would also be interested in the German data for Pelias

missinglink avatar Oct 27 '17 08:10 missinglink

I'll take care of it as soon as possible; I've had some pressing business lately. The data has been sitting idle for several months, and I need to review and clean it up again, otherwise nobody will want to use it.

tobwen avatar Oct 27 '17 22:10 tobwen


But how do I prepare the data using the geodata pipeline?

sharmaB01 avatar Nov 13 '22 14:11 sharmaB01