[German] donation of new resources
Over the past few years I have collected a lot of data on the German street-name scheme (prefixes, suffixes, abbreviations, etc.), partly for a cartography project and, of course, for a geocoder.
Since I have noticed that the resources are not quite complete (and some tools like address_expanding don't work as expected), I would like to donate my data.
What is the right way to do this? A pull request? I would like to test my data beforehand. Can I replace the files in resources/ in my local branch, or do I have to rebuild the model completely?
Of course. Language data is always welcome. Pull requests are the preferred method for contributing, and for expansion-related APIs, our continuous integration will build the necessary files and test them automatically, pushing to S3 for everybody to use once the PR is merged.
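For reference, the expansion dictionaries are plain-text files under `resources/dictionaries/<lang>/` (for German, `resources/dictionaries/de/`, with files such as `street_types.txt`), one group of pipe-separated equivalent phrases per line. The entries below are illustrative rather than copied from the shipped files:

```
strasse|straße|str
platz|pl
allee|al
```

Editing or replacing those files in a local branch is exactly how to test; the build steps below pick them up.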
To test locally, building libpostal's expansions involves three steps:
- You'll need Python and a few simple packages (`pip install -r scripts/requirements-simple.txt`). Then generate the C files with: `python scripts/geodata/address_expansions/address_dictionaries.py`
- Run `./bootstrap.sh`, `./configure --datadir=/some/data/dir`, and `make` (or just `make` if you've previously run the other two steps)
- Run `./src/build_address_dictionary`
Anything which affects the expansion APIs is deterministic and doesn't require retraining any models.
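Put together, a local edit-build-test cycle might look like the sketch below. Everything except the last line comes straight from the steps above; the final line assumes the command-line expansion client that `make` builds in `src/`, and `"Hauptstr. 1"` is just a sample input:

```sh
# One-time Python deps, then regenerate the C data files from resources/
pip install -r scripts/requirements-simple.txt
python scripts/geodata/address_expansions/address_dictionaries.py

# Build (a plain `make` is enough on later iterations)
./bootstrap.sh
./configure --datadir=/some/data/dir
make

# Rebuild the dictionary data, then spot-check an expansion
./src/build_address_dictionary
./src/libpostal "Hauptstr. 1"   # prints candidate expansions, one per line
```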
:+1: I would also be interested in the German data for Pelias
I'll take care of it as soon as possible; I've had some pressing business lately. The data has been sitting untouched for several months, so I'll need to go through and clean it up again first, otherwise nobody will want to use it.
But how do I prepare the data using the geodata pipeline?