find-that-charity
Reconciliation for UK charities and other nonprofit organisations, with an Elasticsearch back end.
Find that charity
Elasticsearch-powered search engine for finding charities and other non-profit organisations. It provides:
- Import of data from nearly 20 sources in the UK, with duplicates matched to a single record.
- An Elasticsearch index that can be queried.
- Org-ids added to organisations.
- A reconciliation API for searching organisations, based on an optimised search query.
- A facility for uploading a CSV of charity names and adding a best-guess charity number to each.
- HTML pages for searching for a charity.
Installation
- Clone repository
- Create virtual environment (`python -m venv env`)
- Activate virtual environment (`source env/bin/activate` or `env\Scripts\activate`)
- Install requirements (`pip install -r requirements.txt`)
- Install postgres
- Start postgres
- Install elasticsearch 7 - you may need to increase available memory (see below)
- Start elasticsearch
- Create `.env` file in root directory. Contents based on `.env.example`.
- Create the database tables (`python ./manage.py migrate && python ./manage.py createcachetable`)
- Import data on charities (`python ./manage.py import_charities`)
- Import data on nonprofit companies (`python ./manage.py import_companies`)
- Import data on other non-profit organisations (`python ./manage.py import_all`)
- Add organisations to elasticsearch index (`python ./manage.py es_index`). Don't use the default `search_index` command, as it won't set up aliases correctly.
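The management commands above can be chained into one script. This is a minimal sketch, not part of the repository: it assumes it is run from the repository root with the virtual environment active, and the names `SETUP_COMMANDS`, `build_commands` and `run_setup` are made up for illustration. It defaults to a dry run that only prints each command.

```python
import subprocess
import sys

# Management commands in the order the installation steps list them.
SETUP_COMMANDS = [
    ["migrate"],
    ["createcachetable"],
    ["import_charities"],
    ["import_companies"],
    ["import_all"],
    ["es_index"],  # not the default search_index, which won't set up aliases
]


def build_commands(python=sys.executable, manage="./manage.py"):
    """Return the full argument list for each setup step."""
    return [[python, manage, *args] for args in SETUP_COMMANDS]


def run_setup(dry_run=True):
    """Print each step in order and, unless dry_run, execute it."""
    for argv in build_commands():
        print(" ".join(argv))
        if not dry_run:
            # check=True raises CalledProcessError if a step exits non-zero,
            # so later steps never run against a half-finished import.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Pass --run to execute; the default only prints the commands.
    run_setup(dry_run="--run" not in sys.argv[1:])
```

Stopping at the first failure matters here because indexing only makes sense once the imports have completed.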
Dokku Installation
1. Set up dokku server
SSH into server and run:
# create app
dokku apps:create ftc
# postgres
sudo dokku plugin:install https://github.com/dokku/dokku-postgres.git postgres
dokku postgres:create ftc-db
dokku postgres:link ftc-db ftc
# elasticsearch
sudo dokku plugin:install https://github.com/dokku/dokku-elasticsearch.git elasticsearch
export ELASTICSEARCH_IMAGE="elasticsearch"
export ELASTICSEARCH_IMAGE_VERSION="7.7.1"
dokku elasticsearch:create ftc-es
dokku elasticsearch:link ftc-es ftc
# configure elasticsearch 7:
# https://github.com/dokku/dokku-elasticsearch/issues/72#issuecomment-510771763
# setup elasticsearch increased memory (might be needed)
nano /var/lib/dokku/services/elasticsearch/ftc-es/config/jvm.options
# replace `-Xms512m` with `-Xms2g`
# replace `-Xmx512m` with `-Xmx2g`
# restart elasticsearch
dokku elasticsearch:restart ftc-es
# SSL
sudo dokku plugin:install https://github.com/dokku/dokku-letsencrypt.git
dokku config:set --no-restart ftc DOKKU_LETSENCRYPT_EMAIL=<your-email-address>
dokku letsencrypt ftc
dokku letsencrypt:cron-job --add
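The `jvm.options` edit in the elasticsearch step sets both the initial heap (`-Xms`) and the maximum heap (`-Xmx`) to the same value. As a rough sketch of that edit, the function below rewrites both flags in the text of the file; it works on a string rather than touching the file on disk, and `set_heap_size` is a hypothetical helper, not something the dokku plugin provides.

```python
import re


def set_heap_size(jvm_options: str, size: str = "2g") -> str:
    """Return jvm.options text with both -Xms and -Xmx set to `size`."""
    # Only rewrite uncommented lines that consist of a single heap flag.
    pattern = re.compile(r"^-(Xms|Xmx)\S+$", flags=re.MULTILINE)
    return pattern.sub(lambda m: f"-{m.group(1)}{size}", jvm_options)


# Example input using the 512m defaults mentioned in the comments above.
example = "# heap settings\n-Xms512m\n-Xmx512m\n-XX:+UseG1GC\n"
print(set_heap_size(example))
```

Comment lines and unrelated flags such as `-XX:+UseG1GC` are left untouched.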
2. Add as a git remote and push
On local machine:
git remote add dokku dokku@SERVER_HOST:ftc
git push dokku master
3. Setup and run import
On Dokku server run:
# setup
dokku run ftc python ./manage.py migrate
dokku run ftc python ./manage.py createcachetable
# run import
dokku run ftc python ./manage.py charity_setup
dokku run ftc python ./manage.py import_charities
dokku run ftc python ./manage.py import_companies
dokku run ftc python ./manage.py import_all
dokku run ftc python ./manage.py es_index
Server
The server uses Django. Run it with the following command:
python ./manage.py runserver
The server offers the following API endpoints:
- `/reconcile`: a reconciliation service API conforming to the OpenRefine reconciliation API specification.
- `/charity/12345`: look up information about a particular charity.
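To illustrate how a client might call these endpoints, here is a minimal sketch using only the standard library. It follows the OpenRefine reconciliation convention of sending a `queries` parameter containing a JSON object keyed by query id. The base URL (Django's default development port) is an assumption, as is the server accepting a form-encoded POST at `/reconcile`; the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the development server started by `runserver` on its default port.
BASE_URL = "http://127.0.0.1:8000"


def build_queries(names, limit=3):
    """Map q0, q1, ... to one reconciliation query per name."""
    return {f"q{i}": {"query": name, "limit": limit} for i, name in enumerate(names)}


def reconcile(names, base_url=BASE_URL):
    """POST a batch of queries to /reconcile and return the decoded JSON."""
    body = urllib.parse.urlencode({"queries": json.dumps(build_queries(names))})
    request = urllib.request.Request(f"{base_url}/reconcile", data=body.encode("utf-8"))
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def charity_url(charity_number, base_url=BASE_URL):
    """URL of the lookup page for a single charity."""
    return f"{base_url}/charity/{charity_number}"
```

With the server running, `reconcile(["Oxfam"])` would be expected to return an object keyed by `q0`, each entry holding a `result` list of candidates, if the service follows the specification.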
Todo
The project is currently a proof of concept and needs a bit of work to get up and running.
Priorities:
- tests for ensuring data is correctly imported
- server tests
- use results of `server/recon_test.py` to produce the best reconciliation search query for use in the server (`recon_test_7` seems the best at the moment)
- threshold for when to use the result vs discard
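The threshold item above could look something like the following sketch. It assumes each candidate is a dict with `id` and `score` keys, as in a result entry of the OpenRefine reconciliation specification. The default threshold of 50 is purely illustrative: Elasticsearch relevance scores are not on a fixed scale, so a real cut-off would have to come from the `recon_test` results. `pick_match` is a hypothetical name.

```python
def pick_match(candidates, threshold=50.0):
    """Return the highest-scoring candidate if it clears the threshold, else None."""
    if not candidates:
        return None
    best = max(candidates, key=lambda c: c["score"])
    # Discard the result rather than guess when the best score is too low.
    return best if best["score"] >= threshold else None


# Made-up candidates for illustration only.
sample = [
    {"id": "GB-CHC-0000001", "score": 12.0},
    {"id": "GB-CHC-0000002", "score": 71.5},
]
print(pick_match(sample))
```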
Future development:
- upload a CSV file and reconcile each row with a charity
- allow updating a charity with additional possible names
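As a rough illustration of the CSV idea, the sketch below appends a `charity_number` column to CSV text. It is not the project's implementation: the `name` column, the output column name and the `lookup` callable are all assumptions. In practice `lookup` would wrap a call to the reconciliation API and return `None` when no confident match exists.

```python
import csv
import io


def add_charity_numbers(csv_text, lookup, name_column="name"):
    """Return CSV text with a `charity_number` column appended.

    `lookup` is any callable taking a name and returning a charity number
    or None.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames or []) + ["charity_number"]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Leave the cell empty when the lookup finds nothing.
        row["charity_number"] = lookup(row[name_column]) or ""
        writer.writerow(row)
    return out.getvalue()
```

Passing the lookup in as a parameter keeps the CSV handling testable without a running server.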
Testing
coverage run manage.py test && coverage html
python -m http.server -d htmlcov --bind 127.0.0.1 8001