DocumentCloud's back end source code - Please report bugs, issues and feature requests to [email protected]
DocumentCloud
DocumentCloud · Squarelet · MuckRock · DocumentCloud-Frontend
Analyze, Annotate, Publish. Turn documents into data.
Prerequisites
You must first have these set up and ready to go:
- Squarelet. DocumentCloud depends on Squarelet for user authentication. As the services need to communicate directly, the development environment for DocumentCloud depends on the development environment for Squarelet - the DocumentCloud docker containers will join Squarelet's docker network. Please install Squarelet and set up its development environment first.
- DocumentCloud frontend
Note: the front end will not be functional until you complete this install.
Install
Software required
Installation of DocumentCloud and its Authentication System
- Install software above and Git Large File support using these instructions.
- Ensure you have at least an additional 11 GB of hard disk space allocated to Docker for these purposes.
- Ensure your Docker host application has at least 7 GB of memory allocated (10 GB preferred).
- These instructions create 3 distinct docker compose sessions, with the Squarelet session hosting the shared central network.
- Check out the git repository: `git clone [email protected]:MuckRock/documentcloud.git`
- Enter the directory: `cd documentcloud`
- Run the dotenv initialization script: `python initialize_dotenvs.py`. This will create files with the environment variables needed to run the development environment.
- Set `api.dev.documentcloud.org` and `minio.documentcloud.org` to point to localhost: `echo "127.0.0.1 api.dev.documentcloud.org minio.documentcloud.org" | sudo tee -a /etc/hosts`
- Run `export COMPOSE_FILE=local.yml;` in any of your command line sessions so that docker compose finds the configuration.
- Run `docker compose up`.
- Enter `api.dev.documentcloud.org/` into your browser. You should see the Django API root page. Note that `api` is before `dev` in this service URL.
- In `.envs/.local/.django` set the following environment variables:
  - `SQUARELET_KEY` to the value of Client ID from the Squarelet Client
  - `SQUARELET_SECRET` to the value of Client SECRET from the Squarelet Client
  - Additionally, get the value for `JWT_VERIFYING_KEY` by opening the Squarelet Django shell using `inv shell` and copying the `settings.SIMPLE_JWT['VERIFYING_KEY']` (remove the leading `b'` and the trailing `'`, leave the `\n` portions as-is)
- You must restart the Docker Compose session (via the command `docker compose down` followed by `docker compose up`) each time you change a `.django` file for it to take effect.
- Log in using the Squarelet superuser on the locally-running DocumentCloud-Frontend that you installed earlier at https://dev.documentcloud.org
  - The `SQUARELET_WHITELIST_VERIFIED_JOURNALISTS=True` environment variable makes it so only verified journalists can log into DocumentCloud.
  - Use the Squarelet admin Organization page to mark your organization as a verified journalist to allow upload to DocumentCloud.
- Make your Squarelet superuser also a superuser on DocumentCloud Django: run `inv shell` in the DocumentCloud folder and use these commands (no indent):
  - `tempUser = User.objects.all()[0]`
  - `tempUser.is_superuser = True`
  - `tempUser.save()`
  - `tempUser.is_staff = True`
  - `tempUser.save()`
- Go to Django admin for DocumentCloud and add the required static flat page called `/tipofday/`. It can be blank. Do not prefix the URL with `/pages/`. Specifying the `Site` as `example.com` is alright.
- Create an initial Minio bucket to simulate AWS S3 locally:
  - Reference your DocumentCloud `.django` file for these variables:
    - Visit the `MINIO_URL` with a browser, likely at this address, and log in with the minio `MINIO_ACCESS_KEY` and `MINIO_SECRET_KEY`
    - At the bottom right corner click the round plus button and then click the first circle that appears above it to "create bucket".
    - Create a bucket called `documents`
- Upload a document:
- Check that your memory allocation on Docker is at least 7 GB. Signs that you do not have enough memory allocated are containers failing at random or your system swapping heavily, especially when uploading documents.
- The "upload" button should not be grayed out (if it is, check your user organization Verified Journalist status above)
- If you get an error on your console about signatures, fix minio as above.
- If you get an error on your console about tipofday not found, add the static page as above.
- Develop DocumentCloud and its frontend!
- You can run the tests with `inv test`.
  - If you want to run a subset of the tests, you can specify the directory containing the test you want with the `path` switch like so: `inv test --path documentcloud/documents`.
  - You can specify a single file in `--path` if you only want to run the tests in that file.
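
The environment-variable step above involves two manual chores that are easy to get wrong: stripping the `b'` and `'` wrapper from the copied `JWT_VERIFYING_KEY`, and making sure all three Squarelet values are filled in. The following is a minimal sketch of both checks, assuming the `.django` file uses plain `KEY=value` lines. The function names are hypothetical helpers for illustration and are not part of the DocumentCloud repository.

```python
# Hypothetical helpers for checking .envs/.local/.django by hand.
# Assumes plain KEY=value lines; this is not part of the DocumentCloud code base.

REQUIRED_KEYS = ("SQUARELET_KEY", "SQUARELET_SECRET", "JWT_VERIFYING_KEY")


def clean_verifying_key(shell_repr: str) -> str:
    """Strip the b'...' wrapper from a bytes repr copied out of the Django shell.

    The literal backslash-n sequences inside the key are left untouched.
    """
    text = shell_repr.strip()
    if text.startswith("b'") and text.endswith("'"):
        return text[2:-1]
    return text


def parse_dotenv(content: str) -> dict:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for line in content.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_settings(content: str) -> list:
    """Return the required keys that are absent or empty in the file content."""
    values = parse_dotenv(content)
    return [key for key in REQUIRED_KEYS if not values.get(key)]
```

For example, passing the text of `.envs/.local/.django` to `missing_settings` returns whichever of the three variables still need a value, and `clean_verifying_key` can be applied to the string copied from the Squarelet shell before pasting it into the file.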