heissdocs
This project is no longer maintained
heißdocs - A Document Query Application
# Under Active Development #
Add a searchable layer on top of your PDFs!
Fully open-source and ready to be deployed. You store, own, and control the data.
Note:
This project is a work in progress, so expect things to break as it moves forward. The vision is that users are NOT locked into an ecosystem: your data is stored and governed by you, so even if the app breaks, your data should remain accessible with tools already at your disposal.
Usage
What is the purpose of this project?
It allows a user or an organization to keep track of their PDF files. The complicated thing about PDFs is that they often aren't searchable by content, especially scanned ones. Simply upload a scanned or normal PDF and start searching its content with the power of Elasticsearch (or a NoSQL database)!
heißdocs creates a search layer for your PDFs, down to the exact page (pointing to the exact word is in progress):
- Set up according to the instructions under Setup
- Upload a file on the Dashboard
- Start searching!
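To illustrate what a page-level search layer means, here is a minimal in-memory sketch. It is not how heißdocs is implemented: the application delegates search to Elasticsearch (or a NoSQL database), and the function names (`build_page_index`, `search`) and the sample page texts below are assumptions made up for this example. It only shows the idea of indexing each page separately so that a hit can point back to a file and a page number.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_page_index(documents: dict[str, list[str]]) -> dict[str, set[tuple[str, int]]]:
    """Map each word to the (filename, page number) pairs that contain it.

    `documents` maps a filename to its extracted page texts, in page order.
    """
    index: dict[str, set[tuple[str, int]]] = defaultdict(set)
    for filename, pages in documents.items():
        for page_number, page_text in enumerate(pages, start=1):
            for word in tokenize(page_text):
                index[word].add((filename, page_number))
    return index


def search(index: dict[str, set[tuple[str, int]]], query: str) -> list[tuple[str, int]]:
    """Return the pages that contain every word of the query, sorted."""
    words = tokenize(query)
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)


# Hypothetical extracted text: two files, pages listed in order.
documents = {
    "invoice.pdf": ["Invoice number 1042", "Payment due in 30 days"],
    "report.pdf": ["Quarterly report", "Payment received, invoice closed"],
}
index = build_page_index(documents)
print(search(index, "payment"))          # pages mentioning "payment"
print(search(index, "invoice payment"))  # pages mentioning both words
```

A real search engine adds ranking, stemming, and fuzzy matching on top of this, but the unit of retrieval stays the same: one indexed entry per page.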
Features
- Multi-cloud support (AWS, GCP, Azure)
- Semantic search (Langchain + OpenAI)
- Multiple Storage Options
- Powerful Search + Versatile Storage
- View source documents
- Full ownership of data
- Completely open-source
- Self-hosted
- ... more things to come + feel free to add in requests!
Setup
Pre-requisites
Please set up the required services before starting the application. You can follow the documentation to configure all services.
- Auth0 - required even before startup:
  - For Auth0 you will need to get the required values from the Auth0 portal and paste them into the .env files in frontend and app. This needs to be configured even before building the application.
Setting up
Start by creating a .env file in the root directory and fill in the values according to the .env.example file.
Before startup, only the Auth0 values need to be set up. Please follow the documentation for the full guide.
cp .env.example .env
The values in the root .env file can remain unchanged unless you are planning on hosting each of the services individually.
Similarly, create a .env file inside the app, frontend, and engine folders and fill them in following the instructions in the respective .env.example files.
cp frontend/.env.example frontend/.env
cp app/.env.example app/.env
cp engine/.env.example engine/.env
All keys except the Auth0 keys can be left untouched. Everything else can be configured later on the Settings page.
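Before building, it can help to confirm that every .env file exists and that the required values are filled in. The following is a minimal sketch, assuming the .env files use plain KEY=VALUE lines. The helper names and the `AUTH0` prefix in the usage note are hypothetical, so check the actual key names in the .env.example files.

```python
from pathlib import Path


def read_env_keys(path: Path) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for line in path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(example: Path, actual: Path, prefix: str = "") -> list[str]:
    """List keys from the example file that are absent or empty in the real file."""
    wanted = read_env_keys(example)
    present = read_env_keys(actual) if actual.exists() else {}
    return sorted(k for k in wanted if k.startswith(prefix) and not present.get(k))


def check_project(root: Path, folders=(".", "frontend", "app", "engine")) -> dict[str, list[str]]:
    """Run the check for the root directory and each service folder."""
    report: dict[str, list[str]] = {}
    for folder in folders:
        base = root / folder
        example = base / ".env.example"
        if example.exists():
            report[folder] = missing_keys(example, base / ".env")
    return report
```

Calling `check_project(Path("."))` from the root directory returns, per folder, the keys that still need a value. Passing a prefix such as `prefix="AUTH0"` to `missing_keys` narrows the report to the keys that must be set before the first build, assuming they share that prefix.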
Running
Ensure that the credentials you pasted in the .env files are authorized for operations such as GET, PUT, and LIST.
Once your .env files are ready, navigate to the root directory and run:
docker compose up --build
Then go to localhost:8080 and log in.
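The first build can take a while, so the page may not respond immediately. As a small convenience, the sketch below polls a TCP port until something accepts connections. The helper names are made up for this example, and an open port only means a process is listening, not that the application has finished starting.

```python
import socket
import time


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_port(host: str, port: int, attempts: int = 30, delay: float = 2.0) -> bool:
    """Retry the connection up to `attempts` times, sleeping `delay` seconds between tries."""
    for _ in range(attempts):
        if is_port_open(host, port):
            return True
        time.sleep(delay)
    return False
```

For the setup described here, `wait_for_port("localhost", 8080)` would return True once the frontend container is listening.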
[Optional]
If you want hot reload on the frontend, you can run the services separately.
Run the backend services:
docker compose -f docker-compose.yaml up --build
If you also want Elasticsearch running locally, include the docker-compose.elasticsearch.override.yaml file in the docker compose command:
docker compose -f docker-compose.yaml -f docker-compose.elasticsearch.override.yaml up --build
Run the frontend:
cd frontend
npm install
npm run dev -- --port 8080
Run database migrations
cd app
alembic upgrade head
[Optional] If you have your own hosted PostgreSQL database, make sure to update the sqlalchemy.url value in the alembic.ini file.
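When pointing sqlalchemy.url at your own database, special characters in the credentials are a common source of errors. The sketch below builds a SQLAlchemy-style URL with URL-encoded credentials and doubles any percent signs, since alembic.ini values go through percent interpolation. The user, password, host, and database name are hypothetical, and the plain `postgresql://` scheme is an assumption, so adjust it if the project expects a specific driver.

```python
from urllib.parse import quote_plus


def postgres_url(user: str, password: str, host: str, database: str, port: int = 5432) -> str:
    """Build a SQLAlchemy-style PostgreSQL URL, URL-encoding the credentials."""
    return f"postgresql://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/{database}"


def escape_for_ini(url: str) -> str:
    """Double each % so the ini parser does not treat it as interpolation."""
    return url.replace("%", "%%")


# Hypothetical credentials, for illustration only.
url = postgres_url("heiss", "p@ss/word", "db.example.com", "heissdocs")
print("sqlalchemy.url =", escape_for_ini(url))
```

The printed line can be pasted over the existing sqlalchemy.url entry in alembic.ini before running the migrations.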
Settings
Before using the application, navigate to the Settings page by clicking on the left-side dashboard button, and configure the settings.
Ready!
You are all set!
Overview
Here's a quick overview of the project
Ingestion Flow
Query Flow
In progress for the community - by Krishnasis
Powered by FastAPI