akshara-project
Improve and document ingestion workflow
It should be easy for anyone on the team to add new documents to Elasticsearch once we have the raw docs. The documentation should cover:
- the schema/format to follow for all our sources (crawled docs, OCR)
- where to store the raw docs
- how to start indexing the docs into Elasticsearch
- how to verify the results of indexing
- how to roll back to the previous state in case of issues
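
As a starting point for discussion, here is a rough sketch of how the schema check, indexing, and rollback steps could fit together. It is not taken from the existing code or the PR. The field names (`id`, `source`, `text`), the source labels (`crawl`, `ocr`), and the index/alias names are all assumptions made up for illustration. The only Elasticsearch-specific parts are the request body formats for the `_bulk` and `_aliases` endpoints; the functions just build those bodies and do not talk to a cluster.

```python
import json

# Hypothetical minimal schema shared by crawled and OCR documents.
REQUIRED_FIELDS = {"id": str, "source": str, "text": str}
ALLOWED_SOURCES = {"crawl", "ocr"}  # assumed source labels


def validate(doc):
    """Return a list of schema problems; an empty list means the doc is valid."""
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in doc:
            problems.append(f"missing field: {field}")
        elif not isinstance(doc[field], expected):
            problems.append(f"wrong type for field: {field}")
    source = doc.get("source")
    if isinstance(source, str) and source not in ALLOWED_SOURCES:
        problems.append(f"unknown source: {source}")
    return problems


def bulk_body(docs, index):
    """Build the NDJSON payload for the _bulk endpoint, rejecting invalid docs."""
    lines = []
    for doc in docs:
        problems = validate(doc)
        if problems:
            raise ValueError(f"invalid document {doc.get('id')!r}: {problems}")
        # Each document is an action line followed by the document itself.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc["id"]}}))
        lines.append(json.dumps(doc, ensure_ascii=False))
    # The bulk format requires a trailing newline.
    return "\n".join(lines) + "\n"


def alias_swap(alias, old_index, new_index):
    """Build the _aliases request body that repoints alias in one request."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }
```

The idea would be to index each batch into a fresh index (say `akshara-v2`), verify it, for example by comparing its document count against the number of raw docs, and only then call `alias_swap("akshara", "akshara-v1", "akshara-v2")`. Rolling back is the same call with the two index names reversed, provided the old index is kept around.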
Partially done in https://github.com/Code4Nepal/akshara-project/pull/80