paperwork-backend
paperwork-backend copied to clipboard
Backend part of Paperwork (Python API, no UI)
This repository is obsolete
Source code of the backend is now in the same Git repository than the frontend
Description
Paperwork is a GUI to make papers searchable.
This is the backend part of Paperwork. It manages:
- The work directory / Access to the documents
- Indexing
- Searching
- Suggestions
- Import
- Export
There is no GUI here. The GUI is https://github.com/openpaperwork/paperwork .
Regarding the name "Paperwork", it can refer to both the GUI or the backend. If you want to be specific, you can call the gui "paperwork-gui" instead of just Paperwork.
Dependencies
- Pillow: Image manipulation (with JPEG support)
- Whoosh: To index and search documents, and provide keyword suggestions
- Libpoppler (PDF support)
- Cairo
- Gobject Introspection
Usage
You can find some examples in scripts/. You can also look at the code of Paperwork for reference.
Here are some snippets:
import paperwork_backend.config as config
import paperwork_backend.docsearch as docsearch
pconfig = config.PaperworkConfig()
pconfig.read()
print ("Opening docs ({})".format(pconfig.settings['workdir'].value))
# Instantiating a DocSearch object will open the indexes and the label
# bayesian filter caches. It may take a few seconds
docsearch = docsearch.DocSearch(pconfig.settings['workdir'].value)
suggestions = docsearch.find_suggestions(u"flesh")
print ("Keyword suggestions: {}".format(suggestions))
# [u'cles', u'flesc', u'flesch', u'jflesch', u'les']
documents = docsearch.find_documents(u"flesch")
print ("Nb document found: {}".format(len(documents))
# 1064
doc = documents[0]
print ("Nb pages of the first doc: {}".format(doc.nb_pages))
# 2
page = doc.pages[0]
print ("First page content:\n{}".format(page.text))
# [u'Salaires - D\xe9clarant 1',
# u'PPE - temps plein - D\xe9clarant 1',
# (...)
# u'/PZwpNYBAIPdsSiwBRqb0NXv/7bBPLHFI1JTvg==']
print ("Page size: {}".format(page.size))
# (1190, 1682)
print ("Page PIL Image object: {}".format(page.img))
# <PIL.Image.Image image mode=RGB size=1190x1682 at 0x7F4A561FA8C0>
Contact/Help
Developement is strongly related to Paperwork-gui.
Contact
Licence
GPLv3 or later. See LICENSE.
Development
Developement is strongly related to Paperwork-gui. All the information can be found on the wiki