Plugin architecture
We could support plugins for pre- and/or post-processing the document analysis functionality.
A plugin could be a subclass of a class like this:
class AnnifPlugin:
"""A plugin that tweaks Annif queries before and/or after they are executed"""
def process_analyze_query(query):
"""Preprocess an analyze query, tweaking the parameters before the query is executed"""
# default implementation is a no-op
return query
def postprocess_analyze_query(query, result):
"""Postprocess an analyze query and result, tweaking the result before responding to the client"""
# default implementation is a no-op
return query
For registering plugins, we could perhaps make use of PluginBase. Each plugin would be a separate Python project that registers itself to the Annif plugin system. Each Annif project could define a set of plugins to use. The plugins could be stacked/chained, so that the result of one plugin would be fed to the next one in the chain.
Plugins would be fed the raw result of Annif queries (with lots of candidate subject), before cutting down them into the requested size and/or applying score thresholds. This way the plugins have more candidate subjects to work with.
Ideas for plugins:
- Neural plugin that makes use of neural networks to determine whether a set of subjects is "good" (i.e. looks somewhat like existing subject sets it has been trained on) and tweak the set so that it becomes "better"
- Co-occurrence plugin that checks whether a set of subject contains pairs of subjects that frequently occur together, and increases their score
In the current architecture these could simply be separate backends. Backends can register themselves using annif.backend.register_backend so I'm not sure whether the plugin infrastructure is really needed, but of course it would make Annif more extensible.
but of course it would make Annif more extensible.
I have two systems where plugins are needed. One of them has a few mechanisms for plugins. This project is passing through an update, starting by getting a setup.py file, and making more use of vanilla features.
We are thinking about using simple entry points (other good doc from setuptools).
I believe PyTest and PyLint use this approach. Scrapy is a different beast, so they used a different approach to allow scrappers to easily extend scrapy, without the need to package a python utility with setuptools.