
Workflow support

Open madduck opened this issue 2 years ago • 0 comments

Introduction & why

At the time of writing, Docspell supports only one very specific workflow, called "metadata confirmation".

image

A full-scale workflow engine is beyond the scope of the project, and with Webhooks and the REST API, a tool like Windmill may well be the right home for any further complexity.

Nevertheless, I am going to try to argue that Docspell could benefit a lot from some basic workflow support, because it can enforce discipline, thus the quality of information, and reduce human error.

In almost all cases, it is possible to mock workflows by (ab)using tags. The problem with such an approach is that it's discretionary, not mandatory. (Ab)using tags means that workflow states and transitions can be represented, but conditions cannot be enforced. For instance, if a tag "paid" indicates a paid invoice, then this tag cannot be made dependent on the prior presence of a tag "approved". A tag-based workflow only works if all users adhere to the same discipline and make no mistakes.

Requirements & basics

I think a lot of basic workflow requirements aren't part of a complicated workflow engine, but rather sprinkled across the entire system:

  1. Having cryptographic claims (see #2271) would already go a long way, as claims are a lot more meaningful than tags, especially since they prevent abuse and reduce the possibility of human error;
  2. Making custom fields required (#2262) would ensure that certain data cannot be omitted;
  3. Providing a field type of Users (#2267) enables assigning documents to the next person in charge;
  4. Being able to attach conditions to tags might be a good addition, though it supports the (ab)use of tags, which are semantic, not organisational. Document states would be better, as a system apart from tags.

A workflow is a finite-state machine, i.e. a collection of states and transitions.

States can hold data, and one way to store them would be an n:m table linking workflows to documents (the "states table"), with a third column representing the current state, and a fourth column holding workflow-instance-specific data for the current state, e.g. as a JSON dict.

Transitions move a document from one state to another, where the set of possible transitions is limited by a condition. A transition takes input and optionally generates output. The input becomes part of the state.
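To make the states table a bit more concrete, here is a minimal sketch using SQLite from Python's standard library. The table name `workflow_state`, the column names, and the helper functions are assumptions for illustration only; none of this is Docspell's actual schema.

```python
import json
import sqlite3

# Hypothetical schema: one row per (workflow, document) pair.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE workflow_state (
        workflow_id   TEXT NOT NULL,
        document_id   TEXT NOT NULL,
        current_state TEXT NOT NULL,
        data          TEXT NOT NULL DEFAULT '{}',
        PRIMARY KEY (workflow_id, document_id)
    )
    """
)


def set_state(workflow_id, document_id, state, data=None):
    """Insert or overwrite the current state of one workflow instance."""
    conn.execute(
        "INSERT OR REPLACE INTO workflow_state VALUES (?, ?, ?, ?)",
        (workflow_id, document_id, state, json.dumps(data or {})),
    )


def get_state(workflow_id, document_id):
    """Return (state, data) for one workflow instance, or None if not started."""
    row = conn.execute(
        "SELECT current_state, data FROM workflow_state"
        " WHERE workflow_id = ? AND document_id = ?",
        (workflow_id, document_id),
    ).fetchone()
    return None if row is None else (row[0], json.loads(row[1]))


set_state("approval", "doc-1", "unassigned")
set_state("approval", "doc-1", "undecided", {"assigned_user": "caroline"})
```

The composite primary key means a document can be in several workflows at once, but only in one state per workflow.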

To keep things simple for now, workflows can be hardcoded, though see below for the UI requirements, which aren't that hard, I don't think.

Design by example: an approval workflow

Let's see what developing workflow support for Docspell might entail by following a simple example, the approval workflow:

| State | Input | Condition | Next state | Output |
|---|---|---|---|---|
| unassigned | User assignment | | undecided | |
| undecided | Approval | IsAssignedUser | approved | |
| undecided | Rejection | IsAssignedUser | rejected | |
| undecided | Unassignment | | unassigned | |
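As a rough sketch, the approval workflow above could be hardcoded as a lookup table plus a step function. The names `Ctx`, `TRANSITIONS`, and `step` are made up for this example, and the condition is a plain Python function rather than anything Docspell provides.

```python
from typing import Callable, NamedTuple, Optional


class Ctx(NamedTuple):
    """What a condition may look at (hypothetical shape)."""
    data: dict  # data stored with the current state
    user: str   # the authenticated user


def is_assigned_user(ctx: Ctx) -> bool:
    return ctx.data.get("assigned_user") == ctx.user


# (state, input) -> (condition or None, next state), mirroring the table.
TRANSITIONS: dict[tuple[str, str], tuple[Optional[Callable[[Ctx], bool]], str]] = {
    ("unassigned", "User assignment"): (None, "undecided"),
    ("undecided", "Approval"): (is_assigned_user, "approved"),
    ("undecided", "Rejection"): (is_assigned_user, "rejected"),
    ("undecided", "Unassignment"): (None, "unassigned"),
}


def step(state: str, input_name: str, ctx: Ctx) -> str:
    """Return the next state, or raise if the transition is not allowed."""
    try:
        condition, target = TRANSITIONS[(state, input_name)]
    except KeyError:
        raise ValueError(f"no transition {input_name!r} from {state!r}") from None
    if condition is not None and not condition(ctx):
        raise PermissionError(f"condition for {input_name!r} not met")
    return target
```

Unlike a tag, the "approved" state can only be reached through `step`, so the condition cannot be skipped by accident.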

Starting a workflow on a document puts the document in the workflow's initial state, and it happens following an event, if the starting condition is met. This condition could be expressed like a search expression, and the workflow started if the query matched the new document. In addition, a regular task could/should run to start workflows on any documents matching the search query, for which the workflow has not been started.
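The regular catch-up task could look roughly like the sketch below. Documents are plain dicts here, and the start condition is an ordinary Python predicate standing in for a search expression; `start_pending` and `is_invoice` are hypothetical names.

```python
from typing import Callable, Iterable

Document = dict  # hypothetical: a document as a plain dict with "id" and "tags"


def start_pending(
    documents: Iterable[Document],
    matches: Callable[[Document], bool],
    started: set[str],
) -> list[str]:
    """Start the workflow on every matching document that has no instance yet.

    `started` holds the ids of documents with a running instance and is
    updated in place. Returns the ids started in this pass.
    """
    newly_started = []
    for doc in documents:
        if doc["id"] not in started and matches(doc):
            started.add(doc["id"])
            newly_started.append(doc["id"])
    return newly_started


# Stand-in for a search expression such as "tagged invoice".
def is_invoice(doc: Document) -> bool:
    return "invoice" in doc["tags"]


docs = [
    {"id": "d1", "tags": ["invoice"]},
    {"id": "d2", "tags": ["letter"]},
    {"id": "d3", "tags": ["invoice"]},
]
started = {"d1"}  # d1 already has an instance
print(start_pending(docs, is_invoice, started))
```

Running the same function from the event handler for new documents and from the periodic task keeps both paths idempotent: a second pass over the same documents starts nothing.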

In the case of the metadata confirmation workflow, the condition is "all documents that don't have their metadata confirmed" and the workflow is started whenever a new document enters the system. An approval workflow could be started whenever the metadata are confirmed and the document is tagged invoice, for example.

Backend support

The backend side of things is pretty straightforward, and in addition to search support ("give me all documents in the undecided state with myself assigned as the approving user"), I think four API endpoints would do:

  1. workflow/start, which creates an instance of the workflow by adding a row to the states table with the initial state;
  2. workflow/get_current_state, which returns the current state, along with the associated JSON dictionary of the workflow instance;
  3. workflow/get_transitions, which returns the set of possible transitions in the current context;
  4. workflow/transition, which takes whatever the input is, and returns the new state if the transition was possible and successful, or a failure code. In the presence of cryptographic claims (#2271), each such transition would be a claim.
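The four endpoints above map onto four operations of a small service. The sketch below keeps everything in memory with one hardcoded workflow; the class name, method signatures, shortened input names, and the use of `None` as the failure code are all assumptions, not an existing Docspell API.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    state: str
    data: dict = field(default_factory=dict)


class WorkflowService:
    """In-memory stand-in for the four endpoints; one hardcoded workflow."""

    initial_state = "unassigned"
    # state -> {input name: (restricted to assigned user?, next state)}
    transitions = {
        "unassigned": {"assign": (False, "undecided")},
        "undecided": {
            "approve": (True, "approved"),
            "reject": (True, "rejected"),
            "unassign": (False, "unassigned"),
        },
    }

    def __init__(self):
        self.instances = {}  # document id -> Instance

    def start(self, doc_id):
        self.instances[doc_id] = Instance(self.initial_state)

    def get_current_state(self, doc_id):
        inst = self.instances[doc_id]
        return inst.state, inst.data

    def get_transitions(self, doc_id, user):
        inst = self.instances[doc_id]
        options = self.transitions.get(inst.state, {})
        return [
            name
            for name, (restricted, _) in options.items()
            if not restricted or inst.data.get("assigned_user") == user
        ]

    def transition(self, doc_id, user, name, payload=None):
        """Return the new state, or None as the failure code."""
        if name not in self.get_transitions(doc_id, user):
            return None
        inst = self.instances[doc_id]
        _, target = self.transitions[inst.state][name]
        # The input replaces the data stored with the state.
        inst.state, inst.data = target, dict(payload or {})
        return target
```

Because `transition` checks against `get_transitions`, the UI and the backend cannot disagree about what is currently allowed.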

Frontend considerations

Let's turn to the UI side of things: With search integration, it would be trivial to create a new tile listing all documents requiring the user's attention.

Handling states and transitions is a bit harder, but in essence, the UI should be able to derive the required input elements from the target state of each of the possible transitions, if done right. A simple transition to another state that does not require any input just gets a button. When a state needs additional information attached to it, such as undecided in the above, then the form builder renders the appropriate UI element, along with a button.

Here is what it could look like. In the unassigned state, there is only one transition, which requires a user as input, so the UI would render a drop-down with the users, and a button:

image

Once assigned, the user's UI would expose the three possible transitions matching the condition (while all other users would only see the Unassign button). Since no input is required, the form builder just uses buttons:

image

In a terminal state (i.e. one without outgoing transitions), just the history would be shown.
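The form builder logic described above is small enough to sketch. The input types ("user", "text") and the string descriptions of UI elements are placeholders I made up; a real frontend would emit widgets instead.

```python
from typing import Optional


def render_controls(transitions: list[tuple[str, Optional[str]]]) -> list[str]:
    """Map (transition name, required input type or None) to UI elements.

    The input types and element descriptions are hypothetical.
    """
    widgets = {"user": "user drop-down", "text": "text field"}
    controls = []
    for name, input_type in transitions:
        if input_type is None:
            # No input needed: the transition is just a button.
            controls.append(f"button[{name}]")
        else:
            controls.append(f"{widgets[input_type]} + button[{name}]")
    # A terminal state has no outgoing transitions.
    return controls or ["history only"]
```

For the unassigned state this yields one drop-down with a button, for the assigned user in the undecided state three plain buttons, and for a terminal state only the history.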

Other use-cases

To simplify the UI, the same approach could be used for the metadata confirmation:

| State | Input | Condition | Next state | Output |
|---|---|---|---|---|
| new | Confirm metadata | | confirmed | |
| confirmed | Un-confirm metadata | | new | |

And even ASNs (see #924) could be done with workflows, e.g.:

| State | Input | Condition | Next state | Output |
|---|---|---|---|---|
| unarchived | ASN assignment | | archived | ASN |
| unarchived | Destruction | | purged | |

Workflow configuration UI

Workflows would be properties of a collective, I assume, and the UI would be similar to "Manage Data", i.e. essentially a way to create workflows, associated states, and transitions between them.

A workflow has a name, description, start condition (search expression), exactly one initial state, and any number of additional states.

Each state has a name, description, UI template ("Caroline has requested approval of this document"), and a set of outgoing transitions.

Each transition has a name, description, UI template, a condition, and a target state.
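The configuration model described above could be sketched with a few dataclasses. The field names follow the description, but the class layout and the `dangling_targets` check are assumptions about what a configuration UI might want to validate before saving.

```python
from dataclasses import dataclass, field


@dataclass
class Transition:
    name: str
    description: str
    ui_template: str
    condition: str     # condition source, e.g. a template expression
    target_state: str  # name of the state this transition leads to


@dataclass
class State:
    name: str
    description: str
    ui_template: str
    transitions: list[Transition] = field(default_factory=list)


@dataclass
class Workflow:
    name: str
    description: str
    start_condition: str  # search expression
    initial_state: State  # exactly one, enforced by the type
    other_states: list[State] = field(default_factory=list)

    def states(self) -> dict[str, State]:
        return {s.name: s for s in [self.initial_state, *self.other_states]}

    def dangling_targets(self) -> set[str]:
        """Names of target states that no state in the workflow defines."""
        known = self.states()
        return {
            t.target_state
            for s in known.values()
            for t in s.transitions
            if t.target_state not in known
        }
```

Making the initial state its own field, rather than a flag on a state, rules out workflows with zero or several initial states by construction.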

Specifying conditions

This is all straightforward, except for the condition, where a balance between flexibility and usability has to be struck. Yet, there is no real way around exposing the workflow instance object and allowing the user to write code. Mayan uses the Django Template Language, and as long as instantiating the template code yields a non-zero result, the condition is considered to have been met. For instance, the above condition IsAssignedUser would look like this:

{{ workflow_instance.current_state.data['assigned_user'] == docspell.authenticated_user }}

There are obviously other ways to approach this, but I find this approach to be weirdly elegant despite being a hack.

Actions

Transitions should generate events so webhooks can react to them. In addition, it would be good if actions could be attached to transitions, such as:

  1. Adding/removing tags
  2. Modifying metadata in general
  3. Moving the document into or out of a folder
  4. Sending a notification to a user
  5. Starting another workflow
  6. For later: adding/removing permissions to a document
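A minimal sketch of how actions and events might hang off a transition follows. The action helpers, the `ACTIONS` mapping, the folder name "to-pay", and the event payload (including the event name "workflow-transition") are invented for this example and do not correspond to existing Docspell events.

```python
from typing import Callable

Event = dict                     # hypothetical event payload
Action = Callable[[dict], None]  # an action mutates the document in place


def add_tag(tag: str) -> Action:
    def action(doc: dict) -> None:
        if tag not in doc["tags"]:
            doc["tags"].append(tag)
    return action


def move_to_folder(folder: str) -> Action:
    def action(doc: dict) -> None:
        doc["folder"] = folder
    return action


# Actions keyed by (from state, to state); keys and values are examples.
ACTIONS: dict[tuple[str, str], list[Action]] = {
    ("undecided", "approved"): [add_tag("approved"), move_to_folder("to-pay")],
}


def run_transition(doc: dict, source: str, target: str,
                   emit: Callable[[Event], None]) -> None:
    """Run the attached actions, then emit an event for webhooks."""
    for action in ACTIONS.get((source, target), []):
        action(doc)
    emit({"event": "workflow-transition", "document": doc["id"],
          "from": source, "to": target})


events: list[Event] = []
doc = {"id": "d1", "tags": ["invoice"], "folder": None}
run_transition(doc, "undecided", "approved", events.append)
```

Tags set this way are a consequence of the workflow state rather than a substitute for it, which keeps them semantic.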

madduck, Sep 10 '23 10:09