Idea/Discussion: Analytics V2

Open winged opened this issue 1 year ago • 0 comments

Current Situation

The analytics functionality is currently built directly into Caluma. This allows for tight integration and reusability of the Django models.

However, it also burdens Caluma itself with analytics processing, which may slow down the "transactional" processing (an OLAP workload competing with an OLTP one).

Users also want other output formats, such as CSV, Excel, or even graphics. Integrating these would bloat the "core" Caluma even further, in terms of both (disk) volume and workload.

The goal

In the long term, we will need to extract the analytics functionality into its own service. The (live/production) data would be periodically synchronized and transformed into a structure that is better suited for analytics processing (a snowflake or star schema).

Caluma Analytics currently tries to "tabularize" the tree structure by providing specific selections when a "parent" object has multiple sub-objects. For example, when starting with cases, we have: Case -> Workitem[task-x,newest] -> Document -> Answer[question-x] -> (possible value extraction).
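To make the "tabularizing" idea concrete, here is a minimal sketch of following one such path through a case tree. It uses plain dictionaries as a hypothetical stand-in for the data; the field names (`work_items`, `task`, `created_at`, `document`, `answers`) and the function `tabularize` are illustrative assumptions, not Caluma's actual models or API.

```python
from datetime import datetime

# Hypothetical in-memory case tree; field names are illustrative only.
case = {
    "id": "case-1",
    "work_items": [
        {
            "task": "task-x",
            "created_at": datetime(2023, 1, 1),
            "document": {"answers": {"question-x": "old"}},
        },
        {
            "task": "task-x",
            "created_at": datetime(2023, 6, 1),
            "document": {"answers": {"question-x": "new"}},
        },
        {
            "task": "task-y",
            "created_at": datetime(2023, 7, 1),
            "document": {"answers": {"question-x": "other"}},
        },
    ],
}


def tabularize(case, task, question):
    """Follow Case -> Workitem[task, newest] -> Document -> Answer[question]."""
    candidates = [wi for wi in case["work_items"] if wi["task"] == task]
    if not candidates:
        # No matching work item: the row still exists, but the value is empty.
        return {"case_id": case["id"], "value": None}
    # The "newest" selector reduces many work items to exactly one.
    newest = max(candidates, key=lambda wi: wi["created_at"])
    return {"case_id": case["id"], "value": newest["document"]["answers"].get(question)}


row = tabularize(case, "task-x", "question-x")
```

Each case yields exactly one row regardless of how many work items it has, which is what allows the tree to be read as a table.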

When building the new analytics service, this "pre-aggregation" could be done at the database level, which would drastically reduce query complexity. Taking the above example, the Caluma schema would be structured into the following tables:

(image: diagram of the proposed analytics tables)

As an explanation, the work item's primary key in Caluma is its UUID. In the Analytics service, its primary key would be the combination of case id, task slug, and an additional selector that reduces the number of work items to zero or one per case (like "newest" or "oldest"), allowing a "tabular" reading of the tree structure.
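The re-keying described here could look roughly like the following sketch. It assumes work items arrive as dictionaries with hypothetical `uuid`, `case_id`, `task_slug`, and `created_at` fields, and that the selectors "newest" and "oldest" are decided by creation time; the function `rekey_work_items` is an illustration, not part of Caluma.

```python
from datetime import datetime

# Assumed selectors: each one picks a single work item out of a group.
SELECTORS = {"newest": max, "oldest": min}


def rekey_work_items(work_items):
    """Map (case_id, task_slug, selector) -> UUID of the selected work item."""
    grouped = {}
    for wi in work_items:
        grouped.setdefault((wi["case_id"], wi["task_slug"]), []).append(wi)

    table = {}
    for (case_id, task_slug), items in grouped.items():
        for selector, pick in SELECTORS.items():
            chosen = pick(items, key=lambda wi: wi["created_at"])
            table[(case_id, task_slug, selector)] = chosen["uuid"]
    return table


# Hypothetical sample data
work_items = [
    {"uuid": "a", "case_id": "c1", "task_slug": "review", "created_at": datetime(2023, 1, 1)},
    {"uuid": "b", "case_id": "c1", "task_slug": "review", "created_at": datetime(2023, 6, 1)},
    {"uuid": "c", "case_id": "c2", "task_slug": "review", "created_at": datetime(2023, 3, 1)},
]
table = rekey_work_items(work_items)
```

A lookup with the composite key returns at most one work item per case; a combination that never occurred is simply absent from the table, which corresponds to the "zero" case.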

This would imply having custom visibility and permissions in the Analytics service, as the data model would not match the one used in Caluma itself. However, we think this is not necessarily a bad thing, as the requirements may differ anyway, and users may be allowed to see things in aggregates that they wouldn't be allowed to see in detail.

winged, Jun 29 '23 08:06