
Add Process Graph Extraction From Recording Using `pm4py`

Open KrishPatel13 opened this issue 1 year ago • 12 comments

What kind of change does this PR introduce?

Summary

Checklist

  • [ ] My code follows the style guidelines of OpenAdapt
  • [ ] I have performed a self-review of my code
  • [ ] If applicable, I have added tests to prove my fix is functional/effective
  • [ ] I have linted my code locally prior to submission
  • [ ] I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation (e.g. README.md, requirements.txt)
  • [ ] New and existing unit tests pass locally with my changes

How can your code be run and tested?

Other information

KrishPatel13 avatar Jul 14 '24 16:07 KrishPatel13

One thing we will need to take care of is:

image

We will need to edit the install script to handle the installation of Graphviz if this PR gets merged to main.

Link: https://pm4py.fit.fraunhofer.de/static/assets/api/2.7.11/install.html#pip

We can open a new issue for the above if this PR gets merged.
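
A minimal sketch of how an install script could detect Graphviz before trying to install it, assuming that finding the `dot` executable on PATH is a good enough signal. The helper name `graphviz_available` is hypothetical and not part of OpenAdapt or pm4py.

```python
import shutil


def graphviz_available(executable: str = "dot") -> bool:
    """Return True if the given Graphviz executable can be found on PATH."""
    return shutil.which(executable) is not None


if __name__ == "__main__":
    if graphviz_available():
        print("Graphviz found on PATH")
    else:
        print("Graphviz not found; install it with your OS package manager")
```

The check only looks at PATH, so a Graphviz installation in a non-standard location would still be reported as missing.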

KrishPatel13 avatar Jul 14 '24 16:07 KrishPatel13

@KrishPatel13 it's not clear to me that Graphviz is necessary, see e.g. https://github.com/pm4py/pm4py-core/issues/425#issuecomment-1632520145

abrichr avatar Jul 15 '24 15:07 abrichr

Output of commit: d7a2e6c441fa28944f52ef01a4f1b00e1d5815fc is below: image

The data for the above is here:

case_id;activity;timestamp;costs;resource
3;register request;2010-12-30 14:32:00+01:00;50;Pete
3;examine casually;2010-12-30 15:06:00+01:00;400;Mike
3;check ticket;2010-12-30 16:34:00+01:00;100;Ellen
3;decide;2011-01-06 09:18:00+01:00;200;Sara
3;reinitiate request;2011-01-06 12:18:00+01:00;200;Sara
3;examine thoroughly;2011-01-06 13:06:00+01:00;400;Sean
3;check ticket;2011-01-08 11:43:00+01:00;100;Pete
3;decide;2011-01-09 09:55:00+01:00;200;Sara
3;pay compensation;2011-01-15 10:45:00+01:00;200;Ellen
2;register request;2010-12-30 11:32:00+01:00;50;Mike
2;check ticket;2010-12-30 12:12:00+01:00;100;Mike
2;examine casually;2010-12-30 14:16:00+01:00;400;Sean
2;decide;2011-01-05 11:22:00+01:00;200;Sara
2;pay compensation;2011-01-08 12:05:00+01:00;200;Ellen
1;register request;2010-12-30 11:02:00+01:00;50;Pete
1;examine thoroughly;2010-12-31 10:06:00+01:00;400;Sue
1;check ticket;2011-01-05 15:12:00+01:00;100;Mike
1;decide;2011-01-06 11:18:00+01:00;200;Sara
1;reject request;2011-01-07 14:24:00+01:00;200;Pete
6;register request;2011-01-06 15:02:00+01:00;50;Mike
6;examine casually;2011-01-06 16:06:00+01:00;400;Ellen
6;check ticket;2011-01-07 16:22:00+01:00;100;Mike
6;decide;2011-01-07 16:52:00+01:00;200;Sara
6;pay compensation;2011-01-16 11:47:00+01:00;200;Mike
5;register request;2011-01-06 09:02:00+01:00;50;Ellen
5;examine casually;2011-01-07 10:16:00+01:00;400;Mike
5;check ticket;2011-01-08 11:22:00+01:00;100;Pete
5;decide;2011-01-10 13:28:00+01:00;200;Sara
5;reinitiate request;2011-01-11 16:18:00+01:00;200;Sara
5;check ticket;2011-01-14 14:33:00+01:00;100;Ellen
5;examine casually;2011-01-16 15:50:00+01:00;400;Mike
5;decide;2011-01-19 11:18:00+01:00;200;Sara
5;reinitiate request;2011-01-20 12:48:00+01:00;200;Sara
5;examine casually;2011-01-21 09:06:00+01:00;400;Sue
5;check ticket;2011-01-21 11:34:00+01:00;100;Pete
5;decide;2011-01-23 13:12:00+01:00;200;Sara
5;reject request;2011-01-24 14:56:00+01:00;200;Mike
4;register request;2011-01-06 15:02:00+01:00;50;Pete
4;check ticket;2011-01-07 12:06:00+01:00;100;Mike
4;examine thoroughly;2011-01-08 14:43:00+01:00;400;Sean
4;decide;2011-01-09 12:02:00+01:00;200;Sara
4;reject request;2011-01-12 15:44:00+01:00;200;Ellen
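
As a rough illustration of what a directly-follows graph counts, the sketch below uses only the Python standard library instead of pm4py. It groups rows by `case_id`, orders them by timestamp, and counts each pair of consecutive activities. The function name `directly_follows` and the shortened `SAMPLE` (case 2 only) are chosen for illustration.

```python
import csv
import io
from collections import Counter, defaultdict
from datetime import datetime

# A few rows in the same layout as the sample data above (case 2 only).
SAMPLE = """case_id;activity;timestamp;costs;resource
2;register request;2010-12-30 11:32:00+01:00;50;Mike
2;check ticket;2010-12-30 12:12:00+01:00;100;Mike
2;examine casually;2010-12-30 14:16:00+01:00;400;Sean
2;decide;2011-01-05 11:22:00+01:00;200;Sara
2;pay compensation;2011-01-08 12:05:00+01:00;200;Ellen
"""


def directly_follows(csv_text: str) -> Counter:
    """Count (activity, next_activity) pairs within each case, ordered by time."""
    traces = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text), delimiter=";"):
        when = datetime.fromisoformat(row["timestamp"])
        traces[row["case_id"]].append((when, row["activity"]))

    pairs = Counter()
    for events in traces.values():
        events.sort(key=lambda event: event[0])
        # Pair each event with the one that immediately follows it.
        for (_, current), (_, following) in zip(events, events[1:]):
            pairs[(current, following)] += 1
    return pairs


if __name__ == "__main__":
    for (current, following), count in directly_follows(SAMPLE).items():
        print(f"{current} -> {following}: {count}")
```

pm4py additionally reports start and end activities, which this sketch leaves out.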

KrishPatel13 avatar Jul 20 '24 18:07 KrishPatel13

The following sqlite3 commands run the query in process-query.sql (which must be in the directory where sqlite3 was started) and redirect its output to a file called dataout.csv in the current working directory.

sqlite> .headers on
sqlite> .mode csv
sqlite> .once dataout.csv
sqlite> .read process-query.sql
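
The same export could also be scripted from Python rather than typed into the sqlite3 shell. The sketch below is one possible way to do it with the standard `sqlite3` and `csv` modules; the helper name `query_to_csv_text` and the in-memory demo table are assumptions for illustration, not part of OpenAdapt.

```python
import csv
import io
import sqlite3


def query_to_csv_text(conn: sqlite3.Connection, query: str) -> str:
    """Run `query` and return its result as CSV text with a header row."""
    cursor = conn.execute(query)
    header = [column[0] for column in cursor.description]
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(cursor)
    return buffer.getvalue()


if __name__ == "__main__":
    # Stand-in in-memory database; in practice, connect to the recording database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE demo (case_id INTEGER, activity TEXT)")
    conn.execute("INSERT INTO demo VALUES (1, 'click')")
    print(query_to_csv_text(conn, "SELECT case_id, activity FROM demo"))
```

Writing the returned text to dataout.csv would reproduce the effect of `.headers on`, `.mode csv` and `.once dataout.csv`.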

My process-query.sql file is this so far:

select r.id as case_id,
    we.title as activity,
    -- ae."timestamp" as timestamp,
    datetime(ae."timestamp", 'unixepoch', 'localtime') AS "timestamp",
    COALESCE(ae."timestamp" - LAG(ae."timestamp") OVER (ORDER BY ae."timestamp"), 0) as costs,
    ae.name as resource
from recording r
inner join action_event ae on r."timestamp" = ae.recording_timestamp
inner join window_event we on r."timestamp" = we.recording_timestamp
    and we."timestamp" = ae.window_event_timestamp
where r.id = 1
order by r.id, ae."timestamp";
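
Because the query filters on `r.id = 1`, the `LAG` window only ever sees one recording. If that filter were removed, a window without `PARTITION BY` would compute the first `costs` value of each recording against the last event of the previous one. The sketch below shows a partitioned variant on a simplified stand-in table; the table `events(case_id, ts)` and the helper `per_case_costs` are hypothetical, and window functions require SQLite 3.25 or newer.

```python
import sqlite3

QUERY = """
SELECT case_id,
       ts,
       COALESCE(ts - LAG(ts) OVER (PARTITION BY case_id ORDER BY ts), 0) AS costs
FROM events
ORDER BY case_id, ts
"""


def per_case_costs(rows):
    """Return (case_id, ts, costs) rows, with costs reset at the start of each case."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (case_id INTEGER, ts REAL)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in per_case_costs([(1, 10.0), (1, 12.5), (2, 100.0), (2, 101.0)]):
        print(row)
```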

image

Once we have dataout.csv, we can read it into a pandas DataFrame and build the process graph with this piece of code:

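import pm4py
import pandas

if __name__ == "__main__":
    # Load the event log exported from sqlite3.
    log = pandas.read_csv("dataout.csv", sep=",")
    # Map our column names onto the ones pm4py expects and parse the timestamps.
    log = pm4py.format_dataframe(
        log,
        case_id="case_id",
        activity_key="activity",
        timestamp_key="timestamp",
        timest_format="%Y-%m-%d %H:%M:%S",
    )

    # Discover the directly-follows graph and render it as an HTML page.
    dfg, start_activities, end_activities = pm4py.discover_dfg(log)
    pm4py.view_dfg(dfg, start_activities, end_activities, format="html")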

Output: image

This is what the data from my recording looks like: https://github.com/OpenAdaptAI/OpenAdapt/pull/852/files#diff-03b12224302b60e98d6398edf88b09df24291a794a603ef435dd31b64bea8c8cR1

KrishPatel13 avatar Jul 20 '24 19:07 KrishPatel13

@abrichr What should the next step be for this PR (or issue: https://github.com/OpenAdaptAI/OpenAdapt/issues/564), now that I am able to produce a process graph from our DB?

KrishPatel13 avatar Jul 20 '24 19:07 KrishPatel13

I know we also want the action target, as mentioned here: https://github.com/OpenAdaptAI/OpenAdapt/issues/564#issuecomment-2214700881. Could you please give a bit more detail on how you would like the action target to be visualized in the process graph?

Currently, the process graph only shows the activity column, as seen here: https://github.com/OpenAdaptAI/OpenAdapt/pull/852#issuecomment-2241253780.

So I will need to do a bit more research on how to include the resource and other information in the process graph, and how to customize it to our needs.

I would love some general feedback and/or guidance for the work so far and next steps. Thank you.

KrishPatel13 avatar Jul 20 '24 19:07 KrishPatel13

Hi Krish, the activity should include the action target, as well as any other relevant information.

abrichr avatar Jul 20 '24 19:07 abrichr

That means I can format an activity like this:

" {title}~{action}~{action_target} "

and store this string in the activity column. This way, the process graph can showcase several pieces of information, and the activities remain indexed by them.
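
A minimal sketch of building such a label, assuming `~` as the separator as in the format above. The helper name `format_activity` is hypothetical, and turning missing values into empty strings is a design choice made here, not something decided in this thread.

```python
def format_activity(title, action, action_target, sep="~"):
    """Join the three fields into a single activity label."""
    parts = [title, action, action_target]
    # Missing values become empty strings so the label always has three fields.
    return sep.join("" if part is None else str(part) for part in parts)


if __name__ == "__main__":
    print(format_activity("Calculator", "click", "7"))
```

One caveat: if a window title itself contains `~`, the label can no longer be split back into its fields unambiguously.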

KrishPatel13 avatar Jul 20 '24 19:07 KrishPatel13

I will give it a shot and let you know the results.

KrishPatel13 avatar Jul 20 '24 19:07 KrishPatel13

@abrichr I have an update for this PR. I have tried to make a process-graph to showcase relevant details of each action event.

Could you take a look at this output HTML page:

openadapt/process-graph/output/tmpkbsu9k18.html

OR below:

https://github.com/user-attachments/assets/3d593514-50a0-4ab8-ac4e-3338daa03a6d

KrishPatel13 avatar Jul 21 '24 02:07 KrishPatel13

One more change that I have to make is:

image

For press/release events, I need to pick action_target from either key_name or key_char, whichever is not NULL.
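
A small sketch of that selection, assuming `key_name` should win when both values happen to be set; the order of preference and the helper name `pick_action_target` are assumptions. In SQL, the equivalent would be `COALESCE(key_name, key_char)`.

```python
def pick_action_target(key_name, key_char):
    """Return key_name if it is set, otherwise key_char (which may also be None)."""
    return key_name if key_name is not None else key_char


if __name__ == "__main__":
    print(pick_action_target("shift", None))
    print(pick_action_target(None, "a"))
```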

KrishPatel13 avatar Jul 21 '24 15:07 KrishPatel13

Use the events.py::get_events function with process_events=True in order to get keyframe events (ignore children).

Use ActionEvent.text to get action representation.

Use openadapt/strategies/visual.py::add_active_segment_descriptions to get targets.
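
A rough sketch of how such events might then be turned into event-log rows. It uses a hypothetical stand-in class rather than OpenAdapt's real `ActionEvent`: only the `text` attribute name comes from the comment above, `timestamp` is taken from the SQL query earlier in this thread, and `to_event_log_rows` is an illustrative name.

```python
from dataclasses import dataclass


@dataclass
class StandInEvent:
    """Hypothetical stand-in for an action event; not OpenAdapt's real class."""

    timestamp: float
    text: str


def to_event_log_rows(case_id, events):
    """Build one row per event, using the event's text as the activity."""
    ordered = sorted(events, key=lambda event: event.timestamp)
    return [
        {"case_id": case_id, "activity": event.text, "timestamp": event.timestamp}
        for event in ordered
    ]


if __name__ == "__main__":
    demo = [StandInEvent(2.0, "type hello"), StandInEvent(1.0, "click")]
    for row in to_event_log_rows(1, demo):
        print(row)
```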

abrichr avatar Jul 21 '24 16:07 abrichr