OpenAdapt
Add Process Graph Extraction From Recording Using `pm4py`
What kind of change does this PR introduce?
Summary
Checklist
- [ ] My code follows the style guidelines of OpenAdapt
- [ ] I have performed a self-review of my code
- [ ] If applicable, I have added tests to prove my fix is functional/effective
- [ ] I have linted my code locally prior to submission
- [ ] I have commented my code, particularly in hard-to-understand areas
- [ ] I have made corresponding changes to the documentation (e.g. README.md, requirements.txt)
- [ ] New and existing unit tests pass locally with my changes
How can your code be run and tested?
Other information
One thing we will need to take care of is:
We will need to edit the install script to handle installing Graphviz if this PR gets merged to main.
Link: https://pm4py.fit.fraunhofer.de/static/assets/api/2.7.11/install.html#pip
We can open a new issue for the above if this PR gets merged.
@KrishPatel13 it's not clear to me that Graphviz is necessary; see e.g. https://github.com/pm4py/pm4py-core/issues/425#issuecomment-1632520145
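If Graphviz does turn out to be needed, the install script could first check whether it is already present. A minimal sketch, assuming that finding the `dot` executable on PATH is a sufficient signal (the Python `graphviz` package renders by calling `dot`); `graphviz_available` is a hypothetical helper, not part of OpenAdapt or pm4py:

```python
import shutil


def graphviz_available() -> bool:
    """Return True if the Graphviz `dot` executable can be found on PATH."""
    # Only checks PATH; it does not verify the Graphviz version or supported formats.
    return shutil.which("dot") is not None


if __name__ == "__main__":
    if graphviz_available():
        print("Graphviz found")
    else:
        print("Graphviz not found; the install script would need to install it")
```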
Output of commit d7a2e6c441fa28944f52ef01a4f1b00e1d5815fc is below:
The data for the above is here:
case_id;activity;timestamp;costs;resource
3;register request;2010-12-30 14:32:00+01:00;50;Pete
3;examine casually;2010-12-30 15:06:00+01:00;400;Mike
3;check ticket;2010-12-30 16:34:00+01:00;100;Ellen
3;decide;2011-01-06 09:18:00+01:00;200;Sara
3;reinitiate request;2011-01-06 12:18:00+01:00;200;Sara
3;examine thoroughly;2011-01-06 13:06:00+01:00;400;Sean
3;check ticket;2011-01-08 11:43:00+01:00;100;Pete
3;decide;2011-01-09 09:55:00+01:00;200;Sara
3;pay compensation;2011-01-15 10:45:00+01:00;200;Ellen
2;register request;2010-12-30 11:32:00+01:00;50;Mike
2;check ticket;2010-12-30 12:12:00+01:00;100;Mike
2;examine casually;2010-12-30 14:16:00+01:00;400;Sean
2;decide;2011-01-05 11:22:00+01:00;200;Sara
2;pay compensation;2011-01-08 12:05:00+01:00;200;Ellen
1;register request;2010-12-30 11:02:00+01:00;50;Pete
1;examine thoroughly;2010-12-31 10:06:00+01:00;400;Sue
1;check ticket;2011-01-05 15:12:00+01:00;100;Mike
1;decide;2011-01-06 11:18:00+01:00;200;Sara
1;reject request;2011-01-07 14:24:00+01:00;200;Pete
6;register request;2011-01-06 15:02:00+01:00;50;Mike
6;examine casually;2011-01-06 16:06:00+01:00;400;Ellen
6;check ticket;2011-01-07 16:22:00+01:00;100;Mike
6;decide;2011-01-07 16:52:00+01:00;200;Sara
6;pay compensation;2011-01-16 11:47:00+01:00;200;Mike
5;register request;2011-01-06 09:02:00+01:00;50;Ellen
5;examine casually;2011-01-07 10:16:00+01:00;400;Mike
5;check ticket;2011-01-08 11:22:00+01:00;100;Pete
5;decide;2011-01-10 13:28:00+01:00;200;Sara
5;reinitiate request;2011-01-11 16:18:00+01:00;200;Sara
5;check ticket;2011-01-14 14:33:00+01:00;100;Ellen
5;examine casually;2011-01-16 15:50:00+01:00;400;Mike
5;decide;2011-01-19 11:18:00+01:00;200;Sara
5;reinitiate request;2011-01-20 12:48:00+01:00;200;Sara
5;examine casually;2011-01-21 09:06:00+01:00;400;Sue
5;check ticket;2011-01-21 11:34:00+01:00;100;Pete
5;decide;2011-01-23 13:12:00+01:00;200;Sara
5;reject request;2011-01-24 14:56:00+01:00;200;Mike
4;register request;2011-01-06 15:02:00+01:00;50;Pete
4;check ticket;2011-01-07 12:06:00+01:00;100;Mike
4;examine thoroughly;2011-01-08 14:43:00+01:00;400;Sean
4;decide;2011-01-09 12:02:00+01:00;200;Sara
4;reject request;2011-01-12 15:44:00+01:00;200;Ellen
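The sample above is semicolon-separated and its timestamps carry a UTC offset, whereas the dataout.csv export from SQLite is comma-separated. As a stdlib-only sketch (the helper name `load_traces` is hypothetical), the rows can be grouped into one chronologically ordered trace per case_id, which is the view of the log that process discovery works on:

```python
from __future__ import annotations

import csv
import io
from collections import defaultdict
from datetime import datetime


def load_traces(text: str, delimiter: str = ";") -> dict[str, list[str]]:
    """Group activity names by case_id, ordered by each event's timestamp."""
    events = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text), delimiter=delimiter):
        # fromisoformat accepts "2010-12-30 14:32:00+01:00" (space separator, UTC offset)
        ts = datetime.fromisoformat(row["timestamp"])
        events[row["case_id"]].append((ts, row["activity"]))
    # Sort each case chronologically, then keep only the activity names
    return {case: [act for _, act in sorted(evts)] for case, evts in events.items()}
```

Rows do not need to arrive in order; each case is sorted by timestamp before the activity names are extracted.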
The following sqlite3 commands run the query in process-query.sql (located in the directory from which you launched sqlite3) and write its output, with headers, to a file called dataout.csv in the current working directory.
sqlite> .headers on
sqlite> .mode csv
sqlite> .once dataout.csv
sqlite> .read process-query.sql
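If you would rather not drive the sqlite3 shell by hand, the same export can be scripted with Python's standard library. A minimal sketch, assuming process-query.sql contains a single SELECT statement (sqlite3's `execute` runs only one statement at a time); `export_query_to_csv` is a hypothetical helper and the paths are illustrative:

```python
import csv
import sqlite3
from pathlib import Path


def export_query_to_csv(db_path: str, sql_path: str, out_path: str) -> int:
    """Run the SQL in sql_path against db_path and write the rows, with headers, to out_path.

    Returns the number of data rows written.
    """
    query = Path(sql_path).read_text()
    conn = sqlite3.connect(db_path)
    try:
        cursor = conn.execute(query)
        headers = [col[0] for col in cursor.description]  # like `.headers on`
        rows = cursor.fetchall()
    finally:
        conn.close()
    with open(out_path, "w", newline="") as f:  # like `.mode csv` plus `.once dataout.csv`
        writer = csv.writer(f)
        writer.writerow(headers)
        writer.writerows(rows)
    return len(rows)
```

Usage would look like `export_query_to_csv(path_to_db, "process-query.sql", "dataout.csv")`, where `path_to_db` depends on your setup.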
My process-query.sql file so far is:
select r.id as case_id,
we.title as activity,
-- ae."timestamp" as timestamp,
datetime(ae."timestamp", 'unixepoch', 'localtime') AS "timestamp",
COALESCE(ae."timestamp" - LAG(ae."timestamp") OVER (ORDER BY ae."timestamp"), 0) as costs,
ae.name as resource
from recording r
inner join action_event ae on r."timestamp" = ae.recording_timestamp
inner join window_event we on r."timestamp" = we.recording_timestamp and we."timestamp" = ae.window_event_timestamp
where r.id = 1
order by r.id, ae."timestamp";
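In this query, costs is the number of seconds since the previous action event, computed with LAG. The window has no PARTITION BY, so it stays per-recording only because of `where r.id = 1`; if that filter is dropped to export several recordings at once, the first event of each recording would pick up the gap from the previous recording's last event. A sketch against an in-memory database with hypothetical minimal tables (only the columns the query touches, not OpenAdapt's real schema), with the localtime column left out so the result does not depend on the machine's timezone:

```python
import sqlite3

# Hypothetical, minimal stand-ins for the three tables: only the columns the query uses.
SCHEMA = """
create table recording (id integer, "timestamp" real);
create table window_event (recording_timestamp real, "timestamp" real, title text);
create table action_event (
    recording_timestamp real, window_event_timestamp real, "timestamp" real, name text
);
"""


def costs_demo(partition: bool = True) -> list:
    """Return (case_id, activity, costs, resource) rows for two toy recordings."""
    if partition:
        window = 'partition by r.id order by ae."timestamp"'
    else:
        window = 'order by ae."timestamp"'
    query = f"""
        select r.id as case_id,
               we.title as activity,
               -- seconds since the previous action event (0 for the first row in the window)
               coalesce(ae."timestamp" - lag(ae."timestamp") over ({window}), 0) as costs,
               ae.name as resource
        from recording r
        inner join action_event ae on r."timestamp" = ae.recording_timestamp
        inner join window_event we on r."timestamp" = we.recording_timestamp
                                  and we."timestamp" = ae.window_event_timestamp
        order by r.id, ae."timestamp"
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)
        conn.executemany("insert into recording values (?, ?)", [(1, 100.0), (2, 200.0)])
        conn.executemany(
            "insert into window_event values (?, ?, ?)",
            [(100.0, 100.5, "Editor"), (200.0, 200.5, "Browser")],
        )
        conn.executemany(
            "insert into action_event values (?, ?, ?, ?)",
            [
                (100.0, 100.5, 101.0, "click"),
                (100.0, 100.5, 103.5, "type"),
                (200.0, 200.5, 201.0, "click"),
                (200.0, 200.5, 202.0, "scroll"),
            ],
        )
        return conn.execute(query).fetchall()
    finally:
        conn.close()
```

With `partition=True` the costs come out as 0, 2.5, 0, 1.0; without it, the first event of recording 2 gets 97.5, the gap from recording 1's last event. Window functions need SQLite 3.25 or newer.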
Once we have dataout.csv, we can read it into a pandas DataFrame and then build the process graph with this code:
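import pandas
import pm4py

if __name__ == "__main__":
    # dataout.csv is the comma-separated export produced by the sqlite3 commands above
    log = pandas.read_csv("dataout.csv", sep=",")
    # Map our column names onto pm4py's standard keys and parse the timestamp strings
    log = pm4py.format_dataframe(
        log,
        case_id="case_id",
        activity_key="activity",
        timestamp_key="timestamp",
        timest_format="%Y-%m-%d %H:%M:%S",
    )
    # Directly-follows graph: edge counts plus start/end activity counts
    dfg, start_activities, end_activities = pm4py.discover_dfg(log)
    pm4py.view_dfg(dfg, start_activities, end_activities, format="html")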
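For intuition about what `pm4py.discover_dfg` returns: a directly-follows graph counts how often activity b immediately follows activity a within the same case, plus how often each activity starts and ends a case. The pure-Python sketch below illustrates that idea on ordered traces; it is not pm4py's implementation, and `discover_dfg_simple` is a hypothetical name.

```python
from collections import Counter


def discover_dfg_simple(traces):
    """Count directly-follows pairs plus start/end activities from ordered traces.

    traces: mapping of case_id -> list of activity names in chronological order.
    """
    dfg, starts, ends = Counter(), Counter(), Counter()
    for activities in traces.values():
        if not activities:
            continue
        starts[activities[0]] += 1
        ends[activities[-1]] += 1
        # Each adjacent pair (a, b) means "b directly followed a" in this case
        for a, b in zip(activities, activities[1:]):
            dfg[(a, b)] += 1
    return dict(dfg), dict(starts), dict(ends)
```

On cases 1 to 3 of the sample log, the edge ("check ticket", "decide") gets a count of 3: once in case 1 and twice in case 3.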
Output:
This is what the data from my recording looks like: https://github.com/OpenAdaptAI/OpenAdapt/pull/852/files#diff-03b12224302b60e98d6398edf88b09df24291a794a603ef435dd31b64bea8c8cR1
@abrichr Now that I am able to produce a process graph from our DB, what should the next steps be for this PR (and for issue https://github.com/OpenAdaptAI/OpenAdapt/issues/564)?
I know we also want the action target, as mentioned here: https://github.com/OpenAdaptAI/OpenAdapt/issues/564#issuecomment-2214700881. Could you give a bit more detail on how you would like the action target to be shown in the process graph?
Currently, the process graph shows only the activity column, as seen here: https://github.com/OpenAdaptAI/OpenAdapt/pull/852#issuecomment-2241253780.
So I will need to research how to include the resource and other information in the process graph, and how to customize it for our needs.
I would love some general feedback and guidance on the work so far and on next steps. Thank you.
Hi Krish, the activity should include the action target, as well as any other relevant information.
So I can format each activity as:
"{title}~{action}~{action_target}"
and store that string in the activity column. This way, the process graph can show several pieces of information for each event, and they are also indexed.
I will give it a shot and let you know the results.
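A minimal sketch of that activity format, assuming the title, action, and action target are already available for each event (for example as columns from the query); `make_activity_label` is hypothetical, and replacing a literal `~` inside a field with `-` is one arbitrary choice that keeps every label splittable into exactly three parts:

```python
def make_activity_label(title, action, action_target, sep="~"):
    """Build a "{title}~{action}~{action_target}" label for the activity column.

    Missing parts become empty strings; separator characters inside a part are
    replaced with "-" so the label always splits back into three fields.
    """
    parts = []
    for value in (title, action, action_target):
        text = "" if value is None else str(value).strip()
        parts.append(text.replace(sep, "-"))
    return sep.join(parts)
```

Because the label is just a string, pm4py treats each distinct combination as its own activity node, so the graph gets more granular as more fields are included.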
@abrichr I have an update for this PR. I have tried to make a process-graph to showcase relevant details of each action event.
Please see this output HTML page:
openadapt/process-graph/output/tmpkbsu9k18.html
Or below:
https://github.com/user-attachments/assets/3d593514-50a0-4ab8-ac4e-3338daa03a6d
One more change I still need to make:
For press/release events, I need to take action_target from either key_name or key_char, whichever is not null.
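A minimal sketch of that rule, assuming each action event is available as a dict with `name`, `key_name`, and `key_char` fields (a hypothetical view of an action_event row) and that key_name wins when both are set. In the SQL export, the same choice could be written as `COALESCE(ae.key_name, ae.key_char)`, assuming those column names:

```python
KEY_EVENTS = {"press", "release"}


def action_target(event: dict):
    """Return the target of a press/release event: key_name if not null, else key_char.

    `event` is a hypothetical dict view of an action_event row. Non-key events
    return None here because their target has to come from somewhere else.
    """
    if event.get("name") not in KEY_EVENTS:
        return None
    key_name = event.get("key_name")
    # First non-null value wins, like SQL's COALESCE(key_name, key_char)
    return key_name if key_name is not None else event.get("key_char")
```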
Use events.py::get_events with process_events=True to get the keyframe events (ignoring children).
Use ActionEvent.text to get action representation.
Use openadapt/strategies/visual.py::add_active_segment_descriptions to get targets.