OpenAdapt
Add Replay ("Policy") performance tests (TaskCompletionRateTest)
Feature request
We need to extend https://github.com/OpenAdaptAI/OpenAdapt/issues/314 to include some useful tests and generate an automated report.
This involves:
- Create recordings of three tasks:
  - Open a calculator and perform a short calculation.
  - Open a spreadsheet (e.g. https://github.com/OpenAdaptAI/OpenAdapt/blob/cb70f35985eeb579fd3e13b20a9839b10729921d/tests/assets/excel.png), open a time tracking app (e.g. https://clockify.me), copy a week's worth of data from the spreadsheet into the app, and save/submit the data in the app (e.g. https://www.youtube.com/watch?v=omP11q-o_0I). Alternatively, if browser events are not yet available (see https://github.com/OpenAdaptAI/OpenAdapt/pull/744), replicate something similar with two different spreadsheets open simultaneously (one for reading, one for writing).
  - Open PowerPoint and create a short presentation.
- Save them as fixtures.
- Add automated tests to run a replay (with configurable strategy, defaulting to VanillaReplayStrategy) and evaluate the outcome. Outcome evaluation can be implemented with WindowEvent data.
- Add a script to log the outcome results to stdout and/or to a file.
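As a starting point for the logging script in the last item, here is a minimal sketch. `ReplayOutcome` and `log_outcomes` are hypothetical names made up for illustration, not existing OpenAdapt types; the sketch assumes each replay has already been reduced to a pass/fail result.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Iterable, Optional


@dataclass
class ReplayOutcome:
    """One replay attempt (hypothetical record, not an OpenAdapt type)."""

    task_description: str
    strategy_name: str
    succeeded: bool


def log_outcomes(
    outcomes: Iterable[ReplayOutcome],
    to_stdout: bool = True,
    path: Optional[Path] = None,
) -> float:
    """Write one JSON line per outcome and return the task completion rate."""
    records = [asdict(outcome) for outcome in outcomes]
    lines = [json.dumps(record) for record in records]
    if to_stdout:
        for line in lines:
            print(line)
    if path is not None:
        path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    if not records:
        # No replays were run, so report a rate of zero rather than dividing by zero.
        return 0.0
    return sum(record["succeeded"] for record in records) / len(records)
```

One JSON object per line keeps the file easy to append to and to diff between runs; a different format (CSV, Markdown table) would work just as well.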
Motivation
Scientific rigor and reproducibility.
@seanmcguire12 your assistance would be greatly appreciated!
@KrishPatel13 outcome evaluation for web apps will depend on finishing https://github.com/OpenAdaptAI/OpenAdapt/pull/364
Save a fixture with recording.task_description = "test: calculate 2x3" that is just like the video currently on the website.
Test 1: Run the VanillaReplayStrategy with empty instructions (or give it instructions like "replay the recording verbatim"). Use openadapt.window to assert that the calculator display area contains the expected value 6.
Test 2: Run the VanillaReplayStrategy with instructions like "calculate 9-8+7". Use the same API to assert that the calculator display area contains the expected value 8.
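A rough sketch of the display assertion for both tests, assuming the window state comes back as a nested dict/list structure. The `role` and `value` keys and the `"display"` role name are assumptions for illustration only; the real structure returned by openadapt.window depends on the platform and needs to be checked. Restricting the match to the display element matters because the calculator also has a button labelled 6.

```python
from typing import Any, Iterator


def iter_nodes(node: Any) -> Iterator[dict]:
    """Yield every dict found anywhere inside a nested dict/list structure."""
    if isinstance(node, dict):
        yield node
        for value in node.values():
            yield from iter_nodes(value)
    elif isinstance(node, (list, tuple)):
        for item in node:
            yield from iter_nodes(item)


def display_shows(window_state: Any, expected: str, display_role: str = "display") -> bool:
    """Return True if an element with the display role holds the expected value.

    The "role" and "value" keys are hypothetical; adjust them to the real window state.
    """
    return any(
        element.get("role") == display_role
        and str(element.get("value", "")).strip() == expected
        for element in iter_nodes(window_state)
    )


# Hand-written stand-in for a window state captured after the replay finishes.
fake_state = {
    "title": "Calculator",
    "children": [
        {"role": "button", "value": "6"},
        {"role": "display", "value": " 8 "},
    ],
}
assert display_shows(fake_state, "8")
assert not display_shows(fake_state, "6")  # the "6" button must not count as the result
```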
Parameterize the replay strategy and iterate over all of them. Produce a report with the results.
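Something like the following could drive the iteration over strategies. This is only a sketch: the two runner functions are stubs standing in for real replay strategies, and `evaluate` / `format_report` are hypothetical helpers. It assumes each strategy can be wrapped in a callable that takes the instructions and returns the final calculator display text.

```python
from typing import Callable, Dict, List, Tuple

# (instructions, expected display value), taken from Test 1 and Test 2 above.
CASES: List[Tuple[str, str]] = [("", "6"), ("calculate 9-8+7", "8")]

# Hypothetical wrapper type: runs a replay and returns the final display text.
StrategyRunner = Callable[[str], str]


def evaluate(strategies: Dict[str, StrategyRunner]) -> Dict[str, float]:
    """Return the task completion rate of each strategy over all cases."""
    rates: Dict[str, float] = {}
    for name, run in strategies.items():
        passed = 0
        for instructions, expected in CASES:
            try:
                if run(instructions) == expected:
                    passed += 1
            except Exception:
                # A crashed replay counts as a failed task.
                pass
        rates[name] = passed / len(CASES)
    return rates


def format_report(rates: Dict[str, float]) -> str:
    """Render the rates as a plain-text table, one strategy per row."""
    lines = [f"{'strategy':<24}completion rate"]
    for name, rate in sorted(rates.items()):
        lines.append(f"{name:<24}{rate:.0%}")
    return "\n".join(lines)


# Stubs for illustration only; real runners would invoke the replay strategies.
def verbatim_stub(instructions: str) -> str:
    return "6"  # always reproduces the recorded 2x3 calculation


def adaptive_stub(instructions: str) -> str:
    return "8" if instructions else "6"


if __name__ == "__main__":
    print(format_report(evaluate({"verbatim_stub": verbatim_stub, "adaptive_stub": adaptive_stub})))
```

In a pytest suite the same case list could feed `pytest.mark.parametrize`, with the report generated from the collected results.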
@seanmcguire12 please submit a PR with your work-in-progress 🙏