performance profiler with visualization
E.g., profiling the run of card `cards/cola.json` (8096 instances in train, 455 in validation, and 1043 in test) in the non-eager (usual) mode, we get the following. Searching (Ctrl-F in the browser) for "profiler_" filters out most lines, leaving the methods of the profiler (and a few more). Note that most of the runtime goes into printing, i.e. into `list(ms[stream_name])`.
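For anyone who wants to reproduce this kind of report locally, the mechanics are essentially the following minimal sketch using Python's standard `cProfile`/`pstats` (the function `profile_and_filter` and the toy workload are illustrative assumptions, not the actual code of this PR):

```python
import cProfile
import pstats

def profile_and_filter(func, pattern="profiler_"):
    """Profile `func` and print only rows whose name matches `pattern`,
    mirroring the Ctrl-F filtering described above."""
    with cProfile.Profile() as prof:
        func()
    stats = pstats.Stats(prof)
    stats.sort_stats(pstats.SortKey.CUMULATIVE)
    stats.print_stats(pattern)  # the pattern is a regex over the printed rows

if __name__ == "__main__":
    # Toy workload; in the real profiler the callable would run a card.
    profile_and_filter(lambda: sum(i * i for i in range(10**6)), pattern="genexpr")
```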
And the same as above, but in eager mode. Most of the time goes to standardization, as expected, and loading now accounts for actually loading all the instances, so it takes longer:
The same for the first part of `examples/evaluate_a_judge_model_capabilities_on_arena_hard.py`, which generates one stream, `test`, of 39990 instances. First, the non-eager mode:
And the above example in eager mode:
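The lazy/eager contrast explains both observations above: in the non-eager (lazy) mode the streams are generators, so the cost is paid only when something consumes them, which is why `list(ms[stream_name])` dominates; in eager mode the instances are materialized up front, so the cost shows up in loading and standardization instead. A toy illustration of the same effect (pure Python, no unitxt; all names are made up):

```python
import time

def expensive(i):
    time.sleep(0.001)  # stand-in for per-instance processing
    return i

# Lazy (non-eager): building the "stream" is instant; consuming it pays the cost.
t0 = time.perf_counter()
stream = (expensive(i) for i in range(100))
print(f"build (lazy):   {time.perf_counter() - t0:.3f}s")

t0 = time.perf_counter()
instances = list(stream)  # analogous to list(ms[stream_name]) above
print(f"consume (lazy): {time.perf_counter() - t0:.3f}s")

# Eager: the same cost moves to construction time.
t0 = time.perf_counter()
stream = [expensive(i) for i in range(100)]
print(f"build (eager):  {time.perf_counter() - t0:.3f}s")
```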
That is very neat, @dafnapension! Can we create a mechanism to sum up the total time of a few cards, excluding the loading? How can we compare times? We need some way to measure the difference between branches.
I tried to suggest solutions to the important issues you raised, @elronbandel.
Yes, @elronbandel, I am close to packing it all into one Python script, no shell script. Coming soon.
All in one Python script now, @elronbandel. I am not sure how to make it a GitHub Action.
I saw that the other actions refer to the branch suggested in the PR as main.
My Python script compares the current branch (which I think of as the new branch suggested in the PR, e.g. `performance_profiler` in this very PR) against branch `main`.
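Conceptually, the comparison step of such a script could look like the sketch below, assuming each branch's run writes its total runtime in seconds as a single number to a text file (the file names, the `compare` function, and the 5% threshold are placeholders, not the actual code of this PR):

```python
def compare(main_file="main_score.txt", pr_file="pr_score.txt", threshold_pct=5.0):
    with open(main_file) as f:
        main_score = float(f.read().strip())
    with open(pr_file) as f:
        pr_score = float(f.read().strip())
    # Positive degradation means the PR branch is slower than main.
    degradation = 100.0 * (pr_score - main_score) / main_score
    print(f"main: {main_score:.2f}s, PR: {pr_score:.2f}s, degradation: {degradation:.2f}%")
    if degradation > threshold_pct:
        raise SystemExit("Performance degradation exceeds threshold!")
```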
Also, which cards would you consider typical and representative of unitxt's users? Those would make up the benchmark, and need to be listed in `cards=[..,..,..]` around line 140.
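For illustration only, such a list might look like this (`cards.cola` is the card profiled above; the other entries are hypothetical placeholders, since a representative set is exactly what is being asked here):

```python
# Around line ~140 of profile/card_profiler.py; entries other than
# cards.cola (profiled above) are hypothetical placeholders:
cards = ["cards.cola", "cards.wnli", "cards.sst2"]
```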
The workflow itself should be something in the spirit of this:
```yaml
name: Test Performance

on:
  pull_request:
    branches:
      - main

jobs:
  run-performance:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout main branch
        uses: actions/checkout@v3
        with:
          ref: main

      - name: Run performance on main branch
        run: |
          python profile/card_profiler.py > main_score.txt

      # The checkout of the PR branch below cleans the workspace, so the
      # main result is stashed as an artifact and restored afterwards.
      - name: Save main performance result
        uses: actions/upload-artifact@v3
        with:
          name: main_score
          path: main_score.txt

      - name: Checkout PR branch
        uses: actions/checkout@v3
        with:
          ref: ${{ github.head_ref }}

      - name: Run performance on PR branch
        run: |
          python profile/card_profiler.py > pr_score.txt

      - name: Download main performance result
        uses: actions/download-artifact@v3
        with:
          name: main_score
          path: .  # download-artifact expects a destination directory

      - name: Compare main and PR performance
        run: |
          echo "Comparing performance between main and PR"
          main_score=$(cat main_score.txt)
          pr_score=$(cat pr_score.txt)

          # Guard against division by zero (bc handles fractional scores)
          if (( $(echo "$main_score == 0" | bc -l) )); then
            echo "Main score is 0, can't calculate degradation."
            exit 1
          fi

          # Percentage degradation: positive when the PR branch is slower than main
          degradation=$(echo "scale=2; 100 * ($pr_score - $main_score) / $main_score" | bc)
          echo "Main score: $main_score"
          echo "PR score: $pr_score"
          echo "Degradation: $degradation%"

          # Fail the job if degradation exceeds 5%
          if (( $(echo "$degradation > 5" | bc -l) )); then
            echo "Performance degradation exceeds 5%!"
            exit 1
          else
            echo "Performance is within acceptable limits."
          fi
```
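One caveat about the workflow sketched above: `python profile/card_profiler.py > main_score.txt` captures everything printed to stdout, so for the `bc` arithmetic in the last step to work, the script has to emit exactly one number there; diagnostics can go to stderr. A hedged sketch of that convention (`run_benchmark` is a placeholder for the actual profiling):

```python
import sys
import time

def run_benchmark() -> float:
    """Placeholder: profile the benchmark cards and return total seconds."""
    start = time.perf_counter()
    ...  # run the cards listed in cards=[...]
    return time.perf_counter() - start

if __name__ == "__main__":
    total = run_benchmark()
    print(f"total runtime: {total:.2f}s", file=sys.stderr)  # human-readable diagnostics
    print(f"{total:.2f}")  # the single stdout line consumed by the workflow
```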