unitxt

performance profiler with visualization

Open • dafnapension opened this issue 1 year ago • 10 comments

dafnapension, Oct 04 '24 18:10

For example, profiling the run of card cards/cola.json (8096 instances in train, 455 in validation, and 1043 in test) in non-eager (the usual) mode gives the following. Searching the page (Ctrl-F in the browser) for "profiler_" filters out most lines, leaving the methods of the profiler (and a few more). Note that most of the runtime goes into printing (list(ms[stream_name])): since the streams are lazy in non-eager mode, the actual processing only happens when list() consumes them.

[Screenshot: profiler visualization, cards/cola.json, non-eager mode]
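To reproduce this kind of filtering programmatically, here is a minimal sketch using Python's built-in cProfile and pstats. It is not necessarily what the profiler in this PR uses. The functions profiler_load_source, profiler_standardize and count_instances are hypothetical stand-ins for a card's pipeline. pstats' print_stats() treats a string argument as a regular expression over "file:line(function)", which plays the role of searching for "profiler_" in the browser.

```python
import cProfile
import io
import pstats


# Hypothetical stand-ins for the profiler's own "profiler_*" wrapper methods.
def profiler_load_source(n):
    return list(range(n))


def profiler_standardize(instances):
    return [{"text": str(x), "label": x % 2} for x in instances]


# Hypothetical helper that the "profiler_" filter should hide.
def count_instances(instances):
    return len(instances)


def run_card(n=10_000):
    instances = profiler_load_source(n)
    return count_instances(profiler_standardize(instances))


def profile_and_filter(pattern="profiler_"):
    """Profile run_card() and return only the report rows matching `pattern`."""
    profiler = cProfile.Profile()
    profiler.enable()
    run_card()
    profiler.disable()

    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    # A string restriction is a regex searched in "file:line(function)".
    stats.sort_stats("cumulative").print_stats(pattern)
    return buf.getvalue()


if __name__ == "__main__":
    print(profile_and_filter())
```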

dafnapension, Oct 05 '24 11:10

The same as above, but in eager mode. As expected, most of the time goes to standardization, and loading now accounts for actually loading all the instances, so it takes longer:

[Screenshot: profiler visualization, cards/cola.json, eager mode]
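The shift of runtime between the two modes is the general effect of lazy versus eager evaluation. The sketch below illustrates it with plain Python generators; it is an illustration under that assumption, not unitxt's actual stream classes, and standardize and build_stream are hypothetical names.

```python
import time

calls = []  # records every processed instance, to show *when* the work happens


def standardize(x):
    """Hypothetical stand-in for per-instance processing such as standardization."""
    calls.append(x)
    time.sleep(0.001)  # simulate some work
    return {"value": x}


def build_stream(instances, eager):
    """Return a list (eager) or a lazy generator (non-eager) of processed instances."""
    processed = (standardize(x) for x in instances)
    return list(processed) if eager else processed


def time_it(fn):
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


if __name__ == "__main__":
    for eager in (False, True):
        stream, t_build = time_it(lambda: build_stream(range(200), eager))
        _, t_consume = time_it(lambda: list(stream))
        print(f"eager={eager}: build {t_build:.3f}s, list(...) {t_consume:.3f}s")
```

In the lazy case, building the stream is nearly free and list(...) pays for all the processing; in the eager case it is the other way around, which is the pattern described in the two comments above.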

dafnapension, Oct 05 '24 11:10

The same for the first part of examples/evaluate_a_judge_model_capabilities_on_arena_hard.py, which generates a single stream, test, with 39990 instances. First, the non-eager mode:

[Screenshot: profiler visualization, arena-hard example, non-eager mode]

dafnapension, Oct 05 '24 11:10

And the above example in eager mode:

[Screenshot: profiler visualization, arena-hard example, eager mode]

dafnapension, Oct 05 '24 11:10

That is very neat, @dafnapension! Can we create a mechanism to sum up the total time of a few cards, excluding the loading? How could we compare times? We need some way to measure the difference between branches.
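A minimal sketch of the requested aggregation, assuming per-card timings are already available as a mapping from pipeline step to seconds. That data shape, the step name "load", the card names, and the numbers in the usage example are all hypothetical and made up for illustration.

```python
from typing import Iterable, Mapping

# Hypothetical shape: card name -> {pipeline step -> seconds}.
Timings = Mapping[str, Mapping[str, float]]


def total_without_loading(per_card: Timings, exclude: Iterable[str] = ("load",)) -> float:
    """Sum the time of all steps of all cards, skipping the excluded steps."""
    excluded = set(exclude)
    return sum(
        seconds
        for steps in per_card.values()
        for step, seconds in steps.items()
        if step not in excluded
    )


def percent_slower(main_total: float, branch_total: float) -> float:
    """Relative runtime change of the branch vs. main; positive means slower."""
    if main_total <= 0:
        raise ValueError("main_total must be positive")
    return 100.0 * (branch_total - main_total) / main_total


# Made-up numbers, only to show the calling convention.
main = {
    "cards.cola": {"load": 2.0, "standardize": 3.0, "other": 1.0},
    "cards.sst2": {"load": 5.0, "standardize": 1.5, "other": 0.5},
}
branch = {
    "cards.cola": {"load": 2.1, "standardize": 3.3, "other": 1.0},
    "cards.sst2": {"load": 4.0, "standardize": 1.5, "other": 0.5},
}
print(percent_slower(total_without_loading(main), total_without_loading(branch)))
```

Excluding loading keeps network and disk variance out of the comparison, so the percentage reflects only the processing code that differs between branches.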

elronbandel, Oct 06 '24 07:10

I tried to suggest solutions to the important issues you raised, @elronbandel.

dafnapension, Oct 06 '24 20:10

Yes, @elronbandel, I am close to packing it all into a single Python script, with no shell script. Coming soon.

dafnapension, Oct 08 '24 12:10

It is all in one Python script now, @elronbandel. I am not sure how to turn it into a GitHub Action. I saw that the other actions refer to the branch suggested in the PR as main. My Python script compares the current branch (which I think of as the new branch suggested in the PR, e.g. performance_profiler in this very PR) against branch main.
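One way a single Python script can benchmark main without disturbing the current checkout is a temporary `git worktree`. The sketch below assumes three things not stated in the thread: the benchmark script's last printed number is its total, the package uses a `src/` layout (so `PYTHONPATH` makes the checked-out code shadow any installed copy), and `git` is on the PATH. The function names are hypothetical.

```python
import os
import re
import subprocess
import sys
import tempfile
from pathlib import Path

NUMBER_RE = re.compile(r"[-+]?\d+(?:\.\d+)?")


def parse_total(stdout: str) -> float:
    """Assumption: the benchmark's last printed number is its total time."""
    numbers = NUMBER_RE.findall(stdout)
    if not numbers:
        raise ValueError("benchmark output contains no number")
    return float(numbers[-1])


def run_benchmark(checkout: Path, script: str) -> float:
    """Run `script` inside `checkout`, importing the package from that tree."""
    env = dict(os.environ)
    # Assumption: a src/ layout, so the checked-out code shadows any installed copy.
    paths = [str(checkout / "src"), env.get("PYTHONPATH", "")]
    env["PYTHONPATH"] = os.pathsep.join(p for p in paths if p)
    result = subprocess.run(
        [sys.executable, script],
        cwd=checkout, env=env, capture_output=True, text=True, check=True,
    )
    return parse_total(result.stdout)


def run_on_ref(ref: str, script: str, repo: Path = Path(".")) -> float:
    """Check `ref` out into a temporary git worktree and benchmark it there."""
    with tempfile.TemporaryDirectory() as tmp:
        worktree = Path(tmp) / "worktree"
        subprocess.run(["git", "worktree", "add", "--detach", str(worktree), ref],
                       cwd=repo, check=True)
        try:
            return run_benchmark(worktree, script)
        finally:
            subprocess.run(["git", "worktree", "remove", "--force", str(worktree)],
                           cwd=repo, check=True)
```

For example, `run_on_ref("main", "profile/card_profiler.py")` and `run_benchmark(Path("."), "profile/card_profiler.py")` would give the two totals to compare (the script path is taken from the workflow sketch later in this thread).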

dafnapension, Oct 08 '24 19:10

Also, which cards would you consider typical and representative of unitxt's users?
Those would make up the benchmark, and need to be listed in cards=[..,..,..] at around line 140.
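Whichever cards end up in the list, timing each one several times and keeping the minimum makes branch-to-branch comparisons less sensitive to machine noise. A minimal sketch, where `run_card` is a hypothetical callable that loads and processes one card by name:

```python
import time
from typing import Callable, Dict, Sequence


def benchmark_cards(
    cards: Sequence[str],
    run_card: Callable[[str], object],
    repeats: int = 3,
) -> Dict[str, float]:
    """Return the best-of-`repeats` wall-clock time (seconds) for each card."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    best_times = {}
    for card in cards:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            run_card(card)
            # Keep the fastest run: slower runs mostly reflect interference.
            best = min(best, time.perf_counter() - start)
        best_times[card] = best
    return best_times
```

Taking the minimum rather than the mean is a common choice for micro-benchmarks, since background activity can only add time, never remove it.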

dafnapension, Oct 08 '24 19:10

Should be something in the spirit of this:

```yaml
name: Test Performance

on:
  pull_request:
    branches:
      - main

jobs:
  run-performance:
    runs-on: ubuntu-latest
    steps:
    - name: Checkout main branch
      uses: actions/checkout@v4
      with:
        ref: main

    - name: Set up Python
      uses: actions/setup-python@v5
      with:
        python-version: "3.10"

    - name: Run performance on main branch
      run: |
        pip install -e .
        # Assumes the script prints a single number (the "score") to stdout
        python profile/card_profiler.py > main_score.txt

    - name: Save main performance result
      uses: actions/upload-artifact@v4
      with:
        name: main_score
        path: main_score.txt

    - name: Checkout PR branch
      uses: actions/checkout@v4
      with:
        # head.sha also works for PRs opened from forks, unlike github.head_ref
        ref: ${{ github.event.pull_request.head.sha }}

    - name: Run performance on PR branch
      run: |
        pip install -e .
        python profile/card_profiler.py > pr_score.txt

    # The second checkout cleans the workspace, so main_score.txt must be restored
    - name: Download main performance result
      uses: actions/download-artifact@v4
      with:
        name: main_score
        path: .   # a directory: the file lands at ./main_score.txt

    - name: Compare main and PR performance
      run: |
        echo "Comparing performance between main and PR"
        main_score=$(cat main_score.txt)
        pr_score=$(cat pr_score.txt)

        # Scores may be floats, so compare with bc rather than [ -eq ]
        if [ "$(echo "$main_score == 0" | bc -l)" -eq 1 ]; then
          echo "Main score is 0, can't calculate degradation."
          exit 1
        fi

        # Calculate percentage degradation, assuming a higher score is better;
        # if the script reports runtime instead, use ($pr_score - $main_score)
        degradation=$(echo "scale=2; 100 * ($main_score - $pr_score) / $main_score" | bc)

        echo "Main score: $main_score"
        echo "PR score: $pr_score"
        echo "Degradation: $degradation%"

        # Check if degradation is more than 5%
        if (( $(echo "$degradation > 5" | bc -l) )); then
          echo "Performance degradation exceeds 5%!"
          exit 1
        else
          echo "Performance is within acceptable limits."
        fi
```

elronbandel, Oct 09 '24 09:10