Improve benchmark

Open grossir opened this issue 1 year ago • 1 comments

First, readability can be improved. Branches are named "branch1" and "branch2", but which is "main" and which is the PR?

Second, we could return some statistics already implied in the processing. In this recent PR, there was a noticeable increase in processing time, which is an important metric for the changes.

Also, the introduction of ReferenceCitations makes overlapping citations possible, so it would be useful to include the citation type in the JSON itself.
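To illustrate why recording the citation type would help when citations overlap, here is a small sketch that flags overlapping records in a result list. The record layout (`type`, `start`, `end` keys) and the `find_overlaps` helper are assumptions made up for this example, not the benchmark's current JSON format.

```python
def find_overlaps(citations):
    """Return pairs of records whose [start, end) spans overlap."""
    ordered = sorted(citations, key=lambda c: (c["start"], c["end"]))
    overlaps = []
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            if second["start"] >= first["end"]:
                break  # sorted by start, so nothing later can overlap `first`
            overlaps.append((first, second))
    return overlaps


# Hypothetical records; the keys are assumed, not taken from the real output.
sample = [
    {"type": "FullCaseCitation", "start": 10, "end": 25},
    {"type": "ReferenceCitation", "start": 20, "end": 30},
    {"type": "ShortCaseCitation", "start": 40, "end": 50},
]
pairs = find_overlaps(sample)
overlap_types = [(a["type"], b["type"]) for a, b in pairs]
```

With the type stored next to each span, a diff between two runs could show which kinds of citations collide instead of only that something changed.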

The processing time can be calculated from the results JSON files themselves:

url1 = "https://raw.githubusercontent.com/freelawproject/eyecite/artifacts/203/results/8981703e7cc27067adcb39f66346dc62248974cf.json"
url2 = "https://raw.githubusercontent.com/freelawproject/eyecite/artifacts/203/results/bb9ca00f64c5aa47d0eb85a16e38bc03a6bf0b61.json"

import statistics

import requests


def get_time_stats(url):
    results = requests.get(url).json()
    # Each 'time' value is treated as cumulative, so the per-item
    # duration is the difference from the previous entry.
    prev_time = results[0]['time']
    actual_times = [prev_time]
    for item in results[1:]:
        actual_times.append(item['time'] - prev_time)
        prev_time = item['time']

    print("Mean: ", sum(actual_times)/len(actual_times))
    print("Median: ", statistics.median(actual_times))
    print("Sample size: ", len(actual_times))

get_time_stats(url1)
get_time_stats(url2)

Yields

Mean:  0.08226925288831836
Median:  0.014779999999994686
Sample size:  779

Mean:  0.05099591142490372
Median:  0.009577000000000169
Sample size:  779
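To compare two runs directly, the same calculation could also report the relative change. This is only a rough sketch: the helper names (`per_item_times`, `compare_runs`) and the sample data are made up for illustration, and only the cumulative `time` field comes from the snippet above.

```python
import statistics


def per_item_times(results):
    """Turn cumulative 'time' values into per-item durations."""
    durations = []
    prev = 0.0
    for item in results:
        durations.append(item["time"] - prev)
        prev = item["time"]
    return durations


def compare_runs(base_results, pr_results):
    """Return mean/median for both runs and the relative change of PR vs. base."""
    base = per_item_times(base_results)
    pr = per_item_times(pr_results)
    summary = {}
    for name, fn in (("mean", statistics.mean), ("median", statistics.median)):
        b, p = fn(base), fn(pr)
        summary[name] = {"base": b, "pr": p, "relative_change": (p - b) / b}
    return summary


# Made-up cumulative timings, not real benchmark data.
main_run = [{"time": 1.0}, {"time": 3.0}, {"time": 6.0}]   # durations 1, 2, 3
pr_run = [{"time": 2.0}, {"time": 6.0}, {"time": 12.0}]    # durations 2, 4, 6
report = compare_runs(main_run, pr_run)
```

Applied to the two means above (0.0823 vs 0.0510), the relative change is about 0.61 when the second run is taken as the base.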

grossir avatar Feb 14 '25 16:02 grossir

Another improvement: benchmark annotation performance

On a couple of recent PRs, the performance footprint of changes (new cleaning step, new span updater) was invisible while they lived in the annotation step, but became evident once they were moved to the find step.
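One way this could look is to time each step separately, so a slowdown shows up no matter which step it lands in. The sketch below is an assumption about how that might be structured: `time_steps` and the stand-in callables are hypothetical, and a real benchmark would wrap the actual find and annotate calls.

```python
import time


def time_steps(steps, text, repeats=5):
    """Time each named step separately and keep the best wall-clock time per step."""
    timings = {}
    for name, func in steps:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            func(text)
            best = min(best, time.perf_counter() - start)
        timings[name] = best
    return timings


# Stand-in callables for illustration only.
def fake_find(text):
    return text.split()


def fake_annotate(text):
    return "".join(f"<b>{word}</b>" for word in text.split())


timings = time_steps(
    [("find", fake_find), ("annotate", fake_annotate)],
    "a b c " * 1000,
)
```

Taking the best of several repeats reduces noise from the machine; reporting the per-step numbers side by side for both branches would make a change like the ones above visible in either step.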

grossir avatar May 21 '25 16:05 grossir