
[FEAT] Additional argument to filter comparisons shown in comparison viewer dashboard

Open samnlindsay opened this issue 6 months ago • 0 comments

Is your proposal related to a problem?

The comparison viewer dashboard shows num_example_rows examples for every comparison vector present in df_predict. For sufficiently large datasets and complex models, the number of comparison vectors can become prohibitively large (I have an example where the dashboard is 1.4 GB with num_example_rows=2).

Currently, the only way to trim this down is to manipulate df_predict. This is easy if you want to view comparisons with a match probability between, say, 0.5 and 0.999, but it is more difficult to show only comparison vectors that appear more than N times. Either or both of these options would be helpful to include in the dashboard function.
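As a rough illustration of the manual workaround, here is a minimal pandas sketch. It assumes the predictions have already been pulled into a pandas DataFrame whose comparison vector columns are prefixed with gamma_ and which has a match_probability column. The function name filter_predictions and its parameters are hypothetical and not part of Splink, and counting vectors after the probability filter (rather than before) is a choice made here for illustration.

```python
import pandas as pd


def filter_predictions(df, min_count=1, min_probability=0.0, max_probability=1.0):
    """Keep rows in a probability range whose comparison vector is common enough.

    Hypothetical helper, not a Splink function. Assumes comparison vector
    columns start with "gamma_" and that "match_probability" exists.
    """
    gamma_cols = [c for c in df.columns if c.startswith("gamma_")]
    if not gamma_cols:
        raise ValueError("no gamma_ columns found")

    # Probability filter: the easy part described above (bounds are inclusive).
    in_range = df[df["match_probability"].between(min_probability, max_probability)]

    # Count how many remaining rows share each comparison vector, then keep
    # only rows whose vector appears at least min_count times.
    counts = in_range.groupby(gamma_cols)["match_probability"].transform("count")
    return in_range[counts >= min_count]


# Toy data for illustration only: two comparisons, five scored pairs.
toy = pd.DataFrame(
    {
        "gamma_name": [1, 1, 1, 0, 0],
        "gamma_dob": [1, 1, 1, 1, 0],
        "match_probability": [0.9, 0.8, 0.95, 0.6, 0.1],
    }
)
common_only = filter_predictions(toy, min_count=2)
likely_only = filter_predictions(toy, min_probability=0.5, max_probability=0.999)
```

The filtered frame would still need to be handed back to Splink in whatever form the dashboard function expects, which is the awkward step a built-in argument would avoid.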

Describe the solution you'd like

A min_count argument would be one way to keep the file size manageable, e.g. min_count=100 to drop comparison vectors that appear fewer than 100 times.

comparison_viewer_dashboard(
    df_predict, 
    out_path, 
    overwrite=False, 
    num_example_rows=2, 
    return_html_as_string=False,
    min_count=1
)

samnlindsay, Aug 09 '24 11:08