Multiple copies of results in search

Open HendrikStrobelt opened this issue 4 years ago • 3 comments

When querying the API for text snippets from a pre-computed corpus, some of the returned snippets are duplicates, which violates the uniqueness requirement for the list.

HendrikStrobelt avatar Jun 23 '21 02:06 HendrikStrobelt

I can remove duplicates from the deployed dataset, but we should have error handling for this since users will be able to upload their own datasets and they may have duplicate phrases. What are possible solutions?

bhoov avatar Jun 23 '21 13:06 bhoov

Should I send a “key” field along with each item in the dataset?

bhoov avatar Jun 23 '21 13:06 bhoov

This is best handled by the API endpoint that returns the list: find 1.5 * k interesting examples, remove duplicates, and return the first k of them.

bhoov avatar Jun 23 '21 20:06 bhoov
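
The oversample-then-deduplicate idea from the last comment could look roughly like the sketch below. It is not the LMdiff implementation: the function name `top_k_unique`, the `key` parameter, and the assumption that the endpoint already holds a ranked list of candidates are all hypothetical, chosen only to illustrate the approach.

```python
import math
from typing import Callable, Hashable, List, Sequence, TypeVar

T = TypeVar("T")


def top_k_unique(
    ranked: Sequence[T],
    k: int,
    key: Callable[[T], Hashable] = lambda item: item,
    oversample: float = 1.5,
) -> List[T]:
    """Return up to k distinct items from a ranked list (hypothetical helper).

    Takes the first ceil(oversample * k) candidates, drops later
    duplicates while preserving rank order, then keeps the first k.
    """
    limit = math.ceil(oversample * k)
    seen = set()
    unique: List[T] = []
    for item in ranked[:limit]:
        ident = key(item)  # what counts as "the same" snippet
        if ident in seen:
            continue  # skip repeats, keep the best-ranked copy
        seen.add(ident)
        unique.append(item)
    return unique[:k]


# Example: plain strings, k = 4, so the first 6 candidates are inspected.
snippets = ["a", "b", "a", "c", "d", "b", "e"]
print(top_k_unique(snippets, 4))  # ['a', 'b', 'c', 'd']
```

Note that a fixed 1.5x oversample can still return fewer than k items when more than a third of the candidates are duplicates, so a caller relying on exactly k results would need to handle the short case or fetch more candidates.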