LMdiff: Multiple copies of results in search

Open HendrikStrobelt opened this issue 4 years ago • 3 comments

When querying the API for text snippets from the pre-computed corpus, some of the returned snippets are duplicates, which violates the uniqueness requirement for the list.

Jun 23 '21 02:06 HendrikStrobelt

I can remove duplicates from the deployed dataset, but we should have error handling for this, since users will be able to upload their own datasets, which may contain duplicate phrases. What are possible solutions?

Jun 23 '21 13:06 bhoov

Should I send a “key” field along with each item in the dataset?

Jun 23 '21 13:06 bhoov

This is best handled by the API endpoint that returns the list: find 1.5 * k interesting examples, remove duplicates, and return the first k of them.

Jun 23 '21 20:06 bhoov
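
The oversample-then-deduplicate idea from the last comment could look roughly like the sketch below. It is not the code in the LMdiff repository: the function name `top_k_unique`, the `key` callback, and the `oversample` parameter are hypothetical, and it assumes the endpoint already has its candidates sorted by interestingness.

```python
import math
from itertools import islice
from typing import Callable, Hashable, Iterable, List, TypeVar

T = TypeVar("T")


def top_k_unique(
    ranked: Iterable[T],
    k: int,
    key: Callable[[T], Hashable] = lambda item: item,
    oversample: float = 1.5,
) -> List[T]:
    """Return up to k items from `ranked`, skipping repeated keys.

    `ranked` is assumed to be ordered best-first. Only the first
    ceil(oversample * k) candidates are inspected, so the result can
    hold fewer than k items if that window contains many duplicates.
    """
    limit = math.ceil(oversample * k)
    seen = set()
    unique: List[T] = []
    for item in islice(ranked, limit):
        ident = key(item)
        if ident in seen:
            continue  # duplicate of a better-ranked item: drop it
        seen.add(ident)
        unique.append(item)
        if len(unique) == k:
            break
    return unique


# Example: snippets keyed by their text (the "text" field is an assumption).
snippets = [
    {"text": "the cat sat", "score": 0.9},
    {"text": "the cat sat", "score": 0.8},
    {"text": "a dog ran", "score": 0.5},
]
top_two = top_k_unique(snippets, k=2, key=lambda s: s["text"])
```

Because the window is fixed at 1.5 * k, a user-uploaded dataset with heavy duplication can still yield a short list; the caller would need to either accept that or fetch more candidates.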
