LMdiff
Multiple copies of results in search
When querying the API for text snippets from a pre-computed corpus, some snippets are duplicates, which violates the uniqueness requirement for the list.
I can remove the duplicates from the deployed dataset, but we should still handle this case, since users will be able to upload their own datasets and those may contain duplicate phrases. What are possible solutions?
Should I send a “key” field along with each item in the dataset?
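One way to read the "key" idea is to give every item an identifier that is unique by construction, so duplicate text no longer collides in the list. The sketch below assumes items are dicts and uses each item's position in the dataset as its key. The function name `attach_keys` is hypothetical and not part of LMdiff. Note that this keeps duplicate snippets visible to the user instead of removing them.

```python
from typing import Dict, List


def attach_keys(dataset: List[Dict]) -> List[Dict]:
    """Return shallow copies of the items, each with a 'key' field.

    Hypothetical helper: the key is the item's index in the dataset, which is
    unique even when two items share the same text.
    """
    return [{**item, "key": i} for i, item in enumerate(dataset)]


if __name__ == "__main__":
    data = [{"text": "the cat sat"}, {"text": "the cat sat"}]
    for item in attach_keys(data):
        print(item["key"], item["text"])
```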
This is best handled by the API endpoint that returns the list: fetch the 1.5 * k most interesting examples, remove duplicates, and return the first k of them ([:k]).
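A minimal sketch of that over-fetch-then-deduplicate step, assuming results are dicts with a `text` field and arrive best-ranked first. `fetch_top` stands in for whatever ranking call the endpoint already makes; both it and `top_k_unique` are hypothetical names. Deduplication keeps the first (highest-ranked) copy of each snippet and preserves ranking order.

```python
import math
from typing import Callable, Dict, List, Sequence


def top_k_unique(
    fetch_top: Callable[[int], Sequence[Dict]],
    k: int,
    text_field: str = "text",
    overfetch: float = 1.5,
) -> List[Dict]:
    """Return up to k results whose text_field values are unique.

    fetch_top(n) is assumed (hypothetical) to return the n highest-ranked
    snippets, best first.
    """
    n = math.ceil(overfetch * k)  # over-fetch so duplicates can be dropped
    seen = set()
    unique: List[Dict] = []
    for item in fetch_top(n):
        text = item[text_field]
        if text in seen:
            continue  # skip a lower-ranked duplicate
        seen.add(text)
        unique.append(item)
    # Can hold fewer than k items if the over-fetched batch had many duplicates.
    return unique[:k]


if __name__ == "__main__":
    ranked = [
        {"text": "the cat sat", "score": 0.9},
        {"text": "the cat sat", "score": 0.9},
        {"text": "a dog ran", "score": 0.8},
        {"text": "birds fly", "score": 0.7},
    ]
    print(top_k_unique(lambda n: ranked[:n], k=2))
```

The 1.5 factor is a heuristic: if an uploaded dataset is very repetitive, the endpoint may still return fewer than k items. Callers should tolerate a shorter list, or the endpoint could retry with a larger factor.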