Flym
Filter double items
Would it be possible to extend the filter for double items a bit? At the moment it only deals with titles that are exactly the same, but sometimes titles are very similar without being identical. Could you add a filter for these situations as well? Here are two examples: https://ibb.co/tQGkCgt
I agree it would be really useful. Do you have an idea of how to write this filter? A filter that is not strict enough is dangerous, because it would hide items that are not actually duplicates.
Off the top of my head, I can think of two ways to do it:
- Remove an item whose title is contained in the title of another item. This could work for edited titles, but I think there will be a lot of false positives, especially if someone publishes items with short titles.
- Use the Levenshtein distance (edit distance). For example, remove an item if the Levenshtein distance between its title and another item's title is smaller than 10% of the length of the title.
Or maybe combine both?
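To make the two ideas concrete, here is a minimal Python sketch of how they could be combined. It is not Flym's code: the function names, the `min_contained_len` guard against short titles, and the choice to take 10% of the longer title are all assumptions made for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance computed with dynamic programming, keeping two rows."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def is_near_duplicate(title_a: str, title_b: str,
                      max_ratio: float = 0.10,
                      min_contained_len: int = 20) -> bool:
    """Hypothetical combined check: containment first, then edit distance."""
    a = title_a.strip().lower()
    b = title_b.strip().lower()
    if a == b:
        return True
    shorter, longer = sorted((a, b), key=len)
    # Containment rule, only for reasonably long titles (assumed guard
    # against the short-title false positives mentioned above).
    if len(shorter) >= min_contained_len and shorter in longer:
        return True
    # Edit-distance rule: distance below 10% of the longer title's length.
    return levenshtein(a, b) < max_ratio * len(longer)
```

With these defaults, a one-letter typo in a 40-character title counts as a duplicate, while a short title such as "News" is not swallowed by every longer title that happens to contain it. Both thresholds would need tuning against real feeds.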
Well, you're probably asking too much for me to be able to answer. :-) I don't know. Perhaps filtering based on a similar word count?
Feeds that share at least x identical words (where x is set by the user) would be treated as identical. Perhaps this could be combined with an exclusion for (too) short words (a, it, the...).
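The shared-word idea could look roughly like the Python sketch below. The stop-word list, the `min_length` cutoff, and the function names are assumptions for illustration only; `min_shared` plays the role of the user-defined x.

```python
import re

# Assumed, deliberately tiny stop-word list; a real one would be per language.
STOP_WORDS = frozenset({"a", "an", "it", "the", "of", "to", "in", "on", "and", "for"})


def significant_words(title: str, min_length: int = 3) -> set:
    """Lowercased words of a title, without stop words and very short words."""
    words = re.findall(r"\w+", title.lower())
    return {w for w in words if len(w) >= min_length and w not in STOP_WORDS}


def share_enough_words(title_a: str, title_b: str, min_shared: int = 4) -> bool:
    """True when the two titles have at least min_shared significant words in common."""
    common = significant_words(title_a) & significant_words(title_b)
    return len(common) >= min_shared
```

A fixed x treats long and short titles differently: two long, unrelated headlines can share four words by chance, while two short, near-identical titles may never reach four. Comparing the shared count to the size of the smaller word set might be worth trying as well.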
It appears that even the filter as it is now doesn't work properly. Take a look at these two screenshots: one with a double/identical feed and one with three identical feeds(!). https://ibb.co/RhMZJVD