Feature request: prevent duplicate entries
Hi :) Could you add an option to prevent / search for duplicate entries?
Hi, de-duplicating text gets tricky once you go beyond binary-equal values. For example, case-insensitive detection can be a nightmare with non-Latin characters. (Language-specific code or a collation has to be selected by the user, and ä, ü, ö each have at least two binary representations in Unicode: precomposed and decomposed.) What exactly are you looking for: de-duplication on import, or in the editor itself?
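To illustrate the two pitfalls mentioned above, here is a minimal sketch using only the Java standard library (`java.text.Normalizer` and `java.text.Collator`). The class and method names (`TextKeys`, `sameAfterNormalization`, `sameIgnoringCase`) are made up for this example and are not taken from the app's code; the locale is assumed to be chosen by the user.

```java
import java.text.Collator;
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical helper, not part of the app: shows why binary equality is not enough.
final class TextKeys {
    private TextKeys() {}

    /** True if both strings are equal after Unicode NFC normalization (case still matters). */
    static boolean sameAfterNormalization(String a, String b) {
        String na = Normalizer.normalize(a, Normalizer.Form.NFC);
        String nb = Normalizer.normalize(b, Normalizer.Form.NFC);
        return na.equals(nb);
    }

    /** Locale-aware comparison that ignores case but still distinguishes accents. */
    static boolean sameIgnoringCase(String a, String b, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        // SECONDARY strength: case differences are ignored, accent differences are kept.
        collator.setStrength(Collator.SECONDARY);
        // Treat precomposed and decomposed forms as equivalent.
        collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        return collator.compare(a, b) == 0;
    }
}
```

With this, `"\u00e4"` (precomposed ä) and `"a\u0308"` (a plus combining diaeresis) are not `equals`, but `sameAfterNormalization` treats them as the same text. Which locale and strength are appropriate depends on the vocable set, which is exactly why the user would have to pick them.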
De-duplication in the editor. When I add a word, I have to check whether it already exists in the list. I haven't looked at the code, but if the words are stored as Strings, wouldn't it be feasible to use Java's String.equals (or compareTo) method for this?
I wouldn't check this with Java's comparator, as that will get really slow, but yeah, that's the easiest way to do it.
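As a rough sketch of one alternative to comparing the new word against every stored entry: keep a hash set of the words already seen, so each check is a single lookup instead of a scan over the whole list. This assumes the set fits in memory and that exact matches (after trimming whitespace) are what counts as a duplicate; `WordIndex` is a hypothetical name, not something from the app.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical in-memory index of words that are already in the list.
final class WordIndex {
    private final Set<String> seen = new HashSet<>();

    /** Adds the word and returns true, or returns false if it was already present. */
    boolean add(String word) {
        return seen.add(key(word));
    }

    /** True if the word is already in the index. */
    boolean contains(String word) {
        return seen.contains(key(word));
    }

    // Assumption: surrounding whitespace is not significant; everything else must match exactly.
    private static String key(String word) {
        return word.trim();
    }
}
```

The index would have to be rebuilt or updated whenever entries are edited or deleted, which this sketch does not cover.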
I've thought about some GUI changes and DB functions to implement this, but I haven't settled on a good approach yet. There are several reasons for this. Depending on your vocable set, you may have multiple entries with the same column A value but different column B values (or vice versa). (We could start discussing a 1:N or N:N database relation between vocables as a better structure than 1:1, but that would lead to CSV incompatibilities for simple sets.)
A | B
Fahrstuhl | lift
Fahrstuhl | elevator
Fahrstuhl | lift cage
For these people, you want duplicate searching for exact A:B matches. Other people may want to know when the same column A or column B entry already exists, because they treat A and B (or only A) as unique keys. The complexity can get pretty high. So you already end up with three options: check A and B together, check for duplicates in A only, or check in B only.
The next problem is how to tell the user that there are duplicates. After they have typed the entry in? As a list of search results directly under the input field? While they are typing, or afterwards? Also, searching the complete set is pretty CPU-intensive, at least when it's done while typing.
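The three checks described above could be sketched roughly like this. Everything here (`DuplicateCheck`, `Mode`, `Entry`, `findDuplicates`) is a hypothetical illustration using a plain in-memory list and exact string equality; it says nothing about how the app's DB layer would do it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the three duplicate-check options.
final class DuplicateCheck {
    /** PAIR: A and B must both match; COLUMN_A / COLUMN_B: only that column must match. */
    enum Mode { PAIR, COLUMN_A, COLUMN_B }

    /** One vocable row: column A and column B. */
    static final class Entry {
        final String a;
        final String b;

        Entry(String a, String b) {
            this.a = a;
            this.b = b;
        }
    }

    /** Returns all existing entries that count as duplicates of the candidate in the given mode. */
    static List<Entry> findDuplicates(List<Entry> entries, Entry candidate, Mode mode) {
        List<Entry> hits = new ArrayList<>();
        for (Entry e : entries) {
            boolean sameA = e.a.equals(candidate.a);
            boolean sameB = e.b.equals(candidate.b);
            boolean match;
            switch (mode) {
                case PAIR:
                    match = sameA && sameB;
                    break;
                case COLUMN_A:
                    match = sameA;
                    break;
                default: // COLUMN_B
                    match = sameB;
                    break;
            }
            if (match) {
                hits.add(e);
            }
        }
        return hits;
    }
}
```

With the Fahrstuhl rows from the table, a new entry `Fahrstuhl | hoist` has no duplicates in `PAIR` mode but three in `COLUMN_A` mode, which is the difference between the two groups of users.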
TL;DR: There are many options. I could just pick one of them and ignore everything else, but I'm not the end user here, so I want some feedback before I change the DB again and make it backwards-incompatible.
Indeed it's tricky. Currently when I read a new word, I search for the translation in my language and write it with all the possible translations.
Krankenversicherung | assurance maladie
Here the most important thing (to me) is to not write the German term twice (if I don't remember whether I already wrote it). But if there are several translations, I write them as follows:
sanft | léger, doux
Of course it would be perfect (for the end user) if it was like pons.de : each term is linked with many other terms :
sanft > léger, doux
doux > sanft, weich, süß
but certainly hard to implement.
Let's wait for other feedback; for me, a warning telling the user that the left word already exists in the list (whatever is in the right column) would be a huge improvement :-)
Of course it would be perfect (for the end user) if it was like pons.de : each term is linked with many other terms
This is the N:N / 1:N relation problem I talked about: possible, but kind of overkill, and it leaves the problem with CSV import/export, etc.
But to be honest, I'm already thinking about ways to rewrite all the crucial parts to support this, though it's really not a priority.
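Just to make the N:N idea concrete, here is a small in-memory sketch where each term is linked to any number of other terms, in both directions. `TermLinks`, `link` and `translationsOf` are invented names for illustration only; a real version would live in the DB schema and would have to keep the two languages apart (this sketch puts all terms in one map, so a word spelled identically in both languages would be merged).

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of an N:N relation between terms, kept in memory.
final class TermLinks {
    private final Map<String, Set<String>> links = new LinkedHashMap<>();

    /** Links two terms in both directions; linking the same pair twice has no effect. */
    void link(String a, String b) {
        links.computeIfAbsent(a, k -> new LinkedHashSet<>()).add(b);
        links.computeIfAbsent(b, k -> new LinkedHashSet<>()).add(a);
    }

    /** Returns a read-only view of all terms linked to the given term (empty if unknown). */
    Set<String> translationsOf(String term) {
        return Collections.unmodifiableSet(
                links.getOrDefault(term, Collections.emptySet()));
    }
}
```

Linking `sanft` to `léger` and `doux`, and `doux` to `weich`, gives `sanft > léger, doux` and `doux > sanft, weich`. Exporting this to a flat two-column CSV would mean writing one row per link, which is where the import/export incompatibility for simple sets comes from.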
Let's wait for other feedback; for me, a warning telling the user that the left word already exists in the list (whatever is in the right column) would be a huge improvement :-)
VA is CSV import/export compatible on purpose: you can write your entries in LibreOffice/Excel and import them. That way you have the full search functionality of your office program and much more computing power, which means better response times.