Minimum number of characters for notes
To improve the quality of notes, could we add a "text needs to be at least 10 characters" type of thing?
With this implemented, people would no longer be able to upload uninformative, short notes, but instead be encouraged to provide context. An example would be a note with “Here is housenumber 1” instead of “1”. The error message for text that is too short could/should include a brief message explaining that notes with longer text are more helpful/informative for other mappers.
This is of course not a fix for all issues around the quality of some notes, but it could be a step in the right direction.
This reminds me of the discussion in openstreetmap/iD#5091 about short changeset comments. It might have a bigger impact for notes because we allow anonymous users to leave notes; there’s no other way to follow up with someone about leaving too many short notes.
We currently set a maximum length but not a minimum length:
https://github.com/openstreetmap/openstreetmap-website/blob/3bca685f4a10c291ea347196bbef08992d41b457/app/views/notes/new.html.erb#L25
I think that could be beneficial to some degree. My only input would be to have a blacklist of sorts for specific note texts, for example "test", "target", "spam", or "#####" where ##### is a bad word. Such notes get uploaded to the OSM database, and although the DWG removes or hides them, they are visible until that happens and often remain visible on other services/sites afterwards.
Seriously, no. Hard coded blacklists are a hopeless solution.
I agree with Tom here. That would also be in the domain of the DWG and is out of scope for this issue.
Understood. Thanks for the explanation.
At most, keywords can serve as a warning somewhere, similar to "suspect word" in OSMCha. Maybe OSM Note applications are already doing this.
The only attribute that currently has a minimum length (other than 0 or 1) is user.display_name, which has a minimum length of 3.
We need to be very careful when picking minimum lengths, since some languages (e.g. Chinese) carry a lot of information in each character. I have limited experience with reading CJK, but I wouldn't be surprised if you can fit multiple complete sentences in fewer than 10 characters! So a minimum that high might force contributors to pad their notes with unnecessary extra characters.
If anyone has more direct experience with CJK note lengths, or wants to analyse existing notes, it would be good to have more details.
I suspect that the length is counted in code points rather than glyphs, but for CJK I think the two are generally the same, so it probably doesn't make any difference.
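To make the code point versus byte distinction concrete, here is a small Python sketch (not the website's own Ruby code). The `lengths` helper and the sample strings are made up for illustration. It relies only on the fact that Python's `len` on a `str` counts code points and that encoding to UTF-8 gives the byte count.

```python
def lengths(text: str) -> tuple[int, int]:
    """Return (code points, UTF-8 bytes) for a note text.

    Hypothetical helper for illustration only.
    """
    code_points = len(text)                 # len() on str counts code points
    utf8_bytes = len(text.encode("utf-8"))  # bytes after UTF-8 encoding
    return code_points, utf8_bytes


if __name__ == "__main__":
    # Illustrative sample texts, not taken from real notes.
    samples = ["1", "Here is housenumber 1", "道路已封闭"]
    for text in samples:
        code_points, utf8_bytes = lengths(text)
        print(f"{text!r}: {code_points} code points, {utf8_bytes} UTF-8 bytes")
```

Each of the CJK characters in the last sample takes three bytes in UTF-8, so the five code point text is 15 bytes long, while the ASCII samples have identical code point and byte counts.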
Not perfect but: byte length?
Interesting and technically sound idea, but it would be a difficult concept to explain to users.
Indeed. I would avoid the technical definition and say something along the lines of "plz say more"
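For what it's worth, a byte-based minimum with a non-technical message could look roughly like the Python sketch below. This is only an illustration under assumptions: `MIN_NOTE_BYTES`, `check_note_text`, and the message wording are all hypothetical, the threshold of 10 is simply borrowed from the original proposal, and none of this reflects how the website actually validates notes.

```python
from typing import Optional

# Hypothetical threshold: the "10" from the proposal, applied to UTF-8 bytes.
MIN_NOTE_BYTES = 10

# Hypothetical wording: no mention of bytes or characters.
TOO_SHORT_MESSAGE = (
    "Please say a little more. Notes with more detail are "
    "more helpful for other mappers."
)


def check_note_text(text: str) -> Optional[str]:
    """Return None if the note is long enough, else a friendly message."""
    stripped = text.strip()  # ignore leading/trailing whitespace padding
    if len(stripped.encode("utf-8")) < MIN_NOTE_BYTES:
        return TOO_SHORT_MESSAGE
    return None


if __name__ == "__main__":
    for sample in ["1", "Here is housenumber 1", "道路已封闭"]:
        result = check_note_text(sample)
        print(repr(sample), "->", "ok" if result is None else result)
```

With these assumed values, "1" is rejected, while "Here is housenumber 1" (21 bytes) and a five-character CJK text (15 bytes) both pass, so the user only ever sees the plain request to say more.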