openstreetmap-website

Designate regions where new accounts cannot edit the map or add notes

Open pablobm opened this issue 2 months ago • 33 comments

Problem

Common sources of vandalism and spam include anonymous users or those with recently-created accounts.

Specifically regarding notes, the debate is still open as to whether anonymous accounts should be able to create notes at all. See https://community.openstreetmap.org/t/we-dont-need-anonymous-notes/105335

Description

Implement the ability to designate regions where new accounts and anonymous users cannot make map edits or create notes.

Detail

  • New section on the website, accessible by admins and moderators.
  • When a note/edit is created within the designated area, by an anonymous or recent (TBD) user, the creation is denied and the user is shown a descriptive error message.
  • No limit on comments for now. This appears to be less of an issue, and there are different types of comments that would need to be considered.

Open questions

  • [ ] How do we define a "recent" user?
  • [ ] What should be the interface to designate the area: rectangle, circle? What is available with the current tools?

Edits:

  • 2025-11-27 De-scoped option to limit comments.

pablobm avatar Nov 10 '25 15:11 pablobm

That seems quite drastic - who exactly needs to sign off on the ability to do that and who gets to actually decide what regions are blocked?

tomhughes avatar Nov 10 '25 16:11 tomhughes

Not sure how the decision process works, I'm assuming that the DWG would be the ones to sign off on these decisions. The idea itself was raised by a member of the DWG and the example they brought up was of politically-motivated vandalism.

pablobm avatar Nov 11 '25 11:11 pablobm

For one tiny part of this - I would imagine the best way to designate an area is with a polygon - I think squares or circles would be too limiting for moderators and they would end up creating multiple adjacent regions to fit a real-world situation. If we go down this route, and use polygons, I think it would therefore be best built on top of PostGIS rather than rolling our own polygon storage and intersection testing.

Another tiny part - defining whether a changeset intersects a given polygon is not well defined. One example is when the entities are far apart - e.g. if a changeset contains a node in London and one in Australia, would that be blocked by a zone over the Middle East? It's the same problem we've had for years with the History view.
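To make the London/Australia example concrete, here is a minimal sketch of the false positive. `DemoBBox` and the rectangle standing in for a Middle East zone are made up for illustration; none of this is existing website code, and coordinates are treated as plain planar lon/lat.

```ruby
# Hypothetical helper type for illustration; not part of the website code.
DemoBBox = Struct.new(:min_lon, :min_lat, :max_lon, :max_lat) do
  # Smallest box covering all the given [lon, lat] points.
  def self.around(points)
    lons = points.map(&:first)
    lats = points.map(&:last)
    new(lons.min, lats.min, lons.max, lats.max)
  end

  # True when the two boxes share any area or edge.
  def intersects?(other)
    min_lon <= other.max_lon && max_lon >= other.min_lon &&
      min_lat <= other.max_lat && max_lat >= other.min_lat
  end

  # True when the point lies inside the box or on its border.
  def contains_point?(lon, lat)
    lon.between?(min_lon, max_lon) && lat.between?(min_lat, max_lat)
  end
end

LONDON = [-0.13, 51.51].freeze
SYDNEY = [151.21, -33.87].freeze

# Rough, made-up rectangle standing in for a zone over the Middle East.
ZONE = DemoBBox.new(34.0, 29.0, 36.0, 34.0)

changeset_bbox = DemoBBox.around([LONDON, SYDNEY])
bbox_hit = changeset_bbox.intersects?(ZONE)
node_hit = [LONDON, SYDNEY].any? { |lon, lat| ZONE.contains_point?(lon, lat) }
puts "bbox overlaps zone: #{bbox_hit}, any node inside zone: #{node_hit}"
```

The bounding box test reports an overlap even though neither node is anywhere near the zone, which is the ambiguity described above.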

gravitystorm avatar Nov 20 '25 16:11 gravitystorm

I agree that polygon would be great, but I'm worried that implementing it could turn into a rabbit hole. There's building the UI to create the selection, and having to enable PostGIS in the process (as I understand it's not present currently).

In conversation with the DWG, an idea came up of using existing boundary relations. That would be really handy but I suspect we don't have a good way to determine if a point is within a boundary (perhaps another reason to add PostGIS after all?).

So with the above, a circle might be the simplest solution that gets us going with something useful. In some cases moderators would need to create multiple circles, but the UI could be made to define several circles as part of a single "block" which might hit a right balance of complexity vs utility.

As for those corner cases of London-Sydney-Bethlehem, etc, I think it's fine for the use case as this is not a proper user-facing feature. Having said that, perhaps this can be exploited... someone could create a changeset with vandalism in a defined area, plus an unrelated change 3000km away, thus triggering this edge case. Perhaps the focus should be in individual nodes rather than the changeset bounding box? But then we risk slowing down large edits... But! The important part here is to reduce the workload of moderators, not to come up with a perfect solution. Heuristics can be devised and impacts measured as we go on.

pablobm avatar Nov 21 '25 11:11 pablobm

As for other questions, from the DWG I gather:

  • All moderators will have access to this feature.
  • A reasonable definition for "new" user could be "fewer than 7 mapping days".
  • Comments should not be blocked by default. Instead have them as an option on the block.

pablobm avatar Nov 21 '25 11:11 pablobm

In conversation with the DWG, an idea came up of using existing boundary relations. That would be really handy but I suspect we don't have a good way to determine if a point is within a boundary (perhaps another reason to add PostGIS after all?).

Indeed. Determining the area covered by a boundary relation is much harder (e.g. super relations, multipolygon areas etc) and would just involve converting the chosen relation to a polygon anyway. However, the UI for picking boundaries would be easier for moderators than drawing polygons by hand. But even after a boundary relation is chosen, we'd probably want to store it in the polygon form, since the edit war might involve deleting the relation or changing its boundaries.

In some cases moderators would need to create multiple circles, but the UI could be made to define several circles as part of a single "block" which might hit a right balance of complexity vs utility.

If the UI involves creating multiple circles, we might still want to store the intersection of those circles? Although...

a circle might be the simplest solution that gets us going with something useful.

From the UI, perhaps, and it's marginally simpler from a computational science point of view since it's the equivalent of ST_Distance / ST_DWithin. We already have this (sort of) working for nearby users. But since we'd probably want to use the PostGIS functions rather than doing our own, ST_Intersects would be available too and we're back to storing polygons.

plus an unrelated change 3000km away, thus triggering this edge case.

I meant more that two innocent changes might get blocked, because the innocent changeset overlaps with the blocked region. I don't think this would work in reverse, i.e. if a malicious changeset includes innocent changes elsewhere, the bounding box will still overlap with the blocked region.

  • A reasonable definition for "new" user could be "fewer than 7 mapping days".

We should align this with "days_for_max_changes" which is implemented for rate limiting new users. It's a similar concept.

  • Comments should not be blocked by default.

We have multiple types of comments - NoteComment, DiaryComment, ChangesetComment etc. It's worth being clear which one(s) is/are being referred to here - all three parent models (Note, DiaryEntry and Changeset) have coordinate information and so could be in scope for block regions.

gravitystorm avatar Nov 21 '25 15:11 gravitystorm

Is it realistic to enable PostGIS in the DB? Asking from complete ignorance: no idea of what the implications are on the operations side, etc. Perhaps @tomhughes or @firefishy have opinions on this?

If that's feasible in the short term, basing the block on an existing boundary would sound very promising. To avoid the issue of vandalised boundaries, a specific, historic version of the boundary could be chosen.

If that wasn't feasible in the short term, would it be practical to start with the same solution used to identify nearby users? From my ignorance on the topic of GIS, I'm guessing that's probably not very scalable, but might just do it initially, as better solutions are implemented iteratively and with practical knowledge of the implications.

we might still want to store the intersection of those circles?

I don't understand: what's special about the intersection of the circles? My thinking is: feature is created, we iterate through all the circles to find if something falls within. Any intersections will be checked multiple times, but that shouldn't (?) be a big deal.
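A rough sketch of the "iterate through all the circles" idea, assuming a block is just a list of circles given as centre plus radius in km. The `Circle` struct, `blocked_by_circles?` and the sample values are hypothetical names for illustration; distance uses the standard haversine formula on a spherical Earth.

```ruby
EARTH_RADIUS_KM = 6371.0

# Great-circle distance in km between two lon/lat points (haversine formula).
def haversine_km(lon1, lat1, lon2, lat2)
  to_rad = Math::PI / 180
  dlat = (lat2 - lat1) * to_rad
  dlon = (lon2 - lon1) * to_rad
  a = Math.sin(dlat / 2)**2 +
      Math.cos(lat1 * to_rad) * Math.cos(lat2 * to_rad) * Math.sin(dlon / 2)**2
  2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a))
end

# Hypothetical value object: one circle belonging to a block.
Circle = Struct.new(:lon, :lat, :radius_km)

# A point is blocked if it falls inside at least one circle. Overlapping
# circles only mean the point may match more than once, which is harmless.
def blocked_by_circles?(circles, lon, lat)
  circles.any? { |c| haversine_km(c.lon, c.lat, lon, lat) <= c.radius_km }
end

# Two made-up, overlapping circles forming a single block.
CIRCLES = [Circle.new(0.0, 0.0, 50.0), Circle.new(0.5, 0.0, 50.0)].freeze

puts blocked_by_circles?(CIRCLES, 0.25, 0.0) # point in the overlap of both circles
puts blocked_by_circles?(CIRCLES, 5.0, 5.0)  # point far from both circles
```

The cost is linear in the number of circles per check, which is probably fine for a handful of blocks but is exactly the kind of thing a spatial index would take care of.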

Regarding the UI, is there any prior art on an interface to define polygons, that can be used in the admin section? Not sure we want to instantiate a full-blown iD here... or do we?

I meant more that two innocent changes might get blocked

This is going to be part of the tradeoffs that the DWG will have to consider when blocking regions. In any case, from my ignorance I want to think that it won't be a big deal: heuristics can improve, it only affects new/anonymous accounts, the DWG will keep an eye, it's not a "proper" user-facing feature.

pablobm avatar Nov 25 '25 11:11 pablobm

There's no real problem doing so that I'm aware of, it's just not something we've ever done because it hasn't been needed and the general position has been that the database deals in topology rather than geometry.

Obviously this is an edge case and we're not talking about using it for the main data. In one sense that is good, but in another it is bad, because it means we're adding PostGIS just for one weird edge-case feature that I'm not even convinced is a very good idea.

tomhughes avatar Nov 25 '25 11:11 tomhughes

Are there any past instances of features that haven't been implemented, or have been implemented with workarounds, because PostGIS wasn't available?

pablobm avatar Nov 25 '25 11:11 pablobm

Not that I can think of right now.

tomhughes avatar Nov 25 '25 11:11 tomhughes

There is sort of the issue that @gravitystorm alluded to earlier, of determining the areas affected by a sparse changeset in history (#837 etc.). The coarseness of large changeset bboxes is a major annoyance for mappers and reviewers. I don’t know if a serious solution has ever been proposed directly in this repository, as opposed to in an external tool, but it would likely be a good use case for PostGIS rather than trying to roll our own spatial functionality.

1ec5 avatar Nov 25 '25 17:11 1ec5

Yes but that would be a whole other level as it would require turning all the primary data tables into geometries, which would be a massive undertaking that would probably involve multiple days of downtime and I'm not sure at the end of it how you would handle topology once you did that.

Unless you're suggesting we keep geometries alongside the topological data, in which case the problem becomes the large increase in size of the database.

The real solution is likely to be a secondary service like spyglass that maintains a geometrical view of the data.

tomhughes avatar Nov 25 '25 18:11 tomhughes

Are there any past instances of features that haven't been implemented, or have been implemented with workarounds, because PostGIS wasn't available?

I'd say the obvious use case is the /map endpoint. Sometimes way nodes are quite far apart. /map fails to return ways which are just crossing a bounding box, but don't have a single node inside. With PostGIS in place this wouldn't be much of an issue anymore. Today, some clients run multiple requests or try with bigger bounding boxes to mitigate the issue.
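A small sketch of that failure mode, using a made-up box and a made-up two-node way in planar lon/lat coordinates. `node_in_bbox?` mimics a node-only lookup; `segment_crosses_bbox?` is a Liang-Barsky style clipping test standing in for a real geometric intersection. Both names are hypothetical and neither is how /map is actually implemented.

```ruby
BOX = { min_lon: 0.0, min_lat: 0.0, max_lon: 1.0, max_lat: 1.0 }.freeze
WAY = [[-1.0, 0.5], [2.0, 0.5]].freeze # two nodes, both outside BOX

# Node-only test: is this single node inside the box?
def node_in_bbox?(lon, lat, box)
  lon.between?(box[:min_lon], box[:max_lon]) && lat.between?(box[:min_lat], box[:max_lat])
end

# Liang-Barsky clipping test: true when the segment (x0,y0)-(x1,y1) touches the box.
def segment_crosses_bbox?(x0, y0, x1, y1, box)
  t0 = 0.0
  t1 = 1.0
  dx = x1 - x0
  dy = y1 - y0
  checks = [[-dx, x0 - box[:min_lon]], [dx, box[:max_lon] - x0],
            [-dy, y0 - box[:min_lat]], [dy, box[:max_lat] - y0]]
  checks.each do |p, q|
    if p.zero?
      return false if q.negative? # parallel to this edge and outside it
    else
      t = q / p.to_f
      if p.negative?
        return false if t > t1 # enters the box after the segment has ended
        t0 = t if t > t0
      else
        return false if t < t0 # leaves the box before the segment has entered
        t1 = t if t < t1
      end
    end
  end
  true
end

puts WAY.any? { |lon, lat| node_in_bbox?(lon, lat, BOX) } # node-only view of the way
puts segment_crosses_bbox?(*WAY[0], *WAY[1], BOX)         # geometric view of the way
```

The node-only check finds nothing in the box while the segment test does, which is why clients resort to larger or repeated requests.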

mmd-osm avatar Nov 26 '25 22:11 mmd-osm

Yes but that would only be possible if you switched to representing ways and relations as geometries, and then you'd lose the all important topological connections.

tomhughes avatar Nov 26 '25 22:11 tomhughes

I had a call with @gravitystorm today to discuss this. I'm going to try to summarise where we got to; hopefully he'll correct my misunderstandings. I'm going to refer to this feature as "Block Zones", to avoid mixing up with other similarly-named features.

Enabling PostGIS:

  • Using PostGIS generally in OSM would be a huge undertaking, just as @tomhughes describes.
  • However it doesn't have to be all or nothing: we can use it in specific cases, and it'll enable new ideas over time.

Circles vs polygons (or other shapes):

  • As discussed above, circles have downsides and don't buy us that much in return.
  • Existing libraries can help us build a UI to define polygons. An example is https://terradraw.io, which is compatible with Leaflet and MapLibre.
  • With PostGIS enabled, calculating these "block zones" would be simple and performant. Doing the same thing with Ruby is complex and inefficient.

How should Block Zones be compared to incoming edits?

  • Start by comparing the bounding box of the changeset.
  • This is imperfect as these bounding boxes can expand outside the Block Zone and evade it.
  • However the alternative can be expensive. Relations/ways need to be broken down into their constituent nodes, potentially by several levels.
  • For simplicity, we are starting here, then refining as we go. We can come up with heuristics as we learn of real-world issues.
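To illustrate why the per-node alternative can get expensive, here is a toy sketch of expanding a relation into its nodes through nested relations. The in-memory `WAYS` and `RELATIONS` hashes and the `node_ids_for_relation` helper are invented for illustration; the real data lives in database tables and would need one or more queries per level.

```ruby
require "set"

# Toy in-memory data, invented for illustration.
WAYS = { 10 => [1, 2], 11 => [2, 3] }.freeze
RELATIONS = {
  100 => [[:way, 10], [:relation, 101]],
  101 => [[:way, 11], [:node, 4], [:relation, 100]] # cycles back to 100
}.freeze

# Collect every node id reachable from a relation, following nested relations.
def node_ids_for_relation(id, seen = Set.new)
  return Set.new unless seen.add?(id) # already visited: stop to avoid infinite loops

  RELATIONS.fetch(id).each_with_object(Set.new) do |(type, ref), nodes|
    case type
    when :node then nodes << ref
    when :way then nodes.merge(WAYS.fetch(ref))
    when :relation then nodes.merge(node_ids_for_relation(ref, seen))
    end
  end
end

p node_ids_for_relation(100).sort
```

Every extra level of nesting is another round of lookups, and relations can reference each other in cycles, so the traversal also has to track what it has already visited.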

Things to look out for:

  • Block Zones in long-term controversial areas create a barrier to new, legitimate users. The DWG needs to be aware of this and we can come up with ideas over time, again as we evaluate real-world effects.

Action points for me:

  • Create a new issue to cover enabling PostGIS.
  • Rework description to: depend on PostGIS being enabled first, define block zones with polygons.
  • Update DWG. Remind them of potential real-world issues.

pablobm avatar Nov 27 '25 11:11 pablobm

Are there any past instances of features that haven't been implemented, or have been implemented with workarounds, because PostGIS wasn't available?

For historical reasons (e.g. timing of when we moved to Postgres, widespread availability of PostGIS etc) the "nearby users" feature has been implemented using our own code. See for example User#nearby, sql_for_distance, sql_for_area and the QuadTile library. It's a prime candidate for refactoring to use PostGIS nowadays, but we should discuss that elsewhere. As for features that haven't yet been implemented, I have lots of ideas (e.g. "notes near me") that again would be candidates for implementing with PostGIS, but again we should discuss elsewhere.

Returning to the matter at hand, I believe it can be implemented using PostGIS, without adding any additional columns to the nodes/ways/relations or changesets tables. I would attempt an implementation as follows:

  • create a "zones" table (name tbd, I'm just using a distinctive one here to help illustration) with usual attributes, created_by etc and a PostGIS geometry column
  • Have a form for creating zones, with something like https://terradraw.io/ to create the geometry attribute
  • Hook into the Changeset#update_bbox method, and throw an error if the changeset overlaps any zone
  • This can be detected by creating a postgis bbox for the changeset at query-time, and relying on the PostGIS indexes and intersection tests to efficiently check for overlapping zones, e.g. (pseudo-query) select * from zones where ST_Intersects(ST_MakeEnvelope(#{changeset.min_lon}, #{...}, ...), zones.geometry) or similar
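The pseudo-query in the last bullet could be written with bind parameters instead of string interpolation, roughly as below. This is only a sketch: the `zones` table and `geometry` column are the placeholder names from the list above, SRID 4326 and coordinates in plain degrees are assumptions, and `zone_overlap_binds` is a hypothetical helper.

```ruby
# Hypothetical table/column names ("zones", "geometry"), as in the pseudo-query.
# SRID 4326 (WGS 84 lon/lat) is an assumption about how zones would be stored.
ZONE_OVERLAP_SQL = <<~SQL.freeze
  SELECT id
  FROM zones
  WHERE ST_Intersects(ST_MakeEnvelope($1, $2, $3, $4, 4326), zones.geometry)
SQL

# Bind values in the order ST_MakeEnvelope expects: xmin, ymin, xmax, ymax.
def zone_overlap_binds(min_lon:, min_lat:, max_lon:, max_lat:)
  [min_lon, min_lat, max_lon, max_lat]
end

binds = zone_overlap_binds(min_lon: 34.0, min_lat: 29.0, max_lon: 36.0, max_lat: 34.0)
puts ZONE_OVERLAP_SQL
p binds
```

With the pg gem this pair could be passed to something like `exec_params(ZONE_OVERLAP_SQL, binds)`; any returned row would mean the changeset bbox overlaps a zone. Keeping the values out of the SQL string avoids quoting problems, and a single statement like this would be straightforward to mirror in cgimap.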

As for whether we should use PostGIS or continue writing/using our own spatial code, I think PostGIS is my preferred choice for this. It moves all the heavy lifting (e.g. 2D-indexes, intersection testing) to a well-maintained library, and if we only need ~1 line of SQL to detect the overlaps, it's much easier to implement both here and in cgimap without much duplication.

There's a few details to be worked out (e.g. exact type of the geometry column, should zones auto-expire like UserBlocks etc) but the spatial stuff is important to nail down first.

gravitystorm avatar Nov 27 '25 11:11 gravitystorm

Specific issue about enabling PostGIS: https://github.com/openstreetmap/operations/issues/1317

pablobm avatar Nov 27 '25 12:11 pablobm

The matter at hand relates to notes, so doesn't involve changesets at all, and nobody has claimed that it couldn't be implemented without adding to existing tables.

Somehow things seem to have got distracted into talking about changesets but that's not actually what this issue purports to be about. It would certainly be possible to do that based on bounding boxes as you say but that does create problems when it comes to large bounding boxes that might intersect many "zones" that they don't really touch.

Two more things I would suggest that we need to look out for:

  • Blocking things in an area may just push that activity to immediately surrounding areas, leading to the area being enlarged and so on ad infinitum.
  • It will likely lead to an increased support load both from innocent users who are confused by being unable to do what they want and from bad actors wanting to complain about being targeted.

Anyway apparently you two have this in hand so I'll let you get on with it.

tomhughes avatar Nov 27 '25 12:11 tomhughes

The matter at hand relates to notes, so doesn't involve changesets at all,

The description of the issue is "Designate regions where new accounts cannot edit or add notes" (my emphasis) and the description contains "Implement the ability to designate regions where new accounts and anonymous users cannot make map edits or create notes." so that's why I'm considering changesets 😄

Anyway apparently you two have this in hand so I'll let you get on with it.

No, I'd rather hear more from other people, including you. My call with @pablobm was to help go over some of the background topics like topology vs geometries, postgis, and why things were implemented as they were "back in the day", etc - it wasn't any kind of offline decision making.

gravitystorm avatar Nov 27 '25 12:11 gravitystorm

I interpreted that as "(edit or add) notes" not "edit or (add notes)" which I guess explains the confusion ;-)

tomhughes avatar Nov 27 '25 12:11 tomhughes

Somehow I also had it in my head that this was only about anonymous notes but I now realise (as that doesn't make sense for map edits) that it also mentions new accounts which I find much more concerning - that will be an enormous additional support load and probably lead to extremely negative results.

tomhughes avatar Nov 27 '25 12:11 tomhughes

based on bounding boxes as you say but that does create problems when it comes to large bounding boxes that might intersect many "zones" that they don't really touch

Since we're talking about new accounts, large bounding boxes are anyway rejected by the api_size_limit db function. For prototyping purposes only, you could probably abuse that function to include a zone intersection check, and return 0 as permitted size...

mmd-osm avatar Nov 27 '25 12:11 mmd-osm

Team, apologies for my phrasing earlier. I made it sound like "decisions" were made, while it was more of my sloppy brain dump. I do want to hear more from all, and I don't consider this ready to implement.

pablobm avatar Nov 27 '25 12:11 pablobm

Rephrased the title of this issue to avoid ambiguity

pablobm avatar Nov 27 '25 14:11 pablobm

As a general comment, there was some mention of "anonymous accounts", a concept that doesn't exist. An anonymous note is not linked to any account (rather than being linked to an "anonymous account"). Probably just an oversight but worth keeping the nomenclature straight!

It's been a long time since I did any work on the website code but IIRC there is some logic that, before even applying a change to the database, will extend the current changeset's bounding box. If that is still the case then that would probably be the easiest place to introduce a limit like this - just refuse to extend the bbox if it grows into a blocked area, and throw an exception or whatever one does in Rails. -- Another architecture option is something that @gravitystorm suggested about a decade ago, that you create a mechanism where separate programs can be looped into the website code and basically ask for all (or specific?) changesets to be passed by them for evaluation before they are accepted into the database, but that's probably too grand a scheme.
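A minimal sketch of the "refuse to extend the bbox" idea, assuming such an extension step exists. `GuardedBBox`, `BlockedZoneError` and the rectangular zone are hypothetical stand-ins, not website classes, and zones are simplified to rectangles.

```ruby
# Hypothetical error class and box type, for illustration only.
class BlockedZoneError < StandardError; end

GuardedBBox = Struct.new(:min_lon, :min_lat, :max_lon, :max_lat) do
  # True when the two boxes share any area or edge.
  def overlaps?(other)
    min_lon <= other.max_lon && max_lon >= other.min_lon &&
      min_lat <= other.max_lat && max_lat >= other.min_lat
  end

  # Returns the box grown to include the point, or raises if the grown box
  # would overlap any blocked zone. The receiver is left unchanged.
  def extend_to(lon, lat, blocked_zones)
    grown = self.class.new([min_lon, lon].min, [min_lat, lat].min,
                           [max_lon, lon].max, [max_lat, lat].max)
    raise BlockedZoneError, "edit falls in a blocked zone" if blocked_zones.any? { |z| grown.overlaps?(z) }

    grown
  end
end

# One made-up blocked zone, simplified to a rectangle.
BLOCKED = [GuardedBBox.new(10.0, 10.0, 11.0, 11.0)].freeze

box = GuardedBBox.new(0.0, 0.0, 1.0, 1.0)
box = box.extend_to(2.0, 2.0, BLOCKED) # still clear of the zone
begin
  box.extend_to(10.5, 10.5, BLOCKED)   # would grow into the zone
rescue BlockedZoneError => e
  puts "rejected: #{e.message}"
end
```

Because the check runs before the box is replaced, a rejected upload leaves the stored bbox untouched.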

And yes this could trip up users making world-spanning changesets but that isn't necessarily an unwanted side effect. We could create a small blocked area around Null Island and keep it there on purpose ;)

As to the increased support load, I think we should have a "help" team to which support (and potentially also DWG or LWG) could deflect incoming tickets that do not require administrative action but only explanation and hand-holding. But that is probably out of scope for this ticket ;)

woodpeck avatar Nov 27 '25 15:11 woodpeck

IIRC there is some logic that, before even applying a change to the database, will extend the current changeset's bounding box. If that is still the case then that would probably be the easiest place to introduce a limit like this

No, I removed that code. It was some sort of premature performance optimization that was no longer needed.

separate programs looped into the website code and basically ask for all (or specific?) changesets to be passed by them for evaluation before they are accepted into the database, but that's probably too grand a scheme.

That's all being done by cgimap today, and it doesn't sound like a good idea to loop in external tools.

Today's processing is quite different from the pre-2018 world, after all.

mmd-osm avatar Nov 27 '25 15:11 mmd-osm

Would it make sense to split the discussion about notes from the one about changesets at this point? In some sense they’re related and could benefit from a consistent approach, but with notes we’re essentially dealing with a point geometry, which would be much less complex than an arbitrarily sparse feature collection geometry (to use Simple Features terminology).
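For the notes case, the check reduces to a point-in-polygon test. Here is a minimal planar ray-casting sketch; `point_in_polygon?` and the square `ZONE_RING` are made up for illustration, and it ignores things a spatial library would handle, such as holes, multipolygons and the antimeridian.

```ruby
# Ray casting (even-odd rule): count how many polygon edges a horizontal ray
# from the point crosses; an odd count means the point is inside.
def point_in_polygon?(lon, lat, ring)
  inside = false
  ring.each_cons(2) do |(x1, y1), (x2, y2)|
    next unless (y1 > lat) != (y2 > lat) # edge does not straddle the ray's latitude

    x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1).to_f
    inside = !inside if lon < x_cross
  end
  inside
end

# Made-up zone: closed ring (first vertex repeated last) of [lon, lat] pairs.
ZONE_RING = [[0, 0], [4, 0], [4, 4], [0, 4], [0, 0]].freeze

puts point_in_polygon?(2, 2, ZONE_RING) # note placed inside the square
puts point_in_polygon?(5, 2, ZONE_RING) # note placed outside the square
```

One test per note against each zone is cheap compared with anything involving changeset contents, which supports handling notes first.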

1ec5 avatar Nov 27 '25 15:11 1ec5

Yes, I think it should be split. I'll do it (currently looking at what's the "correct" way to split an issue in GitHub...).

pablobm avatar Nov 27 '25 15:11 pablobm

Would it make sense to split the discussion about notes from the one about changesets at this point?

That would have been my proposal as well. Dealing with changesets is much more involved and needs lots more thought. Notes are (in comparison) rather trivial. Plus, you can hopefully reuse everything UI and zone management related later on.

mmd-osm avatar Nov 27 '25 15:11 mmd-osm

IIRC there is some logic that, before even applying a change to the database, will extend the current changeset's bounding box. If that is still the case then that would probably be the easiest place to introduce a limit like this

No, I removed that code. It was some sort of premature performance optimization that was no longer needed.

I'm surprised by this point. Is there no equivalent of Changeset#update_bbox in cgimap? There is presumably some place where diffs (or individual element changes) are rejected for being over the limit, before anything is committed to the db.

gravitystorm avatar Nov 27 '25 15:11 gravitystorm