
Reconsider not blocking MapProxy

Open · pnorman opened this issue 2 years ago · 1 comment

MapProxy can be configured in a way that causes it to scrape tile.osm.org when someone seeds a cache. Previously, we decided not to block MapProxy user-agents and to handle it on a per-IP basis instead. I'm not sure this is sustainable anymore.

Yesterday, there were 71 TPS from MapProxy, 48 TPS of which were cache misses. When I broke it down by IP, four of the top five IPs were clearly scraping, one of them heavily. I manually added 5 IPs to the Fastly block list; they were a mix of RU, PL, and US ISPs and some AWS EC2 IPs.

Having to do this manually is a pain: I have to spot the scraping in the daily tile logs, run Athena queries, and then block the IPs. Normally, if abuse shows up in the tile logs, I can deal with it immediately and it doesn't recur. With MapProxy, it's whack-a-mole.
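As a rough illustration of the per-IP breakdown described above, the Python sketch below ranks IPs sending MapProxy traffic by average cache-miss TPS over one day. The `TileLogRecord` shape, the `top_mapproxy_ips` helper, and matching on the substring "mapproxy" in the user-agent are all hypothetical assumptions; the real tile logs are queried with Athena and have their own schema.

```python
from collections import defaultdict
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass
class TileLogRecord:
    # Hypothetical record shape; the real tile log schema will differ.
    ip: str
    user_agent: str
    cache_hit: bool


def top_mapproxy_ips(records, n=5):
    """Return (ip, avg_tps, avg_miss_tps) for the n IPs with the most
    MapProxy cache misses over one day of records (hypothetical helper)."""
    totals = defaultdict(int)
    misses = defaultdict(int)
    for r in records:
        # Assumption: MapProxy traffic is identified by a UA substring.
        if "mapproxy" not in r.user_agent.lower():
            continue
        totals[r.ip] += 1
        if not r.cache_hit:
            misses[r.ip] += 1
    # Misses are what load the render servers, so rank by them first.
    ranked = sorted(totals, key=lambda ip: (misses[ip], totals[ip]), reverse=True)
    return [
        (ip, totals[ip] / SECONDS_PER_DAY, misses[ip] / SECONDS_PER_DAY)
        for ip in ranked[:n]
    ]
```

A human would still review the top of this list before blocking anything, which is the manual step the comment above describes.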

pnorman, Aug 22 '23 06:08

> Previously, we decided not to block mapproxy user-agents, and handle it on a per-IP basis. I'm not sure this is sustainable anymore.

Would it really help, though? Couldn't such abusers trivially change their user-agent to get around the block (which I'd expect them to do as soon as they noticed the MapProxy UA was blocked)?

It seems to me that blocking all "mapproxy" user-agents (if I understand the proposal correctly) would therefore only harm legitimate users, without significantly affecting abusers, even in the short run?

Wouldn't it be better to have a script detect IPs with a high TPS and block them automatically, instead of doing it manually? (Or even restrict that automatic high-TPS blocking to user-agents containing "mapproxy", if that's what you'd prefer.)
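A minimal sketch of what such a script could look like, assuming requests can be fed to it one at a time with an IP, user-agent, and timestamp: it keeps a per-IP sliding window and flags any IP whose request rate over that window exceeds a threshold, optionally counting only user-agents that contain a given substring. The `HighRateDetector` class, the window length, and the threshold are hypothetical; actually pushing flagged IPs to the Fastly block list is not shown.

```python
from collections import defaultdict, deque


class HighRateDetector:
    """Flag IPs whose request rate over a sliding window exceeds max_tps.

    window_s and max_tps are hypothetical tuning knobs, not values taken
    from the tile usage policy. If ua_filter is set, only requests whose
    user-agent contains it (case-insensitive) are counted.
    """

    def __init__(self, window_s=60.0, max_tps=5.0, ua_filter=None):
        self.window_s = window_s
        self.max_requests = max_tps * window_s
        self.ua_filter = ua_filter.lower() if ua_filter else None
        self._times = defaultdict(deque)  # ip -> timestamps inside the window
        self.flagged = set()  # candidate IPs for the block list

    def observe(self, ip, user_agent, timestamp):
        """Record one request; return True if the IP is currently flagged."""
        if self.ua_filter and self.ua_filter not in user_agent.lower():
            return ip in self.flagged  # not counted under the UA restriction
        times = self._times[ip]
        times.append(timestamp)
        # Drop requests that have fallen out of the sliding window.
        while times and times[0] <= timestamp - self.window_s:
            times.popleft()
        if len(times) > self.max_requests:
            self.flagged.add(ip)
        return ip in self.flagged
```

With `ua_filter="mapproxy"` this matches the narrower variant suggested above; with `ua_filter=None` it applies the same rate limit to every user-agent.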

There are many valid use cases for MapProxy (e.g. using it as a local cache) that actually reduce load on the OSM tile servers; it would suck to have MapProxy banned arbitrarily even when one is using it normally (and well within the ToS). (Yes, legitimate users could work around the block as noted above, but it creates frustration and needlessly wastes their time, and legitimate users would feel dirty having to lie to avoid such misplaced blocks. Intentional abusers likely won't have such moral qualms.)

mnalis, Apr 26 '24 00:04