OSM Data Source: Support for 2023 tagging schema

Open dbrgn opened this issue 3 months ago • 12 comments

Hello Johan, how are you currently generating the OSM data sources at osm.ev-map.app? Is it still my PoC script at https://github.com/ev-map/evmap-osm that runs nightly?

I'm asking because in 2023 a proposal for updating the OSM tagging schema was voted on and approved: https://wiki.openstreetmap.org/wiki/Proposal:EV_Charging_Station_Mapping. This changes the tagging logic a bit; for example, charging stations should now ideally be mapped as an area, not as a node.

The change brings some advantages for EVMap: for example, the tagging schema now covers chargepoints, including EVSE IDs, which may be helpful for linking against chargeprice. However, the logic from the PoC is now too naive. As an example, this charging station is micro-mapped as an area, but does not show up in your data JSON.

If the linked repo is still the latest version, I can try to come up with a PR that improves the querying logic.

dbrgn avatar Sep 21 '25 20:09 dbrgn

Oh yeah, that new tagging schema sounds like a very useful improvement!

Is it still my PoC script at https://github.com/ev-map/evmap-osm that runs nightly?

Yes, it is. I have started prototyping a more sophisticated backend in an effort to combine data from different sources, but that will take more time to reach a usable state.

If the linked repo is still the latest version, I can try to come up with a PR that improves the querying logic.

That would be great! Of course it would be good if it were backwards-compatible with the current app version. But if needed we can also create a new endpoint with a new JSON format.

johan12345 avatar Sep 21 '25 20:09 johan12345

@johan12345 the initial fix was very easy, see https://github.com/ev-map/evmap-osm/pull/1.

This does not yet include any details on chargepoints, etc. I looked at the options for querying, and it seems like a tricky thing to do in a single Overpass query (because you have an is-within relation for every chargepoint for every charging station, which is expensive to compute).

How far are you with your backend attempts? If I were to just "do my thing", I would start prototyping a Rust binary that downloads both charging station and chargepoint data from Overpass and matches chargepoints (potentially in parallel) to charging stations, with the chargepoints grouped as a JSON array of JSON objects below the charging station entry. This should even be backwards compatible, and could be optimized for performance in the future (e.g. incremental fetching instead of one big global query).
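
To make the matching step more concrete, here is a minimal sketch in Python (not the planned Rust binary). It assumes stations are already available as polygons, given as lists of (lon, lat) pairs, and chargepoints as points. The field names `id`, `polygon` and `chargepoints` and both function names are made up for illustration and are not the evmap-osm format. It uses a plain ray-casting test and a linear scan, so it ignores performance entirely.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: count how often a ray to the east crosses the ring."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def group_chargepoints(stations, chargepoints):
    """Attach each chargepoint to the first station whose polygon contains it.

    Returns the stations (copied, with a "chargepoints" list added) and the
    chargepoints that did not fall inside any station polygon.
    """
    result = [dict(s, chargepoints=[]) for s in stations]
    unmatched = []
    for cp in chargepoints:
        for station in result:
            if point_in_polygon(cp["lon"], cp["lat"], station["polygon"]):
                station["chargepoints"].append(cp)
                break
        else:
            unmatched.append(cp)
    return result, unmatched
```

Because the station entries only gain an extra key, a consumer that ignores unknown keys would keep working, which is the backwards compatibility mentioned above. Stations mapped as plain nodes have no polygon and would need separate handling (for example a distance threshold), which this sketch leaves out.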

Some more thoughts:

  • The tagging change gave the brand tag more importance, it should probably be the preferred tag for you (in case you process it) compared to operator or network.
  • Since the evmap-osm repo is the source for the processing logic, should we discuss these things there instead of here in the app repo?

dbrgn avatar Sep 21 '25 20:09 dbrgn

the initial fix was very easy, see https://github.com/ev-map/evmap-osm/pull/1.

Thanks! Already deployed and running the update.

How far are you with your backend attempts?

I started with a Python+GeoDjango app (which can then use PostGIS as a DB, for now I'm testing with Spatialite) and implemented scripts to import data from different sources. OSM uses the same Overpass approach as in evmap-osm, and there is an endpoint that serves the chargers from OSM in the same JSON format as your bash script produces.

In terms of other data sources, I have added basic implementations for GoingElectric and Nobil, and started looking into the offerings at Mobilithek, where all the CPOs in Germany now have to provide open data following the EU AFIR regulation. Matching and aggregating data between different sources and serving them all in a common format is not really in a functional state yet, and that will probably take a lot more time to get right. But we could use the OSM component independently for now.

I'm not sure yet whether Django is the best choice, I just wanted to start prototyping with something I know. Rust is very nice too, I have tried it a bit at work last year. Though if we have a spatial database already anyway, a lot of the heavy lifting like matching chargepoints to charging stations can probably be offloaded to that, so the performance on the Python side might not be so much of an issue.

The tagging change gave the brand tag more importance, it should probably be the preferred tag for you (in case you process it) compared to operator or network.

Right. brand would be the equivalent of what EVMap calls network, right?

Since the evmap-osm repo is the source for the processing logic, should we discuss these things there instead of here in the app repo?

We can also keep it here for now - if we decide to implement it in the new backend, that would be yet another repo (which I haven't made public yet - I also still have to decide on the license for that...)

johan12345 avatar Sep 21 '25 21:09 johan12345

Right. brand would be the equivalent of what EVMap calls network, right?

I would assume so, but it's a bit tricky. The OSM wiki writes: The verifiable brand displayed on the station or charge points. If no brand is present, or the brand and operator are different, operator may be used using the trade name of the business. Sometimes operator and brand are the same, sometimes they might differ. But I think in most cases brand will be what you want (multiple stations of the same brand/network may have different CPOs).
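
As a rough illustration of that preference, a fallback chain could look like the sketch below. The function name `pick_network` is hypothetical, and while putting `brand` first follows the tagging change, the order of the two fallbacks (`operator` before `network`) is only an assumption.

```python
def pick_network(tags):
    """Return the first non-empty value among brand, operator and network."""
    # brand is preferred; the order of the remaining two keys is an assumption
    for key in ("brand", "operator", "network"):
        value = tags.get(key, "").strip()
        if value:
            return value
    return None
```

For a station tagged with both `brand` and `operator`, this returns the brand; for one tagged with neither of the three keys, it returns `None`.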

I'm not sure yet whether Django is the best choice, I just wanted to start prototyping with something I know. Rust is very nice too, I have tried it a bit at work last year. Though if we have a spatial database already anyway, a lot of the heavy lifting like matching chargepoints to charging stations can probably be offloaded to that, so the performance on the Python side might not be so much of an issue.

It probably makes sense to use this application for merging different data sources into a single format and cleaning it up.

For initial loading and pre-processing of data, a Rust application probably makes more sense, because it has higher performance out of the box and is easier to optimize (e.g. for parallelism). It can output the data file that your application uses as input.

Pipeline would be:

OSM > Overpass API > Intermediate format (merged stations and chargepoints) > EVMap Loader > Final API

I started creating a prototype, sounds like a fun project. I first attempted a fully streaming approach, where the HTTP response is immediately parsed and processed in a streaming fashion, but the code became too complex too quickly, so for now I'll stick to a multi-step approach, either in-memory or with file-based caching if needed. (Thanks to Claude, the conversion was quick and already works, so I can focus on the optimizations and refinements, and not on the implementation of the CLI / fetching / parsing.)

dbrgn avatar Sep 21 '25 21:09 dbrgn

Updated data preprocessor is WIP here: https://github.com/dbrgn/evmap-osm/pull/2

dbrgn avatar Sep 23 '25 21:09 dbrgn

Pipeline would be: OSM > Overpass API > Intermediate format (merged stations and chargepoints) > EVMap Loader > Final API

Yeah, that probably makes sense. And as long as the new EVMap API is not finished yet, we can still make the intermediate format available like it is now.

Updated data preprocessor is WIP here: https://github.com/dbrgn/evmap-osm/pull/2

Thank you, looks good so far! The format that it currently produces is still missing the lat & lon for ways & relations. But adding those would probably be part of this process of merging stations and chargepoints, which is still missing, right?

johan12345 avatar Sep 28 '25 15:09 johan12345

@johan12345 yes, that's the missing part. When working on it more, I realized that doing this in-memory will not be efficient for a large number of elements without an index.

I decided to instead go the route of loading all OSM data, filtering it with osmium, and loading it into a SpatiaLite database. From there it can be queried.

Using that pipeline, it's a bit tricky to collect all keys/values into the JSON. It would be easier and potentially more useful if we already did some of the preprocessing for EVMap at this stage.

Do you have a link to the charging station / charge point data structure used in EVMap? What data do you have in there, in what format?

dbrgn avatar Oct 01 '25 22:10 dbrgn

@dbrgn With "loading all OSM data", do you mean all the chargepoint data from the Overpass API, or the whole Planet.osm file? The latter would probably be quite resource-intensive to download and process daily, right?

This is the current data structure used in EVMap: https://github.com/ev-map/EVMap/blob/master/app/src/main/java/net/vonforst/evmap/model/ChargersModel.kt And this is how the OSM types currently get converted into it: https://github.com/ev-map/EVMap/blob/master/app/src/main/java/net/vonforst/evmap/api/openstreetmap/OpenStreetMapModel.kt

The model is based on what GoingElectric uses, the structure is ChargeLocation -> Chargepoint (where a Chargepoint can have any number of connectors of the same type and power). This is enough for what EVMap shows at the moment, but maybe not quite the best approach now that some data sources are providing EVSEIDs.

With the planned new backend, I will probably adapt this a bit into a three-level structure ChargeLocation -> Chargepoint (single EVSEID) -> Connector (single connector), or similar. This is also the structure in the DATEX2 format that the EU AFIR open data regulations are proposing. In most cases, one Chargepoint will have exactly one Connector, but there are cases where one EVSEID is used for multiple connectors. For example, some AC chargers use the same EVSEID for one Type 2 and one Schuko connector, as you cannot use both at the same time. Plug type and power/voltage/amperage information will still be on the Connector level, but real-time availability and pricing information will usually be provided at the Chargepoint (EVSEID) level, not on the Connector level.
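
To illustrate the three-level idea, here is a small sketch using Python dataclasses. It is not the actual model of the new backend: all class and field names are assumptions, the EVSEID is made up, and the power values are only illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Connector:
    plug_type: str                      # e.g. "Type 2" or "Schuko"
    power_kw: Optional[float] = None    # power stays on the connector level


@dataclass
class Chargepoint:
    evse_id: Optional[str]              # one EVSEID per chargepoint
    connectors: List[Connector] = field(default_factory=list)
    available: Optional[bool] = None    # real-time status lives on this level


@dataclass
class ChargeLocation:
    name: str
    lat: float
    lon: float
    chargepoints: List[Chargepoint] = field(default_factory=list)


# An AC charger where one EVSEID covers a Type 2 and a Schuko connector
ac_point = Chargepoint(
    evse_id="XX*ABC*E0001",  # made-up ID, for illustration only
    connectors=[Connector("Type 2", 22.0), Connector("Schuko", 3.7)],
)
location = ChargeLocation("Example car park", 47.0, 8.0, [ac_point])
```

Keeping `available` on `Chargepoint` rather than on `Connector` mirrors the point above that availability and pricing are usually reported per EVSEID.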

johan12345 avatar Oct 02 '25 17:10 johan12345

@dbrgn With "loading all OSM data", do you mean all the chargepoint data from the Overpass API, or the whole Planet.osm file? The latter would probably be quite resource-intensive to download and process daily, right?

The latter. Main concern is that overpass does not natively support treating charging stations as areas, so querying is expensive. This is the result of a brainstorming session with ChatGPT:

Image

...so Overpass is unlikely to scale well. And even if Overpass would whitelist charging stations and treat them as areas (similar to city boundaries, for example), it would still not scale well on public Overpass instances.

The full planet file from planet.openstreetmap.org is currently 83G in size. Possibly it wouldn't need to be updated daily, I think weekly would suffice. Would that be a concern?

Alternatively, we could use overpass and query country-by-country, or something like that. Doesn't sound like a great solution though, compared to just using the raw data directly (which also opens up the possibility of doing other, more interesting queries/operations on it).

dbrgn avatar Oct 03 '25 09:10 dbrgn

Ah, I just checked https://wiki.openstreetmap.org/wiki/Planet.osm, and found some interesting information:

  • Planet files are generated weekly, so more frequent updates won't help anyways
  • There is a torrent available that may be faster if your downlink is faster than what AWS provides in practice (I have a 25G downlink at home, but the download is still running at only 30-50 MB/s)
  • There are daily diffs available, that could be used to bridge the gap between the planet download and daily updates, if desired

dbrgn avatar Oct 03 '25 10:10 dbrgn

The full planet file from planet.openstreetmap.org is currently 83G in size. Possibly it wouldn't need to be updated daily, I think weekly would suffice. Would that be a concern?

Hmm, at least it doesn't have to be decompressed first before processing with osmium, right? Still, it might be about the point where I would have to switch the approach from "a simple <10€/month VPS can easily host everything" to "I might have to spin up a separate cloud instance every week just for the OSM import" (or run it at home or something).

I understand Overpass can't easily do the query to find out which nodes are within which area - but couldn't it be used to just export all the charging station data (nodes and areas) from OSM, which we can then import into a spatial DB to do the query locally?
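
A sketch of what such an export query might look like, built as a string in Python: it asks Overpass only for the raw elements with their geometry and leaves all containment checks to the local database. The function name is hypothetical, and the `man_made=charge_point` tag reflects my understanding of the 2023 schema, so it should be checked against the proposal page before use.

```python
def build_export_query(timeout_s=900):
    """Build an Overpass QL query that exports stations and charge points
    without asking Overpass to compute which point lies in which area."""
    return (
        f"[out:json][timeout:{timeout_s}];\n"
        "(\n"
        # nwr matches nodes, ways and relations carrying the tag
        '  nwr["amenity"="charging_station"];\n'
        '  nwr["man_made"="charge_point"];\n'
        ");\n"
        # "out geom" includes coordinates for nodes and full geometry for ways
        "out geom;\n"
    )
```

The resulting JSON could then be imported into SpatiaLite or PostGIS, where the point-in-area join runs locally.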

johan12345 avatar Oct 03 '25 10:10 johan12345

I understand Overpass can't easily do the query to find out which nodes are within which area - but couldn't it be used to just export all the charging station data (nodes and areas) from OSM, which we can then import into a spatial DB to do the query locally?

Ha, that's a smart idea! I think that's possible, I'll evaluate it when I find time to work on this again.

Still, it might be about the point where I would have to switch the approach from "a simple <10€/month VPS can easily host everything" to "I might have to spin up a separate cloud instance every week just for the OSM import" (or run it at home or something).

If cost and resources are a bigger issue than full control over the data processing pipeline, I could offer to run the processing on a server of mine and provide an endpoint where your server could download the resulting file. It would still be hosted from your server (I assume you want that), but I can do the preprocessing in a specific format. (Other projects might benefit from that as well.)

dbrgn avatar Oct 03 '25 11:10 dbrgn